

Semantic Web 0 (0) 1, IOS Press

Similarity-based Knowledge Graph Queries for Recommendation Retrieval

Lisa Wenige *, Johannes Ruhland
Chair of Business Information Systems, Friedrich-Schiller-Universität Jena, Germany
E-mail: lisawenigeuni-jenacom

Abstract. Current retrieval and recommendation approaches rely on hard-wired data models. This hinders personalized customizations to meet information needs of users in a more flexible manner. Therefore, the paper investigates how similarity-based retrieval strategies can be combined with graph queries to enable users or system providers to explore repositories in the Linked Open Data (LOD) cloud more thoroughly. For this purpose, we developed novel content-based recommendation approaches. They rely on concept annotations of Simple Knowledge Organization System (SKOS) vocabularies and a SPARQL-based query language that facilitates advanced and personalized requests for openly available knowledge graphs. We have comprehensively evaluated the novel search strategies in several test cases and example application domains (i.e., travel search and multimedia retrieval). The results of the web-based online experiments showed that our approaches increase the recall and diversity of recommendations or at least provide a competitive alternative strategy of resource access when conventional methods do not provide helpful suggestions. The findings may be of use for Linked Data-enabled recommender systems (LDRS) as well as for semantic search engines that can consume LOD resources.

Keywords: Recommender Systems, Linked Open Data, Information Retrieval, SPARQL, Semantic Search, SKOS

1. Introduction

A recommender system (RS) component is usually one of the key search features in online portals. It helps users to discover items that reflect their interests [3]. Personalized recommendations are based on a user profile that contains either implicit (e.g., access statistics or click behavior) or explicit preference information (e.g., ratings for items). Common RS techniques are collaborative filtering (CF) and content-based (CB) algorithms. CF approaches derive suggestions from users with similar tastes, whereas CB methods are based on similar items according to metadata descriptions [26]. Descriptions in CB engines are often structured in the table-like format of attribute-value pairs. This flat data structure strongly reduces the multidimensionality of preferences and item characteristics and can therefore produce weak recommendations.

This is why the complex data structure of RDF graphs in the Linked Open Data (LOD) cloud can help to improve the representation of user tastes in CB engines. Consider the following example: Suppose a user has stated that s/he likes a particular movie director and would like to receive suggestions for other interesting filmmakers. A conventional RS would determine similar directors from the directors' metadata. However, the director descriptions may predominantly consist of irrelevant information, e.g., the nationality of the filmmakers or the prizes they have won. Such metadata would not yield useful results in a purely content-based system and would only achieve a few random hits. On the other hand, a request that is issued against the LOD collection DBpedia could explore the semantic network that surrounds the director. By this means, all movies that were shot by the favored director are identified. Additional interesting movies that were not directed by the filmmaker but share certain characteristics with his movies (e.g., the same genres or main actors) could further enhance the metadata descriptions and user profile information. The data web can answer such requests because the LOD technology stack provides the SPARQL Protocol and RDF Query Language (SPARQL) and suitable query engines that enable fast retrieval [55].

* Corresponding author. E-mail: lisawenigeuni-jenacom
1570-0844/0-1900/$35.00 © 0 – IOS Press and the authors. All rights reserved

However, graph-based queries alone are not yet sufficient to generate personalized suggestions, since they only return results for graph patterns that exist in the repository. Consider the following example: Consumers may like to use recommendation engines that work on multiple domains simultaneously. For instance, a user who has stated a liking for certain items from one domain may like to receive suggestions from another domain as well (i.e., cross-domain retrieval). If a user profile contains feedback information for books, it would be desirable if the system could generate movie suggestions based on this data. While querying RDF graphs can be useful for this recommendation scenario (since it can identify matching objects based on entity type declarations or typed link information), there may be no direct links from a preferred book to a suitable movie in the RDF graph.

Therefore, RS designers should not rely too heavily on graph-based queries but should also explore possibilities of similarity-based retrieval. A possible solution would be to identify books that are similar to the ones in the user profile. In a subsequent processing step, and through a graph-based query, the engine could explore whether the similar books have any connections to movie items in the repository. This approach can reveal implicit connections in the data and might help to return fewer empty result sets to the user. It requires the data to be processed in an ad-hoc fashion, because one processing step (identification of suitable movies through graph pattern matching) relies on the outcome of a preceding one (computation of similar book items). For this purpose, it is important that recommendations can be quickly generated on-the-fly without further preprocessing, so that graph matching and similarity-based calculations can be merged. This also has to be accompanied by a query language which facilitates flexible combinations of the two retrieval paradigms. By these means, users can specify at each key step of a recommendation workflow (i.e., user profile generation and item selection) whether graph- or similarity-based processing operations should be triggered.

Almost none of the existing Linked Data recommender systems (LDRS) addresses these issues, which prevents efficient usage of available knowledge sources for retrieval tasks. Instead, current LDRS rely on a fixed set of item features, which have to be extracted from the LOD cloud before suggestions can be generated [12, 13, 22, 24, 29, 50, 65]. However, the ad-hoc retrieval and processing of item features for similarity computation in combination with graph-based queries can be facilitated by concepts from Simple Knowledge Organization System (SKOS) vocabularies [38]. SKOS annotations occur in more than half of the repositories in the LOD cloud and often describe real-world entities [53]. For instance, in the DBpedia repository, an LOD resource representing a music act is characterized by SKOS-based subject descriptors (e.g., stating the genre or the label of the music act). The descriptors are part of the DBpedia category graph [11]. A recommendation framework that utilizes SKOS annotations for item-to-item similarity calculation is applicable to a wide range of data collections and domains. Additionally, SKOS subjects are uniquely identified URI resources. Therefore, they can be conveniently processed for ad-hoc querying without further preprocessing [38].

This paper presents novel retrieval strategies and a SKOS-based recommendation engine called SKOSRecommender (SKOSRec) that can flexibly combine on-the-fly similarity calculation and SPARQL-like techniques to leverage the full potential of LOD repositories. Their effectiveness has been demonstrated in a series of user studies. In summary, the paper addresses the following key points:

– Comprehensive literature review of the state of the art in LOD-enabled Information Retrieval (IR) and recommender systems (Sect. 2)

– Development of search strategies that are applicable to numerous LOD collections. The novel retrieval methods are facilitated by:

    • Syntax and processing units for a SPARQL-based recommendation query language (Sect. 3)

    • Flexible combinations of similar resource retrieval and graph pattern matching at different stages of the recommendation workflow (e.g., by being able to apply graph-based filter conditions on recommendation results or by utilizing recommendations as part of a SPARQL-based subquery) (Subsects. 3.2-3.5)

– Extensive evaluation of the novel retrieval approaches in a series of web-based user experiments, which demonstrated the effectiveness of the developed methods, especially in terms of diversity and recall (Sect. 4)


2. Related Work

Effective answering of retrieval requests has been investigated by researchers for many years. Even before the evolution of the LOD cloud, scientists developed strategies for extracting relevant information from text data. The most prevalent IR systems perform relevance rankings for unstructured resources and offer means to state filter conditions that complement queries in a faceted search interface. The approaches can combine query constraints with similarity-based retrieval. In these systems, user requests are quickly answered due to extensive preprocessing and indexing of resources before runtime execution [58]. However, low latency of search results comes at the cost of a hard-wired data model, which prevents flexible customizations and causes limitations in terms of ad-hoc retrieval [18].

Other researchers have developed query-based RS for table data. For instance, Adomavicius et al. [2] propose the REQUEST query language, which generates suggestions from a relational database that contains information on both user profiles and items. Thus, personalized recommendation queries can be formulated. Another interesting approach in the category of query-based RS is the FlexRecs system by Koutrika et al. [32]. The key advantage of their course RS is that it enables highly individual requests in which different query parts (i.e., profile generation, filtering of recommendations or like-minded peers) can be flexibly combined into so-called recommendation workflows.

Regardless of the strengths and weaknesses of the discussed approaches, they are not designed to consume LOD. LOD-enabled systems, on the other hand, use RDF data for retrieval. There exist many index-based Linked Data IR systems [4, 9, 10, 20, 25, 40, 46, 47, 61]. However, they index RDF resources before runtime retrieval and can therefore neither facilitate on-the-fly similarity calculation nor provide extensive SPARQL-based query options.

Another interesting engine category is on-the-fly Linked Data IR systems. These systems process information from the LOD cloud through spreading activation algorithms. While they enable fast retrieval, the downside of these approaches is that graph-based query constraints are only possible to a limited extent due to the applied probabilistic path traversal techniques [16, 17, 28, 33, 35, 56].

Existing LDRS, on the other hand, mostly perform offline computation of item-to-item similarities from LOD metadata descriptions [13, 22, 24, 29, 50, 65]. While this approach offers more control over the retrieval process, it prevents individual customizations and runtime responses.

To date, there are only a few query-based LDRS [5, 43, 49]. Most of these systems rely on the assumption that user preferences and item metadata reside in the same repository. But this goes against the notion of the LOD cloud as a decentralized and openly accessible data space that can be queried through API-like interfaces. Additionally, while query-based LDRS provide SPARQL-based query options to filter recommendation results, they are not capable of flexibly combining similar resource retrieval with graph filters (e.g., subquerying with recommendation results).

In summary, existing non-Linked Data as well as Linked Data-enabled retrieval and RS engines already provide useful features for personalized resource access. Nevertheless, the full potential of LOD for recommendation and search tasks has yet to be realized (see Table 1). The novel query strategies of the SKOSRec engine are attempts to tackle the previously outlined research gaps, to improve existing RS, and to offer new search strategies for LOD repositories that are guided by personal preferences.

3. The SKOS Recommender

3.1. System Overview

The SKOSRecommender is designed to consume recommendation requests and was developed with the help of the Apache Jena framework.¹ A SKOSRec query triggers a recommendation workflow in which similarity-based retrieval steps can be joined with SPARQL-like filter expressions. Listing 1 shows the syntax of the SKOSRec query language that has to be used to formulate recommendation requests.

Listing 1. Grammar of the SKOSRec query language

    SKOSRecQuery                ::= Prologue SelectPart? SimProjection PREF ItemPart+ RecWhereClause?
    SelectPart                  ::= SELECT ( DISTINCT | REDUCED )? Var+ RecWhereClause LimitClause? Aggregation?
    Aggregation                 ::= AGG IRIref Var ( SUM | MAX | AVG )
    SimProjection               ::= RECOMMEND Var TOP INTEGER ServiceIntegration?
    ServiceIntegration          ::= FROM SERVICE IRIREF
    ItemPart                    ::= ( VarPart | IRI ) Sim?
    VarPart                     ::= [ Var RecWhereClause ]
    Sim                         ::= SIM Relation DECIMAL
    Relation                    ::= ( > | >= | = )
    RecWhereClause              ::= WHERE RecGroupGraphPattern
    RecGroupGraphPattern        ::= '{' RecGroupGraphPatternSub '}'
    RecGroupGraphPatternSub     ::= TriplesBlock? ( RecGraphPatternNotTriples '.'? TriplesBlock? )*
    RecGraphPatternNotTriples   ::= RecGroupOrUnionGraphPattern | RecOptionalGraphPattern |
                                    RecMinusGraphPattern | Filter | Bind | InlineData
    RecGroupOrUnionGraphPattern ::= RecGroupGraphPattern ( 'UNION' RecGroupGraphPattern )*
    RecOptionalGraphPattern     ::= 'OPTIONAL' RecGroupGraphPattern
    RecMinusGraphPattern        ::= 'MINUS' RecGroupGraphPattern

In the given grammar, underlined parts represent SPARQL syntax elements that have been directly taken from the latest W3C specification [21]. Words in capital letters denote query keywords, which are also SPARQL expressions, except for the ones listed in the language parts SimProjection and ServiceIntegration. Like a regular SPARQL SELECT query, a SKOSRec query always generates a table-like result set that contains at least a mapping to a single variable. A detailed explanation of each part of the SKOSRec grammar can be found in Appendix A of this paper.

Fig. 1 depicts the overall architecture of the system with all its components. The most important parts are the SKOSRec query processing units (i.e., the parser and the compiler). They check the syntax of recommendation requests and regulate the correct execution of critical operations of the engine (e.g., graph-based filtering, result set joins, similarity calculation). The architecture does not dictate a particular SPARQL server implementation for LOD repositories, since the system is decoupled from the endpoint, which is accessed over HTTP during retrieval. However, the prototypical implementation of the SKOSRec engine utilizes the OpenLink Virtuoso server.²

¹ https://jena.apache.org

Table 1. Summary of existing retrieval and recommendation approaches

    System type                          LOD-enabled   A: SPARQL-based queries   B: On-the-fly similarity   C: Flexible combination of A and B
    IR systems                           ✗             ✗                         ✗                          ✗
    Query-based RS                       ✗             ✗                         ✓                          (✓)
    Index-based Linked Data IR systems   ✓             ✗                         ✗                          ✗
    On-the-fly Linked Data IR systems    ✓             ✗                         ✓                          ✗
    LDRS                                 ✓             ✗                         ✗                          ✗
    Query-based LDRS                     ✓             ✓                         ✗                          ✗

Fig. 1 also shows the sequential order of a typical recommendation workflow for a complex request, which consists of the following retrieval steps:

1. Issuing a SKOSRec query
2. Parsing the SKOSRec query
3. Compiling the SKOSRec query
4. Loading information on the LOD repository/SPARQL endpoint to be queried from the configuration
5. Retrieval of preferred items and setting up of the user profile (see Subsect. 3.3, Preference Querying)
6. Retrieval and processing of relevant LOD resources upon application of prefilter conditions (see Subsect. 3.4, Prefiltering)
7. Optimizing the retrieval process by identifying the resources that can be omitted from the final ranking³
8. Determination of similar LOD resources based on the user profile (see Subsect. 3.2, On-the-Fly Recommendations)
9. Ranking of LOD resources
10. Sending of recommendation results to the compiler and joining them with postfilter constraints
11. Retrieval of the final result set (see Subsect. 3.5, Postfiltering & Combinations)
12. Output to the user

Fig. 1. Architecture and workflow of the SKOSRec engine

² https://virtuoso.openlinksw.com
³ For further information on optimized retrieval, please refer to [62].

3.2. On-the-Fly Recommendations

A simple on-the-fly request is the most basic form of a SKOSRec query. It is based on the engine's ability to generate ad-hoc recommendations from a user profile containing preference statements for LOD resources. Listing 2 depicts an example SKOSRec on-the-fly request that obtains suggestions based on a user's preference for the western movie "They Call Me Trinity". In the given query, the preferred movie is represented by the DBpedia resource dbr:They_Call_Me_Trinity. The query also contains a prefix with a namespace declaration and a specification of the number of suggestions the engine should generate.

Listing 2. A simple recommendation query for the movie domain (Q1)

    PREFIX dbr: <http://dbpedia.org/resource/>

    RECOMMEND ?movie TOP 5
    PREF dbr:They_Call_Me_Trinity

In this basic form, the query contains neither pre- or postfilter conditions nor a preference query statement. Thus, the engine simply identifies the movies that are most similar to the preference in the profile. When executed over DBpedia, the query returns the solution set that is depicted in Table 2.

Table 2. Result set for Q1

    movie
    dbr:God_Forgives’_I_Don’t
    dbr:Ace_High_(1968_film)
    dbr:Boot_Hill_(film)
    dbr:Trinity_Is_Still_My_Name
    dbr:Troublemakers_(1994_film)
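To make the parts of such a basic request concrete, the following Python sketch splits a query of the simplest form (prefix declarations, `RECOMMEND ?var TOP n`, and a `PREF` list of prefixed names) into its components. The helper name `parse_simple_query` and the regular expressions are illustrative assumptions; they cover only this minimal form and are not the parser of the SKOSRec engine, which handles the full grammar.

```python
import re

# Hypothetical helper: handles only "RECOMMEND ?var TOP n PREF item [item ...]".
SIMPLE_QUERY = re.compile(
    r"RECOMMEND\s+\?(?P<var>\w+)\s+TOP\s+(?P<k>\d+)\s+PREF\s+(?P<items>.+)",
    re.DOTALL,
)

def parse_simple_query(text):
    """Return the result variable, the limit k and the expanded profile IRIs."""
    # Collect namespace declarations such as "PREFIX dbr: <http://...>".
    prefixes = dict(re.findall(r"PREFIX\s+(\w+):\s*<([^>]*)>", text))
    match = SIMPLE_QUERY.search(text)
    if match is None:
        raise ValueError("not a simple RECOMMEND ... TOP ... PREF ... query")
    profile = []
    for item in match.group("items").split():
        prefix, _, local = item.partition(":")
        profile.append(prefixes[prefix] + local)  # expand the prefixed name
    return {"var": match.group("var"), "k": int(match.group("k")), "profile": profile}

Q1 = """
PREFIX dbr: <http://dbpedia.org/resource/>
RECOMMEND ?movie TOP 5
PREF dbr:They_Call_Me_Trinity
"""
parsed = parse_simple_query(Q1)
```

Here `parsed["profile"]` holds the user profile as a list of full IRIs, and `parsed["k"]` is the number of suggestions requested via `TOP`.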

To determine recommendations in an ad-hoc fashion, the system identifies SKOS annotations by matching a suitable property (i.e., an annotation property) in an RDF dataset (Definition 1).

Definition 1 (Annotation property). An annotation property is an IRI (Listing 3) that is defined in the Dublin Core Metadata Initiative (DCMI) specification for subject annotations. It can occur in conjunction with one of the two namespaces that are stated by the standard [15].

Listing 3. Annotation property

    <ANNOTPROP> ::= dc:subject | dct:subject

These properties help to retrieve SKOS annotation triples for items preferred by a user (Definition 2).

Definition 2 (SKOS annotation triple). A SKOS annotation triple ($st$) is a triple that is comprised of an IRI in the subject position, an annotation property (ANNOTPROP) as predicate and a concept $c$ in the object position. The concept is part of a knowledge organization system $S$ in SKOS format ($C = \{c \mid c \in S\}$) (Eq. 1).

$$st \in I \times \texttt{<ANNOTPROP>} \times C \tag{1}$$

The identification of SKOS annotation triples facilitates fast retrieval of potentially similar items. To this end, the engine evaluates shared features between resources from the SKOS annotation dataset (Definition 3).

Definition 3 (SKOS annotation dataset). A SKOS annotation dataset ($AD$) is a subset of an RDF dataset that contains SKOS annotation triples (Eq. 2).

$$AD \subset I \times \texttt{<ANNOTPROP>} \times C \tag{2}$$

Before the similarity calculation process starts, the engine determines the SKOS annotations of each LOD resource ($r$) from the user profile by issuing a subsequent SPARQL query to the LOD repository from which the user intends to receive suggestions. The notion of SKOS annotations is specified in Definition 4.

Definition 4 (SKOS annotations). In the annotation dataset, LOD resources directly link to concepts of a SKOS system via a predefined annotation property. SKOS annotations for an input resource $r$ are defined as in Equation 3.

$$Annot(r) = \{c \in S \mid \exists\, (r, \texttt{<ANNOTPROP>}, c) \in AD\} \tag{3}$$
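Definitions 2 to 4 can be illustrated with a small in-memory sketch in Python. The triples, the `ex:` identifiers and the function names below are hypothetical toy data chosen for illustration; the engine itself obtains annotations through SPARQL queries against a remote repository rather than from a local set of triples.

```python
# Toy triples (hypothetical data, not taken from DBpedia).
TRIPLES = {
    ("ex:MovieA", "dct:subject", "ex:Spaghetti_Westerns"),
    ("ex:MovieA", "dct:subject", "ex:1970_films"),
    ("ex:MovieA", "dbo:director", "ex:Director1"),       # not an annotation property
    ("ex:MovieB", "dc:subject", "ex:Spaghetti_Westerns"),
    ("ex:MovieB", "dct:subject", "ex:Not_A_Concept"),    # object is not part of S
}
CONCEPTS = {"ex:Spaghetti_Westerns", "ex:1970_films"}    # the SKOS system S
ANNOT_PROPS = {"dc:subject", "dct:subject"}              # <ANNOTPROP> from Listing 3

def annotation_dataset(triples, concepts):
    """AD (Eq. 2): triples with an annotation property as predicate and a concept of S as object."""
    return {(s, p, o) for (s, p, o) in triples if p in ANNOT_PROPS and o in concepts}

def annot(r, ad):
    """Annot(r) (Eq. 3): all concepts that resource r links to in AD."""
    return {c for (s, _p, c) in ad if s == r}

AD = annotation_dataset(TRIPLES, CONCEPTS)
```

With this toy data, `annot("ex:MovieA", AD)` contains both concepts, while `annot("ex:MovieB", AD)` contains only `ex:Spaghetti_Westerns`.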

Afterwards, the engine identifies which of the other resources in the dataset share annotations with $r$. Similarities only have to be computed for items with matching concepts [13, 52]. Triples can be efficiently joined along item features, since native RDF storage systems usually index triples rather than columns [41]. Thus, the quick identification of mutual annotations through SPARQL requests facilitates ad-hoc similarity calculation. Definition 5 specifies the notion of relevant resources and their annotations that are needed to identify similar items. The applied syntax follows Pérez et al. [42].

Definition 5 (Relevant resources and their annotations). The conjunctive graph pattern $P_r$ matches all items and subjects that are potentially relevant for recommendation retrieval (Eq. 4).

$$P_r = (r, \texttt{<ANNOTPROP>}, c) \;\mathrm{AND}\; (q, \texttt{<ANNOTPROP>}, c) \tag{4}$$

The mapping $\Omega_r$ of relevant resources and their SKOS annotations is obtained by retrieving all resources $q$ that share at least one SKOS concept $c$ with resource $r$ from $AD$ (Eq. 5). The corresponding result set contains mappings for all variables (i.e., $c$ and $q$) in the triple statements of $P_r$.

$$\Omega_r = [\![P_r]\!]_{AD} \tag{5}$$
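One way to picture the evaluation of $P_r$ is as a plain SPARQL SELECT query whose WHERE clause contains the two triple patterns of Eq. 4. The Python function below builds such a query string. The function name is hypothetical, only one of the two annotation properties (`dct:subject`) is used by default, and the exact queries issued by the SKOSRec engine are not given in this section, so this is a sketch under those assumptions.

```python
def relevant_resources_query(resource_iri, annot_prop="http://purl.org/dc/terms/subject"):
    """Build a SELECT query whose WHERE clause mirrors the pattern P_r of Eq. 4."""
    return (
        "SELECT ?q ?c WHERE {\n"
        f"  <{resource_iri}> <{annot_prop}> ?c .\n"   # (r, <ANNOTPROP>, c)
        f"  ?q <{annot_prop}> ?c .\n"                 # (q, <ANNOTPROP>, c)
        "}"
    )

query = relevant_resources_query("http://dbpedia.org/resource/They_Call_Me_Trinity")
```

Each result row corresponds to one mapping in $\Omega_r$. As written, the pattern also matches the input resource itself as `?q`; an implementation would probably exclude it, for example with a FILTER, but that is an assumption beyond Definition 5.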

Upon extraction of relevant resources and annotations, the process of similarity calculation can start. The engine determines the information content (IC) of the shared SKOS annotations of two resources. This approach is based on the hybrid similarity metric proposed by Meymandpour and Davis [37] for LOD-enabled RS. However, while their measure considers all types of IRI resources for the retrieval, our similarity metric relies purely on SKOS annotations (Definition 6).

Definition 6 (SKOS similarity). Let $Annot(r)$ be the set of SKOS features of resource $r$ and $Annot(q)$ the set of SKOS features of resource $q$, with $q \in \{\mu(q) \mid \mu \in \Omega_r\}$. Then their similarity can be derived from the IC of their shared concepts $C_{shared} = Annot(r) \cap Annot(q)$.

$$sim(r, q) = IC(C_{shared}) \tag{6}$$


The IC of a set of SKOS concepts is measured by the sum of the negative logarithms of each concept's frequency in relation to the maximum frequency of occurrences among relevant resources (Definition 7).

Definition 7 (SKOS Information Content). The SKOS IC is defined as an aggregation of the individual IC values of each concept that is contained in the set of shared features ($c \in C_{shared}$), where $freq(c)$ is the frequency of occurrences of $c$ among all relevant resources and $n$ is the maximum frequency of occurrences among these resources. The final similarity score of two resources $r$ and $q$ is determined by summing the IC values of each concept of the shared feature set.

$$IC(C_{shared}) = -\sum_{c \in C_{shared}} \log\left(\frac{freq(c)}{n}\right) \tag{7}$$
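Equations 6 and 7 can be traced with a short Python sketch. It assumes natural logarithms (the base is not stated here), counts frequencies over all resources in the toy dictionary including the input resource, and uses hypothetical names and data throughout.

```python
import math
from collections import Counter

def concept_frequencies(annotations):
    """freq(c): number of relevant resources that are annotated with concept c."""
    return Counter(c for concepts in annotations.values() for c in concepts)

def information_content(shared, freq, n):
    """Eq. 7: IC(C_shared) = -sum over shared concepts of log(freq(c) / n)."""
    return -sum(math.log(freq[c] / n) for c in shared)

def skos_similarity(r, q, annotations):
    """Eq. 6: similarity of r and q as the IC of their shared concepts."""
    freq = concept_frequencies(annotations)
    n = max(freq.values())  # maximum frequency among the relevant resources
    return information_content(annotations[r] & annotations[q], freq, n)

# Toy annotations (hypothetical): resource -> set of SKOS concepts.
ANNOTATIONS = {
    "r":  {"a", "b"},
    "q1": {"a", "b"},
    "q2": {"a"},
    "q3": {"a"},
}
sim_q1 = skos_similarity("r", "q1", ANNOTATIONS)
sim_q2 = skos_similarity("r", "q2", ANNOTATIONS)
```

In this toy data, concept `a` has the maximum frequency and therefore contributes an IC of zero, so only the rarer concept `b` separates `q1` from `q2`. This reflects the intent of the measure: frequent concepts carry little information.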

After having obtained resource annotations as well as IC values, similarity scores between the input resource $r$ and each potentially relevant resource $q$ can be calculated. When a user profile contains more than a single item, similarity values are aggregated through summation to determine the final recommendation score (Definition 8).

Definition 8 (Recommendation score). The recommendation score of a potentially relevant resource $q$ is quantified by the sum of similarities with each resource $r$ that can be found in the profile ($\mathit{Pr}$).

$$score(\mathit{Pr}, q) = \sum_{r \in \mathit{Pr}} sim(r, q) \tag{8}$$

Upon calculation of similarity values for each potentially relevant item $q$, the engine ranks the retrieved resources and returns the top $k$, where the limit $k$ is specified by the user in the SKOSRec query.

With the presented methods of on-the-fly retrieval, recommendations can be generated from LOD repositories through an API without requiring local metadata or excessive preprocessing operations other than the mapping of profile items to LOD resources.⁴

⁴ Di Noia et al. describe how a mapping procedure can be conducted for a real-world implementation. For their LDRS, they matched movie titles with the corresponding LOD resources in DBpedia. The authors extracted IRIs, labels and publication dates for all movie resources. They applied the Levenshtein distance on these resources to identify matching movie items [13].
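The aggregation of Definition 8 and the subsequent top-k ranking can be sketched in Python as follows. The similarity values are made-up numbers, pairs without shared concepts are treated as similarity zero, and ties are broken alphabetically; the last two points are assumptions, since tie handling is not described here.

```python
def recommendation_scores(profile, candidates, sim):
    """Eq. 8: score(Pr, q) = sum of sim(r, q) over all profile items r."""
    return {q: sum(sim.get((r, q), 0.0) for r in profile) for q in candidates}

def top_k(scores, k):
    """Rank candidates by descending score (ties broken by name) and keep the first k."""
    return sorted(scores, key=lambda q: (-scores[q], q))[:k]

# Hypothetical precomputed similarities sim(r, q).
SIM = {
    ("r1", "q1"): 0.75, ("r2", "q1"): 0.25,
    ("r1", "q2"): 0.5,
    ("r2", "q3"): 1.25,
}
scores = recommendation_scores(["r1", "r2"], ["q1", "q2", "q3"], SIM)
ranking = top_k(scores, 2)
```

With these numbers, `q3` ranks first although it is similar to only one profile item, because summation rewards a single strong match as much as several weaker ones.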

3.3. Preference Querying

In the previous subsection, it was assumed that users express their tastes through preference statements for specific items, each given as an IRI referring to a LOD resource. However, preferences may be only vaguely known, for instance when users prefer products with specific features but are not able or willing to specify the actual items. The engine obtains such a profile by issuing a subsequent SPARQL query that contains the preference graph pattern ($P_{pref}$) in the WHERE condition. Suppose a user is interested in Quentin Tarantino movies (dbr:Quentin_Tarantino) but does not specify the precise items s/he likes. S/he could send a preference query to the SKOSRec engine. The graph pattern in the request would match all movies that are connected to dbr:Quentin_Tarantino through the properties dbo:director or dbo:producer (see Listing 4).

Listing 4. SKOSRec query to generate recommendations based on a preference query in the movie domain (Q2)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>

    RECOMMEND ?movie TOP 5
    PREF [ ?prefMovie
      WHERE {
        { ?prefMovie dbo:director dbr:Quentin_Tarantino }
        UNION
        { ?prefMovie dbo:producer dbr:Quentin_Tarantino }
      } ]

Table 3 shows the corresponding solution set for the preference query. It contains films from DBpedia that are similar to Quentin Tarantino movies.

Table 3. Result set for Q2

    movie
    dbr:The_Godfather_Part_II
    dbr:The_Prestige_(film)
    dbr:The_Dark_Knight
    dbr:Planet_Terror
    dbr:Clerks

The notion of a queried profile is specified in Definition 9.

Definition 9 (Queried profile). A queried profile is a user profile that has been compiled by retrieving all LOD resources $r$ that are part of an annotation graph $AD$ and are contained in the solution mappings resulting from the evaluation of a specified graph pattern ($P_{pref}$) over a dataset $D$ (Eq. 9).

$$\mathit{Pr} = \{\mu(r) \mid \mu \in [\![P_{pref}]\!]_D,\ r \in dom(\mu),\ r \in AD\} \tag{9}$$
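Equation 9 amounts to a projection followed by a membership check, which the following Python sketch spells out. Solution mappings are modelled as dictionaries from variable names to resources. Reading "$r \in AD$" as "the resource occurs as a subject in the annotation dataset" is an interpretation made for this example, and all names and data are hypothetical.

```python
def queried_profile(solution_mappings, var, ad_subjects):
    """Eq. 9: resources bound to `var` that also occur as subjects in AD."""
    return {
        mu[var]
        for mu in solution_mappings
        if var in mu and mu[var] in ad_subjects  # var in dom(mu), resource in AD
    }

# Hypothetical solution mappings of a preference pattern P_pref.
MAPPINGS = [
    {"prefMovie": "ex:Film1"},
    {"prefMovie": "ex:Film2"},
    {"prefMovie": "ex:Film3"},  # carries no SKOS annotations, so it is dropped
    {"other": "ex:Film1"},      # prefMovie is unbound in this mapping
]
AD_SUBJECTS = {"ex:Film1", "ex:Film2", "ex:Film9"}
profile = queried_profile(MAPPINGS, "prefMovie", AD_SUBJECTS)
```

The resulting set then plays the role of the profile $\mathit{Pr}$ in the recommendation score of Definition 8.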

3.4. Prefiltering

To facilitate exploratory search in LOD repositories, users should also have the option to formulate conditions in their queries. One way to enable filters during retrieval is to apply them before similarity calculation. Thus, recommendation lists can be adjusted to individual needs.

Due to the expressiveness of RDF data, filter conditions can not only be applied to attributes that are directly connected to potentially relevant items but can also be declared as a graph pattern. By this means, it is possible to retrieve LOD resources that are both similar to the user profile and fulfill an advanced condition.

The strengths of the approach will be illustrated with a travel recommendation request that was executed over DBpedia. Suppose a customer expressed a preference for the Lake Baikal region (e.g., as indicated by a positive rating on a travel community site). With simple on-the-fly retrieval, s/he would receive a recommendation list that primarily consists of travel destinations located in the proximity of this geographical area. A graph-based filter, on the other hand, facilitates advanced retrieval options. In the given example, it enables the user to obtain suggestions for Southeast Asian travel destinations that have similar features to the Lake Baikal region, even though the resource dbr:Lake_Baikal is not directly connected to any destination with the annotation dbc:Southeast_Asia. Such destinations can only be identified by exploring the surrounding graph structure of the filter attribute.

The advanced constraint ensures that the generated result set is not empty. A simple attribute-level filter, which is commonly utilized in faceted search systems, would not generate any search hits. By these means, graph-based filter patterns can help users to better explore knowledge graphs, thereby overcoming data quality issues in LOD repositories. The corresponding SKOSRec query for this example is shown in Listing 5.

Listing 5. SKOSRec query with an expressive pre-filter condition to retrieve Southeast Asian destinations based on a preference for the Lake Baikal region (Q3)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbc: <http://dbpedia.org/resource/Category:>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX dct: <http://purl.org/dc/terms/>
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

    RECOMMEND ?destination TOP 5
    PREF dbr:Lake_Baikal
    [ WHERE {
        ?destination dbo:country ?country .
        ?country dct:subject ?countrySubject .
        ?countrySubject skos:broader dbc:Southeast_Asia .
        ?destination rdf:type dbo:Place .
      }
    ]

Table 4 shows the solution to this query. It contains travel destinations and regions of interest that are both similar to the previously liked destination dbr:Lake_Baikal and located in Southeast Asia. Ecoregions around lakes, rivers, or mountains from Southeast Asian countries, such as Cambodia, Vietnam, or Malaysia, are presented to the user.

Table 4
Result set of Q3

    destination
    dbr:Tonle_Sap
    dbr:Mekong
    dbr:Mount_Kinabalu
    dbr:Laguna_de_Bay
    dbr:Lake_Toba

The technical details for this type of query are given in the following definitions. Before the extraction of relevant resources and annotations, the recommendation engine identifies the resources that satisfy the filter (Definition 10).

Definition 10 (Prefiltered resources). The mapping of filtered resources $\Omega_{prefilter}$ is obtained by matching a specified graph pattern ($P$) to the RDF graph of a dataset $D$ (Eq. 10).

$$\Omega_{prefilter} = \{\mu_{|q} \mid \mu \in [\![P]\!]_D\} \qquad (10)$$

After preselection, resources are brought together with user profile data to retrieve the final set of recommendations (Definition 11).

Definition 11 (Annotations and relevant resources (prefiltered retrieval)). The solution mappings of annotations and relevant resources in prefiltered retrieval mode are obtained by joining the set of prefiltered resources with the mappings $\Omega_r$ of all relevant resources and annotations as determined from the user profile (Eq. 11).

$$\Omega_{pre} = \Omega_{prefilter} \bowtie \Omega_r \qquad (11)$$

After retrieving annotations and relevant resources in prefiltering mode, the system generates constraint-based recommendations. In this context, the ranking procedure is adapted to the restricted set of relevant resources and annotations. Hence, IC values of matching annotations and the respective SKOS similarities are quantified according to the user filter. By this means, recommendation scores reflect the similarity of items for the actual set of filtered LOD resources. Except for this restriction, the system carries out the calculations (i.e., the determination of similarity values and the ranking of LOD resources) in the same way as in non-filtering mode.
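The prefiltering workflow can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SKOSRec implementation: resources and annotations are plain strings, the join of Eq. 11 is reduced to a membership test, and the score is a simple sum of information content (IC) values of shared annotations rather than the SKOS similarity used by the engine. The function names and the toy IC formula are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple


def prefilter_join(prefiltered: Set[str],
                   relevant: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Keep only relevant resources that also satisfy the filter (cf. Eq. 11)."""
    return {res: annots for res, annots in relevant.items() if res in prefiltered}


def information_content(candidates: Dict[str, Set[str]]) -> Dict[str, float]:
    """Toy IC: -log of the share of candidates that carry an annotation.

    Assumption: IC is recomputed over the filtered candidate set only,
    so that scores reflect the restricted set of resources.
    """
    n = len(candidates)
    counts: Dict[str, int] = {}
    for annots in candidates.values():
        for a in annots:
            counts[a] = counts.get(a, 0) + 1
    return {a: -math.log(c / n) for a, c in counts.items()}


def rank(profile_annots: Set[str],
         candidates: Dict[str, Set[str]],
         k: int) -> List[Tuple[str, float]]:
    """Score each candidate by the summed IC of annotations shared with the profile."""
    ic = information_content(candidates)
    scores = {res: sum(ic[a] for a in annots & profile_annots)
              for res, annots in candidates.items()}
    # Sort by descending score; ties are broken alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Hypothetical toy data: resource -> annotations.
relevant = {"A": {"c1", "c2"}, "B": {"c1"}, "C": {"c2", "c3"}}
candidates = prefilter_join({"A", "B", "Z"}, relevant)
top = rank({"c1", "c2"}, candidates, k=1)
```

In this toy run, resource C is dropped by the filter before any score is computed, which mirrors the point made above that IC values and similarities are determined for the filtered set only.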

3.5. Postfiltering & Combinations

The SKOSRec engine can not only combine similarity calculation and graph pattern matching to prefilter relevant LOD resources; it can also apply user constraints after similar items have already been determined. By this means, recommendations are filtered ex post, i.e., after the similarity detection process. This feature enables novel retrieval patterns in which suggestions are part of a subquery. As an example scenario for a postfilter recommendation task, consider the following SKOSRec query, which obtains suggestions for movies and directors based on a user's preference for the director dbr:Quentin_Tarantino (see Listing 6). For the given request, the engine would identify similar directors based on Quentin Tarantino's characteristic features (e.g., geographic origin or awards won) and would determine the movies these similar directors have shot. Thus, users may receive interesting movie recommendations.

Table 5 shows the solution the SKOSRec system returns when Q4 is issued against DBpedia. The ellipses in the displayed tables indicate records that have been excluded for brevity.

However, this approach can still be improved upon. For instance, the engine only considers the metadata descriptions of the director, whereas related movies have to be retrieved anew upon execution of the postfilter section. Thus, they are not ranked according to similarity in the final output table.
Another weakness is the biased semantics of the query.

Listing 6. SKOSRec query to generate simple post-filtered recommendations in the movie domain (Q4)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX yago: <http://dbpedia.org/class/yago/>

    SELECT ?director ?movie
    WHERE {
        ?movie dbo:director ?director .
        ?director rdf:type yago:Director110014939 .
    } LIMIT 50
    RECOMMEND ?director TOP 5
    PREF dbr:Quentin_Tarantino

Table 5
Result set of Q4

    director            movie
    dbr:Kirk_Douglas    dbr:Scalawag_(film)
    dbr:Kirk_Douglas    dbr:Posse_(1975_film)
    dbr:Kevin_Costner   dbr:The_Postman_(film)
    dbr:Kevin_Costner   dbr:Dances_with_Wolves
    ...                 ...
    dbr:Gene_Kelly      dbr:Hello_Dolly_(film)
    dbr:Gene_Kelly      dbr:Invitation_to_the_Dance_(film)
    ...                 ...
    dbr:Alan_Alda       dbr:Goodbye_Farewell_and_Amen
    dbr:Alan_Alda       dbr:Margaret'_s_Engagement
    ...                 ...
    dbr:Steve_Buscemi   dbr:Interview_(2007_film)
    dbr:Steve_Buscemi   dbr:Animal_Factory

Usually, when a person favors a movie director, this preference refers to the movies this director has shot and not to the personal features of the person. Two directors might be similar according to certain features, such as birth year, nationality, or received awards, but the style and genre of the movies they have created can still be different. A third factor is that recommendation retrieval for movie directors should also take into account that a film artist is likely more relevant the more movies she has shot that are in line with the preferred movies of the user. Hence, the SKOSRec engine enables another postfiltering strategy, which is called aggregation-based retrieval. This strategy can be helpful whenever an entity type is linked to several LOD resources in a dataset, such that similarity scores of related entities (e.g., movies) can be aggregated to an overall recommendation score for a certain item (e.g., for a movie director).
Aggregation-based requests can be well combined with other retrieval patterns, such as preference queries.


In the case of the movie recommendation scenario, a user could state that she enjoyed Quentin Tarantino movies in the past and would like to receive suggestions for directors that shot similar films. Usually, this kind of question can only be answered by another human being, e.g., in a personal interaction among friends or in an online forum. The fact that the language facilitates these kinds of queries is one of its key advantages.

Listing 7. SKOSRec query to generate post-filtered recommendations based on a preference query in the movie domain (Q5)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX yago: <http://dbpedia.org/class/yago/>

    SELECT ?director ?movie
    WHERE {
        ?movie dbo:director ?director .
        ?director rdf:type yago:Director110014939 .
    } LIMIT 50
    AGG dbr:Quentin_Tarantino ?director SUM
    RECOMMEND ?movie TOP 100
    PREF [ ?prefMovie
    WHERE { ?prefMovie dbo:director dbr:Quentin_Tarantino . } ]

Table 6 depicts the results of Q5 as retrieved from DBpedia. When comparing the solution sets for Q4 and Q5, it becomes clear that they are totally different. While Q5 generated a list of directors who have mostly created action and thriller movies with a twist (i.e., Tarantino's unique filmmaking style⁵), Q4 provided a completely different output. It shows US American actor-directors whose movies range from musical films (dbr:Hello_Dolly_(film)) to western movies (dbr:Dances_with_Wolves) or drama films (dbr:Interview_(2007_film)). Thus, the directors in the solution table of Q4 (Table 5) seem rather arbitrary and not exceptionally helpful concerning the initial user request.

We define this kind of aggregation-based request as a so-called rollup query. This term was chosen because the final suggestions are based on summarized (i.e., rolled up) preferences for sublevel entities. Another example of a rollup request stems from the application scenario of travel search. This recommendation query derives suggestions for city trip destinations from the points of interest (POIs) a user has visited and liked in another city (e.g., London). Listing 8 shows the corresponding SKOSRec query for this scenario.

⁵ https://en.wikipedia.org/wiki/Quentin_Tarantino

Table 6
Result set of Q5

    director                    movie
    dbr:Steven_Spielberg        dbr:Indiana_Jones
    dbr:Steven_Spielberg        dbr:Jurrasic_Park_(film)
    ...                         ...
    dbr:Francis_Ford_Coppola    dbr:The_Godfather_Part_II
    dbr:Francis_Ford_Coppola    dbr:The_Godfather_Part_III
    ...                         ...
    dbr:Christopher_Nolan       dbr:The_Prestige_(film)
    dbr:Christopher_Nolan       dbr:The_Dark_Knight
    ...                         ...
    dbr:Robert_Rodriguez        dbr:Planet_Terror
    dbr:Robert_Rodriguez        dbr:Machete_(film)
    ...                         ...
    dbr:Kevin_Smith             dbr:Clerks
    dbr:Kevin_Smith             dbr:Clerks_II

Listing 8. SKOSRec query to generate post-filtered recommendations (aggregation-based) for a preference profile containing the city of London (Q6)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX dul: <http://www.ontologydesignpatterns.org/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    SELECT ?city
    WHERE {
        ?poi ?property ?city .
        ?city rdf:type dbo:City .
        ?property rdfs:subPropertyOf dul:hasLocation .
    } LIMIT 3
    AGG dbr:London ?city SUM
    RECOMMEND ?poi TOP 100
    PREF dbr:Oxford_Street,
         dbr:Notting_Hill,
         dbr:South_Bank

The SKOSRec request contains subquery commands that refer to the user's preferred POIs. It specifies how similarity scores of the generated suggestions should be aggregated (i.e., sum-based). The retrieval is based on the 100 POIs that are most similar to the ones in the preference section. Additionally, the subquery part refers to the DBpedia resource that represents London (AGG dbr:London ?city), thus indicating that the list of recommendations should not contain any POIs in this city. Upon processing the depicted query over the DBpedia repository, the engine displays the three cities for which the highest aggregated scores have been detected (see Table 7).

Table 7
Result set of Q6

    city
    dbr:Oxford
    dbr:Abingdon-on-Thames
    dbr:Wallingford_Oxford

The motivation for formulating such a query is that, when searching for cities, preferences might be better represented through the sights a user has visited elsewhere rather than through a simple on-the-fly request solely containing a preference statement for a city in the user profile section.
Besides the rollup retrieval patterns presented above, the SKOSRec grammar may facilitate the formulation of additional types of advanced recommendation queries. For instance, cross-domain queries are an interesting additional pattern to be explored for personalized retrieval. Listing 9 shows an example request that generates movie suggestions based on the preference for a music band (i.e., dbr:The_Jackson_5).

Listing 9. SKOSRec query to generate cross-domain recommendations (Q7)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

    SELECT ?movie ?participated ?musicalAct
    WHERE {
        ?movie rdf:type dbo:Film .
        VALUES ?participated { dbo:basedOn dbo:musicComposer
                               dbo:genre dbo:starring }
        { ?musicalAct ?participated ?movie . }
        UNION
        { ?movie ?participated ?musicalAct . }
    } LIMIT 10
    AGG dbr:The_Jackson_5 ?movie MAX
    RECOMMEND ?musicalAct TOP 100
    PREF dbr:The_Jackson_5
    WHERE {
        VALUES ?actType { dbo:MusicalArtist dbo:Band }
        ?musicalAct rdf:type ?actType .
    }

The query contains a prefilter condition that triggers retrieval of music artists and bands. Afterwards, the scores of the top 100 suggestions are summarized with maximum-based aggregation. Thus, the highest similarity value among the set of music acts determines the ranking score of the film. Also note that, because of the aggregation-based retrieval procedure, connections between movies and the LOD resource dbr:The_Jackson_5 are excluded from similarity calculation. The scenario exemplifies how semantic relationships can help to generate suggestions for a target domain (i.e., movies) that are based on a profile containing items from a different source domain (i.e., music acts). While regular SPARQL queries perform exact matching of triple statements, the SKOSRec syntax facilitates the integration of similarity- and graph-based retrieval to enrich result lists with other potentially relevant items.

Table 8
Result set of Q7

    movie                           musicalAct
    dbr:Moonwalker                  dbr:Michael_Jackson
    ...                             ...
    dbr:The_Wiz_(film)              dbr:Ashford_&_Simpson
    dbr:Garfield_Gets_a_Life        dbr:The_Temptations
    dbr:Pipe_Dreams_(1976_films)    dbr:Gladys_Knight_&_the_Pips
    dbr:The_Jacksons                dbr:Jermanine_Jackson

As a point of reference, consider the SPARQL query for the given cross-domain scenario (Listing 10).

Listing 10. SPARQL query to generate cross-domain recommendations (Q8)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

    SELECT ?movie ?musicalAct
    WHERE {
        ?movie rdf:type dbo:Film .
        VALUES ?participated { dbo:basedOn dbo:musicComposer
                               dbo:genre dbo:starring }
        { dbr:The_Jackson_5 ?participated ?movie . }
        UNION
        { ?movie ?participated dbr:The_Jackson_5 . }
    } LIMIT 5

The query omits the similarity calculation part. Thus, it only retrieves movies linked to the resource dbr:The_Jackson_5. By this means, the SPARQL results have a better chance of being related to the initial user profile. On the other hand, empty result lists can occur when the preferred LOD resource does not have any direct connections to recommendable items. This assumption is backed up by the result lists of the two queries (Tables 8 and 9). The SKOSRec query produces a more comprehensive solution than a regular SPARQL query.


Table 9
Result set of Q8

    movie               musicalAct
    dbr:The_Jacksons    dbr:The_Jackson_5

The following definitions formalize the postfiltered and aggregation-based retrieval forms. Definition 12 specifies the notion of SKOSRec recommendations that may potentially undergo postprocessing.

Definition 12 (SKOSRec recommendations). SKOSRec recommendations ($R(P_r)$) are the set of suggestions generated by processing the solution mappings ($\Omega$) obtained from recommendation retrieval. They can be a result of simple on-the-fly retrieval, prefiltering, or a combination of these techniques. $P_r$ represents the input profile that contains at least one preference for an item. The cardinality of $R(P_r)$ is the intended number of recommendations ($k$) that is specified by the user, based on which the engine selects the resources with the highest recommendation scores that are not contained in the profile (Eq. 12).

$$R(P_r) = \{x \mid x \in \{q \mid score(P_r, q) > \operatorname{kmax}_{q \in \Omega}\, score(P_r, q),\ q \notin P_r\}\} \qquad (12)$$
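The selection step of Definition 12 amounts to a top-k cut over scored candidates that excludes profile items. The following sketch illustrates this under the assumption that scores are already available as a plain dictionary; the function name and the alphabetical tie-breaking rule are hypothetical and not taken from the SKOSRec engine.

```python
from typing import Dict, List, Set


def top_k_recommendations(scores: Dict[str, float],
                          profile: Set[str],
                          k: int) -> List[str]:
    """Return the k highest-scoring resources that are not in the profile."""
    eligible = {q: s for q, s in scores.items() if q not in profile}
    # Descending score; ties broken alphabetically (an assumption of this sketch).
    return sorted(eligible, key=lambda q: (-eligible[q], q))[:k]


# Hypothetical scores; "p" is the profile item and must not be recommended.
scores = {"a": 0.9, "b": 0.8, "c": 0.5, "p": 1.0}
recommended = top_k_recommendations(scores, profile={"p"}, k=2)
```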

The approach of subquerying with recommendation results builds on the idea that SKOSRec suggestions $R(P_r)$ can be joined with any graph pattern $P_{post}$ after the process of similarity calculation (Definition 13). It does not make a difference how the engine obtained these recommendations.

Definition 13 (Postfiltered recommendation mapping). The postfiltered recommendation mapping is obtained by joining solution mappings generated from SKOSRec recommendations with the results of matching a postfilter graph pattern ($\Omega_{post}$) (Eqs. 13 and 14).

$$\Omega_{postfilter} = \{\mu \mid \mu \in [\![P_{post}]\!]_D,\ x \in dom(\mu)\} \qquad (13)$$

$$\Omega_{post} = \{\mu_{|x} \mid x \in R(P_r)\} \bowtie \Omega_{postfilter} \qquad (14)$$

Like the prefilter, $P_{post}$ can comprise triple statements that are eligible in the RecGroupGraphPattern part of the SKOSRec grammar. Aggregation-based queries have to be treated differently than a regular postfilter request. Apart from excluding sublevel items of the profile from the set of similar resources, the engine should also omit any connections between the preferred superordinate item ($a$) and similar sublevel entities. Additionally, the user has to specify the variable in the profile (e.g., $y$) that represents the items within the postfilter graph pattern $P_{post}$. By this means, aggregation-based recommendations can be generated (Definition 14).

Definition 14 (Resources for aggregation-based retrieval). The set of resources for aggregation-based retrieval is obtained by retrieving LOD resources that have a connection to a previously generated suggestion, thereby excluding all resources that are linked to the specified superordinate item $a$ in the profile (Eq. 15).

$$R_{agg}(a) = \{\mu(y) \mid \mu \in \Omega_{post},\ y \in dom(\mu),\ a \notin \mu(y)\} \qquad (15)$$

For each potential aggregation-based suggestion ($y$), similarity scores of connected entities ($x$) have to be considered for the final ranking. In this context, different approaches to aggregation can be applied. In cases when both similar items and aggregation-based recommendations are entities on comparable levels of granularity, choosing the maximum similarity score might be the best way to represent the relatedness of the resource to the user profile. On the other hand, when postfiltered recommendations are based on similarity scores of their sublevel entities (e.g., as in the movie recommendation example), the scores of related suggestions should be aggregated by the sum or the average of individual scores.

Definition 15 (Aggregation-based ranking score). The ranking score of an LOD resource $y$ that is connected to a set of recommended items is determined by aggregating scores of connected entities, either by the maximum (Eq. 16), the sum (Eq. 17), or the average (Eq. 18) of individual scores.

$$score_{aggMAX}(y) = \max_{x \in \mu(x,y)} score(P_r, x) \qquad (16)$$

$$score_{aggSUM}(y) = \sum_{x \in \mu(x,y)} score(P_r, x) \qquad (17)$$

$$score_{aggAVG}(y) = \frac{\sum_{x \in \mu(x,y)} score(P_r, x)}{|\{x \in \mu(x,y)\}|} \qquad (18)$$
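A compact sketch may help to see how Eqs. 16 to 18 interact with the exclusion rule of Definition 14. It assumes that the postfilter join has already produced (x, y) pairs linking a recommended sublevel item x to a candidate y, and that similarity scores for x are given. The data, the function name, and the dictionary-based representation are hypothetical; only the three aggregation modes follow the definitions above.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple

AGGREGATORS = {"MAX": max, "SUM": sum, "AVG": mean}


def aggregate_scores(links: Iterable[Tuple[str, str]],
                     item_scores: Dict[str, float],
                     superordinate: str,
                     mode: str = "SUM") -> Dict[str, float]:
    """Aggregate scores of sublevel items x per connected resource y.

    Pairs whose y equals the preferred superordinate item are skipped,
    so the preferred item itself is never recommended.
    """
    grouped = defaultdict(list)
    for x, y in links:
        if y == superordinate or x not in item_scores:
            continue
        grouped[y].append(item_scores[x])
    aggregate = AGGREGATORS[mode]
    return {y: aggregate(values) for y, values in grouped.items()}


# Hypothetical movie -> director links and movie similarity scores.
links = [("m1", "d1"), ("m2", "d1"), ("m3", "d2"), ("m4", "QT")]
item_scores = {"m1": 0.5, "m2": 0.25, "m3": 0.6, "m4": 0.9}

by_sum = aggregate_scores(links, item_scores, "QT", "SUM")
by_max = aggregate_scores(links, item_scores, "QT", "MAX")
by_avg = aggregate_scores(links, item_scores, "QT", "AVG")
```

With these toy values, the sum favors d1 (two matching movies), whereas the maximum favors d2 (one strongly matching movie), which illustrates why the choice of aggregation depends on the granularity of the entities involved.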

4. Evaluation of the SKOS Recommender

4.1. Experimental Setup

The novel recommendation strategies of the SKOSRec engine were evaluated in web-based experiments for the usage scenarios of travel destination search and multimedia RS. The online approach helped to reach more participants with diverse demographic backgrounds, which would not have been possible in a laboratory experiment. Another positive side effect of the web-based setting was that participants could test and rate recommendations without being directly observed by an experimenter, which might otherwise have biased the results. Hence, the online approach softened common limitations of laboratory studies [19].
The experiments were carried out in the defined usage scenarios with metadata descriptions from DBpedia. The database containing the local mirror of DBpedia ran on a virtual server. Besides the triple store, a web application was hosted on the same machine. The website ran on an Apache Tomcat 7 application server and was implemented with the help of Java servlet technology.⁶ ⁷

We conducted the online experiments between April 2016 and January 2017. Upon setting up the web application, a pretest was carried out to control for potential pitfalls, such as misleading navigation. Two test users went step by step through the web interface and provided their feedback. Because of their insights, we slightly modified the interface (e.g., by rephrasing user instructions) to increase the chances of successful completion. In case the SKOSRec engine is applied in a live setting, the design of the user interface will have to be tested with more users. Our experimental series focused on algorithmic performance, which was tested with numerous participants.
Subjects were recruited through the clickworker.com platform.⁸ Participants received a compensation fee of 1.00 € or 1.50 € for a completed survey. 103 participants took part in the study on travel destination search. In the multimedia domains, 154 subjects in total evaluated the SKOSRec engine. For each part of the evaluation, it was ensured that participants had not just run through the evaluation screen. Whenever there appeared to be irregularities in the log files (e.g., the default settings of the screen were not changed), these assessments were excluded from further analysis. The survey template started with a consent form. It informed participants about the context and the procedure of the study. After having filled in the consent form, participants were navigated to the first part of the experiment. It collected data on subjects' demographics and their consumption and search habits. After the preparation phase, participants worked on the first test case (TC1) in each usage scenario. In this test case, the performance of the SKOSRec engine was assessed in the simple execution mode of on-the-fly recommendations (see Subsect. 3.2). This was done to gather data on the usefulness of suggestions resulting from a simple content-based approach (i.e., the baseline method), with which more advanced recommendation strategies were compared at a later stage. In cases where users did not state any items, the application showed an error page. While setting up a profile required some effort on the side of the user, it was a necessary step for the experiment. Since we did not have access to the log files of a live application, profile generation had to be simulated. The process was facilitated by a specifically tailored interface (see Fig. 2), where subjects could search for items through an autocomplete field that applied AJAX technology.⁹

⁶ http://tomcat.apache.org
⁷ http://www.oracle.com/technetwork/java/javaee/servlet/index.html
⁸ https://www.clickworker.de

It enabled users to type in keywords (e.g., the name of a music act or the author of a book) for which matches were instantaneously displayed without reloading the site. Hence, participants could quickly find and select their items of interest. Whenever a user entered a query through the AJAX interface, the system matched the query keywords against an Apache Solr/Lucene search index.¹⁰ The index was built with item metadata, which had been extracted from DBpedia prior to setting up the website. The index comprised the names and abstracts of items. Whenever available, information on thumbnail pictures was saved in the index to be ready for display in the AJAX interface. In case the Apache Solr/Lucene search engine found index keywords that matched the user query, metadata information for these items was immediately shown on the screen. This information was displayed to help participants make an informed decision on whether the AJAX result list contained suitable items for profile generation.
Aside from the AJAX feature, the appeal of the interface was increased by a progress bar at the top of the webpage (see Fig. 2). Progress bars are commonly utilized tools for web surveys. In conventional paper-based studies, subjects usually see how far ahead they are, while in a web survey they cannot check their progress immediately. This is especially true for studies that apply screen-by-screen navigation, which was the chosen approach for the conducted web experiments. Progress bars prevent users from tiring of the questionnaire and withdrawing from the study altogether by showing how close participants are to completing the experiment [14].
Upon profile generation, the SKOSRec engine calculated on-the-fly recommendations for a simple query (similar to Q1, Sect. 3), which were shown to users in a separate screen. Suggestions were displayed with a sufficient amount of information, e.g., thumbnails, labels, or a short description, to help participants assess the quality of the recommendation. In addition to this information, users could also access the respective Wikipedia article by following the hyperlink on the label. Relevance sliders were used to assess the utility of each recommendation. They are often applied in evaluations of IR systems [51, 54]. Scores were handled as points on a 0 to 100 scale of the relevance slider (outer left side: lowest relevance, outer right side: highest relevance, see Fig. 3).

⁹ http://api.jquery.com/jquery.ajax
¹⁰ http://lucene.apache.org/solr

Fig. 2. User profile generation, TC1 (music domain).

This approach facilitated the calculation of an accurate precision score and the application of tests with higher statistical power than an ordinal 5-star scale. The score, called mean relevance (mrs), was measured with Equation 19.

$$mrs = \frac{\sum_{i=1}^{k} score_i}{k} \qquad (19)$$

Fig. 3. Recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain).

In addition to the precision score, the web application also determined the size of the recommendation list produced by the engine as an indicator for recall. However, recommendation list size can only be utilized for this purpose when mrs scores reach a reasonable level.
While accurate recommendations are useful because they build trust in a system [57], they only capture a narrow aspect of performance [36]. For instance, an online retailer can profit tremendously from an RS that provides both relevant and unobvious recommendations, since it enables users to explore the product catalog more thoroughly. On the other hand, in the case of an RS only suggesting popular products, it is likely that the user already knows them [3, 23]. A pure accuracy-based evaluation disregards these aspects, which is why other performance indicators, such as novelty, were measured as well. Thus, the evaluation interface also contained a section where subjects had to tick a radio button that indicated whether an item was new. In the backend of the web application, novelty was measured as the ratio of relevant items not familiar to the user ($|nov|$) among the total number of relevant items ($|tp|$) (Eq. 20).

$$nv = \frac{|nov|}{|tp|} \qquad (20)$$
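Equations 19 and 20 can be computed directly from the logged slider values and novelty flags of a recommendation list. The sketch below is an illustration only: it assumes that the relevance threshold of 50 stated in the text is applied to the individual slider score of an item, and it returns 0.0 when no item is relevant, which is a convention of this example rather than something specified in the paper. Function names and sample values are hypothetical.

```python
from typing import Sequence


def mean_relevance(scores: Sequence[float]) -> float:
    """Eq. 19: mean of the slider scores (0 to 100) of a recommendation list."""
    return sum(scores) / len(scores)


def novelty(scores: Sequence[float],
            is_new: Sequence[bool],
            threshold: float = 50) -> float:
    """Eq. 20: share of relevant items that the user marked as new.

    An item counts as relevant if its score reaches the threshold.
    Returns 0.0 if no item is relevant (assumption of this sketch).
    """
    relevant_flags = [new for s, new in zip(scores, is_new) if s >= threshold]
    if not relevant_flags:
        return 0.0
    return sum(relevant_flags) / len(relevant_flags)


# Hypothetical ratings for a list of four recommendations.
slider_scores = [80, 40, 60, 100]
new_flags = [True, True, False, True]

mrs = mean_relevance(slider_scores)
nv = novelty(slider_scores, new_flags)
```

Here three of the four items are relevant, and two of those three are new to the user, so the novelty share is two thirds even though the second item was also marked as new.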

An item was considered relevant when it achieved an mrs score of 50 or higher. In addition to novelty, the web application also tracked the diversity of recommendation lists, since a topically diversified result set has been found to increase overall user satisfaction [31, 66]. In the RS literature, the most widely explored diversification metrics are the ones that measure the similarity between each item pair in the recommendation list. The more the items are alike, the less diverse is the result set [19]. The study series of this paper applied the metric by Castells et al. [8]. It determines item-to-item similarity values with the cosine similarity measure. Scores are then converted to distance values (Eq. 21) and averaged for the recommendation list (Eq. 22).

$$d(i_l, i_m) = 1 - \cos(i_l, i_m) \qquad (21)$$

$$div_C = \frac{2}{k(k-1)} \sum_{l<m} d(i_l, i_m) \qquad (22)$$
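As a hedged illustration of Eqs. 21 and 22, the following sketch computes the average pairwise cosine distance of a recommendation list. It assumes that each item is represented by a numeric feature vector of equal length; how the engine derives such vectors is not covered here. Treating a zero vector as having similarity 0 and returning 0.0 for lists with fewer than two items are conventions of this example.

```python
import math
from itertools import combinations
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; defined as 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def div_c(vectors: Sequence[Sequence[float]]) -> float:
    """Eq. 22: mean cosine distance (Eq. 21) over all item pairs l < m."""
    k = len(vectors)
    if k < 2:
        return 0.0
    total = sum(1 - cosine(u, v) for u, v in combinations(vectors, 2))
    return 2 * total / (k * (k - 1))


# Hypothetical item vectors: the first and third item are identical.
items = [[1, 0], [0, 1], [1, 0]]
diversity = div_c(items)
```

Two of the three pairs are orthogonal (distance 1) and one pair is identical (distance 0), so the list-level diversity is two thirds.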

The content-based diversity scores ($div_C$) were calculated in the backend of the web application and saved in a log file for each recommendation list generated by the SKOSRec engine at runtime. In addition to the measurement of $div_C$ scores, we asked participants directly about their opinions on recommendation list diversity. Following the RS evaluation framework by Pu et al. [44], two Likert items were posed for this purpose:

– The recommendations of list [...] are diverse.
– The recommendations of list [...] are similar to each other.

Besides the algorithmic performance metrics listed above, the authors gathered assessments from users regarding the overall usefulness of the suggestions. This approach is also in line with the empirically verified evaluation framework by Pu et al. [44], which proposes to measure the psychometric construct of Perceived Usefulness to get a comprehensive understanding of user opinions on recommendation results. Thus, participants were instructed to state their agreement with positive statements regarding the relevance of the suggestions. The statements were adapted to the specificities of each usage scenario. Fig. 4 shows an example collection of Likert questions for the domain of music RS.

After participants had provided the required information, their answers were saved in a log file and they were navigated to test case 2 (TC2), which evaluated the SKOSRec engine's performance for constraint-based queries (similar to Q3, Sect. 3). We tried to find out whether the engine was able to improve user satisfaction through filter conditions. Even though IR systems already successfully apply facet filters on result sets, the experiments still had to show whether LOD-enabled constraints have the same effect on the Perceived Usefulness of recommendations. For the comparison of non-filtered and filtered suggestions, we applied a within-subjects design in which participants had to rate two recommendation lists resulting from regular (non-filtered) and filtered requests in TC2. We favored this setup over a between-subjects design, where subjects would have been assigned to only one result list. While the between-subjects design more closely resembles a real-world setting, since it does not require participants to interact with more than one approach at a time, differences among subjects (e.g., regarding demographics or topic expertise) can potentially have a considerable impact on quality judgments [19]. Hence, when applying a between-subjects design, one does not know whether differences in performance are caused by the approaches or by the variability among participants. Therefore, researchers need to conduct between-subjects experiments with a sufficient number of study subjects to level out these differences. Within-subjects studies, on the other hand, require fewer participants while still achieving the same level of statistical power, since between-subject variability is not an issue [30, 31]. Beel et al. have shown that even small variations in an RS test setting could cause substantial differences in performance outcomes [6]. To keep the extent of variations even lower, comparisons between non-filter and constraint-based recommendations were based on the same profile for each user in TC2. Therefore, the user profile from TC1 served as a starting point for filtered retrieval. The web application showed the recently generated profiles to users and asked them to provide an additional filter condition. Fig. 5 depicts an example web form for a constraint-based recommendation request in the music domain. It comprises the user profile and an additional text field where users stated their constraint.

Afterwards, the engine applied this filter condition to the set of potential recommendation results before similarity calculation started. For each usage scenario, different types of constraints were available. Regarding suitable filter options, the author sought to achieve a balance between assisting users in finding filter conditions that would adequately represent their information needs and keeping the variability among users as low as possible. Therefore, the specificities of each scenario and the availability of the respective data sources were determined. For instance, for the multimedia domains (movies, music, and books), it was assumed that users would like to filter recommendations according to the specific genre of the item. Fortunately, the DBpedia dataset contains this kind of information for the music domain. In this domain, the genre can be retrieved through the property dbo:genre. However, in the movie and book domains, DBpedia does not provide sufficient information. Many LOD resources which represent books were not assigned a genre. In the movie domain, a genre property is missing entirely, and genre-specific information can often only be found in the subject categories describing the movie. Hence, in contrast to the experiment on music recommendations, participants in the domains of books and films did not filter their recommendation lists by genre. On the other hand, the property dc:subject was applied as a recommendation filter in each multimedia domain. Since the subject property often links to many informative features of multimedia items (e.g., release period, geographic information, or content information), the author decided that this feature represents a powerful filter dimension.
In the experiment on travel destination search, the web interface offered two filters: a subject and a location-specific filter. The subject filter enabled users to state their preferences regarding the characteristics of a location (e.g., being a nature reserve). The location-specific filter, on the other hand, gave participants the option to specify where the desired destination should be located.¹¹

Fig. 4. Evaluation screen for recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain).

The second fundamental research issue of TC2 con-cerned the question whether expressive constraints canboost recommendation quality even further Hence

¹¹ http://dbpedia.org

apart from executing a constraint-based workflow with a simple filter (i.e., a direct attribute of an item), the SKOSRec engine generated suggestions with the help of an expressive filter as well. This second procedure, while still applying the same user constraint as the simple filter, utilized an expressive graph pattern to include additional LOD resources in the result set. For instance, in case a subject filter was selected, result set expansion was facilitated by exploring the wider semantic space of the DBpedia category graph through a property path declaration in the graph pattern (e.g., skos:broader[2]). In the travel experiment, an additional expressive filter retrieved place-related information (dul:hasLocation) for location-specific subproperty relations that are not materialized in DBpedia. Hence, the expressive graph pattern expanded the filter such that additional properties were matched (e.g., dbo:country, dbo:city or dbo:state) while still applying the specified user constraint on the set of LOD resources. Upon execution of constraint-based recommendation requests, users received two recommendation lists (resulting from simple and expressive filtering, respectively) in a


Fig. 5. Web form for a constraint-based recommendation request (music domain)

separate evaluation screen. Participants evaluated result sets in the same manner as in TC1. For each result set, the screen contained an additional Likert item asking subjects to give their agreement to the statement "The filter has improved the recommendation results of list". The sequence of suggestions resulting from the different methods was randomly assigned in each user session to avoid order effects.

After study subjects had completed TC2, they were directed to test case 3 (TC3). In the TC3 section of the travel experiment, participants could choose between three options for rollup query patterns (similar to Q6, Sect. 3). Users were able to obtain recommendations for travel destinations based on POIs they had visited during a stay in another destination. They could select either a city (option 1), a region (option 2) or a country (option 3) as destination type. Upon entity type selection, the SKOSRec engine generated appropriate suggestions. Fig. 6 shows the web form from the travel experiment that facilitated the formulation of advanced rollup requests in TC3.
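The difference between the simple and the expressive subject filter in TC2 can be sketched as two SPARQL graph patterns assembled in Python. This is an illustrative sketch and not the SKOSRec implementation: the function names, the ?item variable and the depth bound of two (one possible reading of skos:broader[2]) are assumptions. The bounded path is spelled out with optional steps because standard SPARQL 1.1 property paths offer no {n,m} quantifier.

```python
def simple_filter(category_iri, subject_property="dc:subject"):
    """Graph pattern matching items annotated directly with the chosen category."""
    return f"?item {subject_property} <{category_iri}> ."


def expressive_filter(category_iri, max_depth=2, subject_property="dc:subject"):
    """Graph pattern that also matches items annotated with narrower categories.

    Each optional 'skos:broader?' step allows the item's category to lie one
    level further below the chosen category (hypothetical depth bound).
    """
    if max_depth < 0:
        raise ValueError("max_depth must be non-negative")
    # Bounded path written as a sequence of zero-or-one steps.
    path = "/".join([subject_property] + ["skos:broader?"] * max_depth)
    return f"?item {path} <{category_iri}> ."


if __name__ == "__main__":
    # Hypothetical category IRI, used for illustration only.
    category = "http://dbpedia.org/resource/Category:Nature_reserves"
    print(simple_filter(category))
    print(expressive_filter(category))
```

With max_depth=0 the expressive pattern reduces to the simple one, so in this sketch the simple filter is a special case of the expanded one.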

When users had selected an entity type and stated their favorite travel destination and three associated POIs,

the application generated a SKOSRec query from these parameters. A simple on-the-fly request was sent to the engine as well to retrieve baseline recommendations, with which the suggestions resulting from the advanced request were compared at a later stage of the experiment. The simple query only contained the user's favorite travel destination and the selected entity type option, thus omitting information on the POIs. During the travel experiment, the engine processed both the simple and the advanced query and produced two recommendation lists. As in the previous parts of the experiment, participants assessed the quality of the result sets, which appeared in random order on the screen.

The multimedia experiments also contained a section on rollup retrieval (TC3). Like the advanced travel queries, the multimedia requests aggregated similarity scores of sublevel entities through summation, which the engine joined with a postfilter. In addition to these features, the pattern contained a preference query part. This section identified a set of preferred items through a single user statement. Participants either specified their favorite director/actor (movie domain), music act


Fig. 6. Travel - User profile generation TC3

(music domain) or author (book domain) and received recommendations based on the features of the works created by the artist (similar to Q5, Sect. 3). The SKOSRec engine generated recommendations based on the creative works of the favored artist. Participants also received suggestions from a simple on-the-fly query, which computed results according to the artist's characteristics. As in the travel experiment, recommendations from the advanced rollup request were compared with the on-the-fly query without letting participants know which approach they were assessing and through random assignment of result set positions.

After participants of the multimedia experiments had evaluated the results of TC3, they were guided to test case 4 (TC4), which evaluated cross-domain queries. The experiment on travel RS ended with TC3 because no suitable graph patterns could be identified to facilitate these kinds of suggestions for the travel usage scenario. In the multimedia domain, however, the data from DBpedia was sufficient for this retrieval task. The entity type of the study (i.e., movie, music act or book) defined the source domain, based on which the engine generated suggestions for items from another multimedia target domain (similar to Q7, Sect. 3). Fig. 7 depicts the cross-domain web form of the music experiment.
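The rollup step described for TC3, in which similarity scores of sublevel entities are summed per parent entity and then restricted by a postfilter, can be illustrated with a small Python sketch. It is a simplified reading of the description, not the engine's code: the function name rollup, the dictionaries similar_scores and parent_of, and the allowed_parents postfilter are hypothetical names introduced for illustration.

```python
from collections import defaultdict


def rollup(similar_scores, parent_of, allowed_parents=None, top_n=10):
    """Sum similarity scores of sublevel entities per parent and rank parents.

    similar_scores:  sublevel entity (e.g., a POI or a creative work) -> similarity score
    parent_of:       sublevel entity -> parent entity (e.g., a city or an artist)
    allowed_parents: optional set acting as a postfilter on the aggregated results
    """
    totals = defaultdict(float)
    for entity, score in similar_scores.items():
        parent = parent_of.get(entity)
        if parent is None:
            continue  # entity has no known parent, so it cannot be rolled up
        if allowed_parents is not None and parent not in allowed_parents:
            continue  # postfilter: drop parents that violate the constraint
        totals[parent] += score  # aggregation through summation
    # Highest aggregated score first; ties are broken alphabetically.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


if __name__ == "__main__":
    scores = {"poi_a": 0.5, "poi_b": 0.25, "poi_c": 0.5, "poi_d": 0.125}
    parents = {"poi_a": "city_x", "poi_b": "city_x", "poi_c": "city_y"}
    print(rollup(scores, parents))
```

In this toy input, city_x outranks city_y because two moderately similar POIs add up to a higher total than one similar POI, which is the effect that summation-based rollup is meant to capture.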

The web form assisted subjects in formulating a request, which then obtained movie suggestions based on the favorite music act of the user. Parameters from users were entered in the placeholders of the query pattern in the backend of the application. We formulated the query templates in such a way that the query engine matched any properties or graph patterns that could potentially connect two items from the specified

target and source domains. For instance, such matchings can occur when a user states a favorite movie that was written by a particular author. In case the same author has also written some books, they might be of interest to the user as well. Another example stems from the movie domain. Suppose that, in his/her profile, a consumer has declared a preference for a movie that links to the corresponding soundtrack. It might well be the case that the user likes the soundtrack and the music acts that were involved in creating it just as much as s/he likes the movie. The practice of distinguishing homonymous LOD resources through the property dbo:wikiPageDisambiguates is another interesting linking pattern from DBpedia that was exploited for cross-domain requests in TC4, since it connects similar entities (i.e., the book edition of a certain movie).

Cross-domain SPARQL queries (similar to Q8, Sect. 3) were posed as baseline requests in this test case. While the SPARQL query only matched cross-domain relations for a single LOD resource, the SKOSRec request expanded the search space by considering links from more than one item. As in the previous test cases, TC4 ended with an evaluation screen where users assessed the SPARQL-based as well as the SKOSRec suggestions in random order without knowing which recommendation list belonged to which method. Table 10 gives an overview of the test cases and the methods that were applied in the experiments in each usage scenario.
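The contrast between the SPARQL baseline (links of a single resource) and the expanded cross-domain request (links of several items) can be sketched on a toy link graph. This is a minimal illustration under stated assumptions: the function names, the links dictionary and the idea of passing a precomputed list of similar items are hypothetical and do not reproduce the SKOSRec query processing.

```python
def cross_domain_baseline(seed, links, target_items):
    """Target-domain items linked directly to the single seed resource."""
    return sorted(links.get(seed, set()) & target_items)


def cross_domain_expanded(seed, similar_items, links, target_items):
    """Target-domain items linked to the seed or to any of its similar items."""
    found = set()
    for source in [seed] + list(similar_items):
        # Collect cross-domain links from every source item, not only the seed.
        found |= links.get(source, set()) & target_items
    return sorted(found)


if __name__ == "__main__":
    # Toy data: source-domain items mapped to the resources they link to.
    links = {
        "movie_1": {"book_1", "person_1"},
        "movie_2": {"book_2"},
        "movie_3": set(),
    }
    books = {"book_1", "book_2", "book_3"}
    print(cross_domain_baseline("movie_3", links, books))
    print(cross_domain_expanded("movie_3", ["movie_1", "movie_2"], links, books))
```

In the toy data the seed movie_3 has no cross-domain link of its own, so the baseline returns nothing, while the expanded variant still finds books through the similar movies. This mirrors the difference in response rates discussed for TC4.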

4.2. Results

In total, 257 people took part in the study. The gathered samples from the experiments contained answers


Fig. 7. Music - User profile generation TC4 (page 9)

Table 10. Test cases in the web experiments (X = applied, - = not applied)

Test Case   Evaluation Purpose                        Query Type   Travel   Movie   Music   Book
TC1         On-the-fly Recommendations (Baseline)     Q1           X        X       X       X
TC2         Constraint-based Recommendations          Q3           X        X       X       X
TC3         Advanced Queries (Rollup)                 Q5, Q6       X        X       X       X
TC4         Advanced Queries (Cross-Domain)           Q7           -        X       X       X

from both genders, with a slight overrepresentation of male participants (53.7%). The majority of study subjects were between 20 and 40 years of age (65.8%). The evaluation of the demographics section also revealed that the majority of participants already perceived the search in their domain of interest as either "manageable" (38.5%) or even as "easy" or "very easy" (44.0%).

Prior to analyzing the performance results of the novel retrieval approaches of the SKOSRec engine, it was checked whether the respective Likert items for Perceived Usefulness and user-based diversity (divU) reached acceptable reliability levels. For this purpose, Cronbach's α values were determined. Table 11 shows the outcome of this analysis for the Perceived Usefulness construct. Given the acceptable values throughout the domains (Cronbach's α > 0.7), it could be treated as an interval-scaled variable in the statistical tests.

Table 11. Cronbach's α values for the concept of Perceived Usefulness

Domain   Cronbach's α
Travel   0.7002
Movie    0.8579
Music    0.8515
Book     0.8949
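For readers who want to reproduce the reliability check, Cronbach's α for k Likert items is commonly computed as α = k/(k-1) · (1 - Σ s_i² / s_total²), where s_i² is the variance of item i and s_total² the variance of the participants' summed scores. The following Python sketch implements this textbook formula with sample variances; the function name and the input layout (one list of item scores per participant) are assumptions, and the source does not state which software was used.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of per-participant item score lists.

    responses[p][i] is the score of participant p on Likert item i.
    """
    if len(responses) < 2:
        raise ValueError("at least two participants are required")
    k = len(responses[0])
    if k < 2:
        raise ValueError("at least two items are required")
    # Sum of the sample variances of the individual items.
    item_variance_sum = sum(variance(column) for column in zip(*responses))
    # Sample variance of each participant's total score.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


if __name__ == "__main__":
    # Hypothetical answers of three participants to two Likert items.
    print(cronbach_alpha([[1, 2], [2, 1], [3, 3]]))
```

Values above the conventional threshold of 0.7, as reported in Table 11, are usually taken to justify combining the items into one scale.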

Besides usefulness, user-based diversity scores were also measured by multiple Likert items. However,

Cronbach's α values for this concept did not reach acceptable levels. In order to avoid leaving valuable user assessments unused, agreements to the statement "The recommendations of list [] are diverse" were taken into account as an ordinal variable (divU).

Based on these results, subsequent tests could be carried out. The first part of the analysis concerned the performance of the on-the-fly retrieval method (baseline) of the SKOSRec engine in TC1. The outcome of this analysis confirms that the on-the-fly approach represents a viable recommendation strategy. Table 12 shows the mean (M)¹², standard deviation (SD) and number of participants (N) for each performance dimension and metric. It can be seen that users assessed the engine's performance in almost every quality dimension to be better than mediocre. Aside from the positive user evaluations with regard to Perceived Usefulness, related quality aspects such as Accuracy, Novelty and Diversity received good assessments as well. For instance, the mean relevance score (mrs) of each domain was higher than 50 score points. This means that participants were often convinced of the relevance of the results. Given these positive user assessments with regard to item relevance, it seems justified to consider the mean of result list sizes as an indicator of recall. The engine was able to generate the required number of recommendations in almost every test case (see Table

¹² In the case of the user-based diversity score (divU), M denotes the median because of the ordinal scale of the variable.


12). The on-the-fly retrieval approach was also tested with regard to the Novelty dimension. Table 12 shows the mean novelty scores (nv) for the baseline recommendation approach. Novelty scores indicate that the majority of items in a result list were deemed to be new in the travel and book RS, whereas in the experiments on movie and music RS only about one third of the recommendation list was unfamiliar to users. In contrast, content-based Diversity scores (divC) were rather high in each domain. Hence, it can be concluded that items in a result set usually have a low rate of matching SKOS annotations. In addition to content-based scores, user assessments of recommendation list Diversity (divU) were also taken into account. The median level of agreement to the Diversity-related Likert item stating that recommendations are diverse lay at 3 ("neutral") in the multimedia domains and at 4 ("agree") in the travel domain.

In summary, the baseline approach achieved fairly good performance results in TC1. The findings demonstrate that LOD-enabled recommendations can potentially enhance a user's search experience. They also allow for subsequent tests of advanced retrieval techniques, which can be compared to this standard method.

In the analysis of performance results from TC2 (constraint-based recommendations), particular attention was paid to participants' agreement with the statement "The filter has improved the recommendation results of the list". Fig. 8 shows the distribution of domain-wise agreement ratios. On average, they were fairly high. Throughout the domains, the vast majority of participants selected either 5 ("strongly agree"), 4 ("agree") or 3 ("neutral"). Hence, a filter often had a positive impact on results.

The authors gathered responses to the question on improvement for recommendation lists resulting from both the simple and the expressive filtering approaches. Hence, they give clues about the general performance of constraint-based retrieval. However, since this section of the study revealed the experimental condition to participants (i.e., application of a filter), answers to the Likert statement have to be interpreted cautiously. Results may be slightly biased, as participants were able to guess the underlying agenda. These limitations do not exist, however, for comparisons between simple and expressive filters. Since the web application randomly assigned the list order, participants did not know which approach they were assessing. This enables us to take a closer look at the differences between the two filtering approaches. It was

Table 12. Performance results in TC1 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]   Domain   M       SD      N
Accuracy (size) [10]             Travel   10.00   0.00    103
                                 Movie    10.00   0.00    50
                                 Music    10.00   0.00    53
                                 Book     9.80    1.40    51
Accuracy (mrs) [1]               Travel   56.49   0.00    103
                                 Movie    62.25   19.15   50
                                 Music    55.70   0.10    53
                                 Book     57.47   15.55   50
Novelty (nv) [1]                 Travel   0.60    0.34    101
                                 Movie    0.36    0.31    50
                                 Music    0.36    0.39    52
                                 Book     0.61    0.29    47
Diversity (divU) [5]             Travel   4       -       102
                                 Movie    3       -       50
                                 Music    3       -       53
                                 Book     3       -       50
Diversity (divC) [1]             Travel   0.80    0.16    103
                                 Movie    0.87    0.12    50
                                 Music    0.86    0.10    50
                                 Book     0.92    0.08    50
Usefulness [5]                   Travel   3.34    0.85    103
                                 Movie    3.55    0.73    50
                                 Music    3.18    0.85    53
                                 Book     3.43    0.81    50

hypothesized that advanced filters improve recommendation quality, as they may identify more relevant resources, thereby overcoming problems of data quality or data sparsity in LOD repositories. Table 13 lists the response rates of the two methods. The rate measures the ratio of non-zero result sets among all result sets. Throughout the domains, recommendation lists resulting from expressive filtering achieved higher response rates.

Table 13. Response rates (TC2)

Domain   Constr.-based (Regular) (%)   Constr.-based (Expanded) (%)   N
Travel   23                            57                             103
Movie    86                            88                             50
Music    72                            75                             53
Book     73                            76                             51
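The response rate reported in Table 13 is defined in the text as the ratio of non-zero result sets among all result sets. A minimal Python sketch of this definition follows; the function name and the representation of a user session as a list of recommended items are illustrative assumptions.

```python
def response_rate(result_sets):
    """Share of result sets that contain at least one recommendation."""
    if not result_sets:
        raise ValueError("at least one result set is required")
    non_empty = sum(1 for results in result_sets if len(results) > 0)
    return non_empty / len(result_sets)


if __name__ == "__main__":
    # Four hypothetical user sessions, two of which returned no suggestions.
    sessions = [["item_a"], [], ["item_b", "item_c"], []]
    print(response_rate(sessions))
```

Multiplying the returned ratio by 100 gives a percentage that can be compared with the table entries.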


[Plot "The filter has improved the recommendations of the list": x-axis: Domain (Digital Library, Travel, Movie, Music, Book); y-axis: Percentage of Agreement; legend: Degree of Agreement (1 to 5)]

Fig. 8. Participants' agreement with the statement that the filter has improved the result list (domain-wise) (TC2)

In addition to response rates, user assessments and scores in the defined quality dimensions were compared. Table 14 depicts the outcome of this analysis. Significantly higher scores are marked in bold figures. In case statistical tests identified no meaningful variations, the results of the better performing approach are underlined. α-levels were adjusted with the false discovery rate (FDR) in order to decrease the probability of making a type I error due to multiple testing.
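The text does not name the specific FDR procedure. A common choice is the Benjamini-Hochberg step-up method, and the following Python sketch shows how adjusted p-values are obtained under that assumption. The function name is hypothetical, and the sketch should not be read as the exact procedure used in the study.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that monotonicity can be enforced.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical raw p-values from four pairwise comparisons.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

A comparison is then declared significant when its adjusted p-value falls below the chosen level (e.g., 0.05), which is equivalent to adjusting the α-level per test.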

According to mean scores, expressive requests generated more comprehensive recommendation lists (size). This finding is in line with the increased response rates shown in Table 13. However, subsequently conducted t-tests confirmed significant differences only for the travel experiment (t(102) = -7.89, p < 0.001). In contrast, precision scores were higher in most of the domains (travel, movie, music) when the SKOSRec engine processed a simple constraint-based query. However, statistical tests did not confirm any systematic differences. The authors applied Wilcoxon tests for mrs scores and the remaining performance metrics because of the small sample sizes resulting from the low response rates of the simple constraint-based approach.

Regarding Novelty, neither approach was superior. In the Diversity dimension, expressive filters generated suggestions that were at least as topically diversified as the results from simple constraint-based retrieval. Even though no significant differences were

identified between the two approaches, increased mean scores (divU in the travel domain and divC in each domain) indicate a slight superiority of expanded filter requests in this quality dimension. The same applies to usefulness scores (mean values were higher in 3 out of 4 domains when expressive filtering was applied). However, these statements are speculative at best because statistical tests were not significant.

In summary, it can be concluded that expressive filters improve recall, potentially at the cost of a slight loss in precision scores (see Table 14). It may also be the case that expanded user constraints diversify recommendation lists and increase their usefulness. However, these claims are unverified. In contrast, it is safe to say that expressive query patterns generate results of at least the same quality as simple patterns while increasing the number of relevant suggestions. The few differences may be explained by the high Jaccard indices (JI) of the recommendation lists (see Table 15). Throughout the domains, result sets were much alike. Given these similarities and the increases in list sizes for expressive filters, it is supposed that an expanded constraint produces an enhanced version of the recommendation list resulting from simple filtering. Therefore, expanded filters should be the default setting in a constraint-based retrieval context.
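The Jaccard index used to compare two recommendation lists is the size of their intersection divided by the size of their union. The following Python sketch shows the standard definition; the function name and the decision to return None when both lists are empty are assumptions, since the text does not state how such cases were handled.

```python
def jaccard_index(list_a, list_b):
    """Jaccard index of two recommendation lists, treated as sets of items."""
    set_a, set_b = set(list_a), set(list_b)
    if not set_a and not set_b:
        return None  # undefined for two empty lists (assumed convention)
    return len(set_a & set_b) / len(set_a | set_b)


if __name__ == "__main__":
    # Hypothetical result lists from the simple and the expressive filter.
    simple_list = ["item_a", "item_b", "item_c"]
    expressive_list = ["item_b", "item_c", "item_d"]
    print(jaccard_index(simple_list, expressive_list))
```

A value close to 1 means that two methods return nearly the same items, whereas a value close to 0 means that the lists hardly overlap.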

In TC3 (i.e., evaluation of roll-up query patterns), the SKOSRec engine almost always generated non-empty recommendation lists (see Table 16). In the


Table 14. Performance results in TC2 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]   Domain   Simple M   Simple SD   Expressive M   Expressive SD   N
Accuracy (size) [10]             Travel   1.17       2.70        4.52           4.62            103
                                 Movie    6.72       4.08        7.20           3.92            50
                                 Music    5.96       4.64        6.36           4.51            53
                                 Book     4.59       4.50        5.00           4.51            51
Accuracy (mrs) [1]               Travel   66.73      20.51       65.34          22.36           23
                                 Movie    60.95      22.98       60.29          22.79           43
                                 Music    63.77      16.05       62.96          16.44           34
                                 Book     56.17      18.18       56.81          16.81           37
Novelty (nv) [1]                 Travel   0.48       0.41        0.61           0.49            21
                                 Movie    0.44       0.36        0.55           0.80            42
                                 Music    0.55       0.39        0.52           0.37            38
                                 Book     0.77       0.34        0.76           0.33            33
Diversity (divU) [5]             Travel   3          -           3.5            -               24
                                 Movie    4          -           4              -               43
                                 Music    4          -           4              -               38
                                 Book     3          -           3              -               37
Diversity (divC) [1]             Travel   0.68       0.18        0.72           0.21            19
                                 Movie    0.70       0.24        0.73           0.23            41
                                 Music    0.86       0.13        0.87           0.12            34
                                 Book     0.84       0.24        0.95           0.63            29
Usefulness [5]                   Travel   3.33       0.93        3.50           0.77            24
                                 Movie    3.32       0.85        3.31           0.86            43
                                 Music    3.55       0.81        3.64           0.78            38
                                 Book     3.39       0.83        3.62           1.21            37

Table 15. Jaccard indices (TC2)

Domain   Mean Jaccard Index (JI)   SD     N
Travel   0.57                      0.47   25
Movie    0.71                      0.38   43
Music    0.82                      0.35   37
Book     0.92                      0.26   35

travel and the book experiment, the response rate was higher for aggregation-based queries, whereas in the movie domain the engine more often provided recommendations for regular requests. However, these differences are only marginal and do not indicate a clear superiority of one method.

Fig. 9 depicts participants' general agreement to Likert statements concerning the Perceived Usefulness of the approaches in TC3. The diagram shows similar results for regular as well as advanced requests. On average, satisfaction scores reached levels which lay slightly above neutral agreement. A subsequent t-test

Table 16. Response rates (TC3)

Domain   Regular (%)   Aggr.-based (%)   N
Travel   99            100               103
Movie    96            92                50
Music    100           100               53
Book     96            100               51

confirmed no significant differences between the two methods. In fact, the approach with the highest average score of Perceived Usefulness was different in each experiment. While mean values were higher for regular requests in the studies on travel and music RS, advanced queries produced higher scores in the movie and book experiments.

Content-based Diversity (divC) was the only metric with systematic differences. Here, scores went up for advanced retrieval patterns in the travel (t(98) = -5.50, p < 0.001), the music (t(52) = -4.51, p < 0.001) and the book domain (t(47) = -4.90, p <


[Plot "Perceived Usefulness (Regular vs. Roll-up Queries)": x-axis: Domain (Travel, Movie, Music, Book); y-axis: Agreement; legend: Method (Regular, Roll-up)]

Fig. 9. Participants' overall satisfaction with regular and roll-up recommendation retrieval (TC3)

0.001) (see Table 17). This finding is not surprising, given that the aggregation-based approach explores the wider semantic network of each item in the profile.

Hence, it can be assumed that both approaches produce recommendation lists of similar quality and can provide benefits to potential consumers. It cannot be clearly stated which approach is superior because neither of them outperformed the other in each domain. While the regular retrieval strategy was better in terms of usefulness in the travel and the music domains, the aggregation-based approach achieved higher scores in the domains of movie and book recommendations.

This conclusion is especially noteworthy given the low mean Jaccard indices for recommendation lists resulting from regular and aggregation-based retrieval (see Table 18). The mean score never exceeded 0.1

throughout the domains. This indicates a low concordance between result sets. It is remarkable that participants perceived the recommendations as equally good given the high dissimilarity of the lists. Thus, advanced queries provide added value, as they supply additional interesting results. They can help to explore LOD repositories in different ways than a regular retrieval approach. Therefore, it is worthwhile to offer aggregation-based query patterns as an alternative retrieval strategy when the regular approach has not produced any helpful recommendations.

In addition to TC3, user assessments for TC4 (cross-domain recommendations) were also evaluated. However, since the authors conducted experiments for TC4 only in the multimedia domains, the samples were considerably smaller. Nevertheless, it is assumed that the number of completed user sessions is just sufficient to conduct further statistical analyses. Table 19 shows the response rates for the two retrieval methods. Cross-domain patterns had a higher success rate of generating non-zero result sets throughout the domains. Participants almost always received a recommendation for this query type, whereas they often did not receive a single suggestion when a regular SPARQL query was executed.

We applied non-parametric tests for the performance metrics mrs, nv, divU, divC and Perceived Usefulness due to the low response rates of the SPARQL-based approach, which led to a correspondingly low number of comparable data points. Overall, the FDR-adjusted pairwise Wilcoxon tests detected no significant differences except for user-based Diversity scores (divU) in the movie (Z = -2.69, p < 0.05) and the book domain (Z = -2.56, p < 0.05). The results for the content-based Diversity (divC) metric seem to point in the same direction because of the high mean values for cross-domain queries in each domain. However, the significance of these results was not verified. The mrs scores indicate a better performance of regular


Table 17. Performance results in TC3 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]   Domain   Regular M   Regular SD   Aggr.-based M   Aggr.-based SD   N
Accuracy (size) [10]             Travel   9.90        0.99         9.42            2.07             103
                                 Movie    2.88        0.59         2.76            0.82             50
                                 Music    10.00       0.00         10.00           0.00             53
                                 Book     2.88        0.59         3.00            0.00             51
Accuracy (mrs) [1]               Travel   48.21       24.98        44.93           25.59            102
                                 Movie    54.58       22.88        56.69           25.42            44
                                 Music    53.95       18.31        58.24           16.19            52
                                 Book     51.53       23.90        49.25           22.30            49
Novelty (nv) [1]                 Travel   0.43        0.39         0.38            0.40             98
                                 Movie    0.35        0.38         0.28            0.41             41
                                 Music    0.49        0.36         0.46            0.39             50
                                 Book     0.61        0.42         0.62            0.44             47
Diversity (divU) [5]             Travel   3.5         -            3               -                100
                                 Movie    4           -            3               -                43
                                 Music    4           -            3               -                50
                                 Book     3           -            3               -                48
Diversity (divC) [1]             Travel   0.79        0.17         0.89            0.13             99
                                 Movie    0.75        0.20         0.66            0.27             44
                                 Music    0.83        0.17         0.93            0.07             53
                                 Book     0.74        0.27         0.93            0.06             48
Usefulness [5]                   Travel   3.36        0.77         3.22            0.74             101
                                 Movie    3.20        0.92         3.26            0.94             44
                                 Music    3.45        0.84         3.44            0.85             51
                                 Book     2.95        0.87         3.03            0.96             48

Table 18. Jaccard indices (TC3)

Domain   Mean Jaccard Index (JI)   SD     N
Travel   0.07                      0.11   101
Movie    0.03                      0.07   44
Music    0.02                      0.05   53
Book     0.02                      0.06   50

Table 19. Response rates (TC4)

Domain   Regular (SPARQL) (%)   SKOSRec (%)   N
Movie    32                     96            50
Music    62                     98            53
Book     53                     100           51

SPARQL queries, but the within-subjects test did not confirm this assumption statistically (see Table 20).

However, the most important finding of the analysis was that SKOSRec cross-domain queries generated significantly more suggestions throughout the domains (movie: t(49) = -17.92, p < 0.001; music: t(52) = -25.73, p < 0.001; book: t(50) = -12.75, p < 0.001) (see Table 20). Fig. 10 illustrates this graphically.

Whenever the SPARQL request provided recommendations, which was the case in less than half of the user sessions (see Table 19), the SKOSRec suggestions were assessed to be of almost equal quality as the SPARQL recommendations (see Table 20, Usefulness). Hence, in the remaining sessions, in which SPARQL querying did not produce any results, the SKOSRec approach may still have provided useful suggestions.

Therefore, it is reasonable to assume that the capability of the SKOSRec engine to flexibly switch between processing steps of similarity calculation and pattern matching can facilitate an improved exploration of LOD repositories. Table 21 also shows that the items in the two sets often did not match, as is indicated by low Jaccard indices. This finding further demonstrates that the execution of an additional step


Table 20. Performance results in TC4 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]   Domain   SPARQL M   SPARQL SD   Cross-Domain M   Cross-Domain SD   N
Accuracy (size) [10]             Movie    0.82       2.09        8.66             2.73              50
                                 Music    0.89       1.59        9.30             2.03              53
                                 Book     2.90       3.98        10.00            0.00              51
Accuracy (mrs) [1]               Movie    67.41      27.62       53.65            14.71             16
                                 Music    62.36      31.76       48.32            22.49             20
                                 Book     66.42      21.28       55.59            16.84             27
Novelty (nv) [1]                 Movie    0.59       0.49        0.41             0.34              16
                                 Music    0.72       0.44        0.59             0.40              19
                                 Book     0.54       0.41        0.63             0.30              27
Diversity (divU) [5]             Movie    3          -           4                -                 16
                                 Music    3          -           3                -                 20
                                 Book     3          -           4                -                 27
Diversity (divC) [1]             Movie    (0.64)     (0.35)      (0.99)           (0.02)            (6)
                                 Music    (0.90)     (0.13)      (0.98)           (0.03)            (10)
                                 Book     (0.84)     (0.12)      (0.87)           (0.1)             (8)
Usefulness [5]                   Movie    3.57       0.83        3.35             1.57              15
                                 Music    3.53       1.17        3.11             0.98              20
                                 Book     3.41       1.04        3.41             0.94              27

[Plot "Recommendation List Size": x-axis: Domain (Movie, Music, Book); y-axis: No. of items; legend: Method (Regular, Cross-Domain)]

Fig. 10. Mean recommendation list sizes for SPARQL and SKOSRec cross-domain requests (TC4)


Table 21. Jaccard indices (TC4)

Domain   Mean Jaccard Index (JI)   SD     N
Movie    0.03                      0.07   44
Music    0.01                      0.02   21
Book     0.16                      0.27   33

of similarity calculation can help to retrieve other relevant items.

5. Future Work & Conclusion

This paper has presented the query-based SKOSRec engine that facilitates new types of recommendation requests for LOD repositories. The evaluations have shown that the SKOSRec system can often generate relevant and useful suggestions when certain query templates are utilized. In many cases, the application of expressive or advanced query patterns (i.e., in the context of constraint-based retrieval or cross-domain requests) helped to significantly improve recall values while still providing the same level of quality in the remaining performance dimensions. Additionally, advanced recommendation requests can be used to generate diversified recommendation lists in case users are not satisfied with the results of regular recommendation requests (i.e., roll-up or cross-domain retrieval vs. regular requests). Although they are not superior in each domain and test case, it is at least safe to say that the novel recommendation approaches of the SKOSRec engine considerably extend common retrieval methods and provide an alternative search strategy when conventional methods fail to provide useful results. Additionally, the increase in diversity and recall is in line with findings from previous research on LOD-enabled RS [45].

It is a strength of the developed approaches that they facilitate combinations of graph-based and similarity-based retrieval at different stages of the recommendation workflow. This feature extends general search capabilities for semantic networks. Future research needs to investigate whether, in addition to SKOS, further RDF vocabularies can be used for ad-hoc item-to-item similarity computation to increase the range of potential usage scenarios, since not every LOD dataset contains SKOS annotations [53]. The application of additional properties could also boost the recommendation quality of some query patterns (e.g., the query types used in TC3). In this context, it also needs to be determined whether the representation of the user

profile in the SKOSRec query language can be substituted with free-text expressions to enable search-like functionalities. LOD researchers have already developed engines that can process natural language queries over RDF data [59], [34], [67]. It will have to be investigated how the SKOSRec engine profitably fits into this landscape of existing retrieval tools.

Another open research question concerns the aspect of interfaces to other applications. In its standard configuration, the system is not an end-user retrieval engine but a backend application with which an administrator who has domain knowledge (e.g., of RDF vocabularies and data models of a particular LOD repository and usage scenario) creates the respective query patterns. Although the websites of the online experiments represent possible implementations of suitable end-user applications for the SKOSRec engine, there are still many other possibilities and extensions to design such a user interface (UI). For instance, the selection of LOD items for the profile was made possible by previously storing the data in an index that could then be accessed and searched before issuing a recommendation query. However, this runs against the idea of ad-hoc retrieval. In this context, future research will have to determine whether it is feasible to generate an on-the-fly mapping from items to the respective LOD resources. In the case of a commercial application, the engine needs to match items with the corresponding entries in the product catalog. Even if it is not yet clear which UI components can make the best possible use of the SKOSRec engine's features, the results from the user experiments can give at least a few suggestions.

The following retrieval options should be the default setting in an interface that contains a LOD-enabled RS component to assist users in finding interesting items. Since the regular approach of on-the-fly retrieval achieved good results throughout the domains, the display of simple recommendations generated from previously stated user preferences represents a suitable starting point for retrieval. On-the-fly queries could be refined by providing filter options. Since the evaluations have shown that expressive constraints often lead to increased recall values, the application of such a filter might be the best option for assisted retrieval. As in the preparations for the experiments, appropriate graph-based query patterns for the usage scenario in question would also have to be determined by a domain expert before setting up the application. The same applies to advanced retrieval patterns, which should be available to the end user to refine the query when both the simple as well as the constraint-based


approach have failed to provide relevant items. This suggestion is made because roll-up queries were competitive with regular recommendations in the web-based experiments. Therefore, they represent a viable alternative retrieval strategy. For some domains (e.g., for multimedia retrieval), it is also possible to use an optional selection field to enable the formulation of cross-domain requests. Evaluations for this query type suggest that users might receive interesting results (i.e., diversified and comprehensive recommendation lists) from such a request.

In summary, the contributions of the paper are as follows:

– Development and implementation of novel retrieval strategies in the SKOSRec system, which facilitate personalized requests and leverage the potential of knowledge graphs in order to address the research gap in state-of-the-art recommendation methods with regard to customized queries.

– Realization of web-based user experiments in four usage scenarios with the following outcomes:

  • Successful demonstration that knowledge graph data is applicable in the context of live recommendation scenarios.

  • Creation of suitable query patterns that can be applied to represent individual information needs in different application contexts. The set of query patterns can be extended with the help of the SKOSRec query language to meet the requirements of additional usage scenarios.

  • Proof that the novel retrieval strategies can be successfully applied across various application contexts and recommendation tasks and that they outperform conventional content-based retrieval approaches, especially in the performance dimensions of diversity and recall.

The prototypical implementation and the evaluationof the SKOSRec engine in different usage scenarioshave demonstrated that the freely available knowledgesources on the LOD cloud can be successfully appliedto advance state-of-the-art methods in IR and recom-mender systems The proposed personalized and cus-tomized retrieval strategies can leverage the potentialof knowledge graphs by better matching informationneeds of users with the help of semantic search op-tions

Appendix A. The SKOSRecommender Query Syntax

SKOSRecQuery: A SKOSRec query can be comprised of up to six elements, of which the parts SimProjection and ItemPart are obligatory. Hence, users need to state at least their preferences and how many recommendations they would like to receive. Advanced retrieval patterns can be formulated as well. For instance, similar resource retrieval can be combined with prefiltering (RecWhereClause) and/or postfiltering (SelectPart) of LOD resources. As in SPARQL, the query can start with a prologue that abbreviates IRIs with namespace declarations [21].

SelectPart: The SelectPart enables subquerying with recommendation results. The surrounding query is a SELECT query with a slightly simplified WHERE clause (i.e., RecWhereClause), with which postfilter conditions can be formulated. In this section, users have the option to formulate aggregation-based retrieval requests (Aggregation).

Aggregation: This query part handles aggregation-based postfiltering. Users specify the IRI (IRIref) resource for which similarity scores of sublevel entities are summarized. Additionally, it is stated how the data should be aggregated (i.e., SUM, MAX or AVG) as well as which variable (Var) represents the aggregation-based recommendations in the postfilter section. The compiler makes sure that this variable is actually contained in the RecWhereClause of the SelectPart.
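
The roll-up behind the Aggregation part can be illustrated with a minimal Python sketch. It is not the engine's implementation: the function name `aggregate_scores`, the dictionary-based inputs and the tie-breaking by resource name are assumptions made for illustration. Only the three modes SUM, MAX and AVG are taken from the grammar.

```python
from collections import defaultdict


def aggregate_scores(scores, parent_of, mode="SUM"):
    """Roll up similarity scores of sublevel entities to their parent resource.

    scores    -- hypothetical mapping: sublevel resource -> similarity score
    parent_of -- hypothetical mapping: sublevel resource -> parent resource
    mode      -- one of the grammar's aggregation keywords: SUM, MAX, AVG
    """
    reducers = {
        "SUM": sum,
        "MAX": max,
        "AVG": lambda values: sum(values) / len(values),
    }
    if mode not in reducers:
        raise ValueError("unsupported aggregation mode: " + mode)

    # Group the sublevel scores by the parent they belong to.
    grouped = defaultdict(list)
    for item, score in scores.items():
        parent = parent_of.get(item)
        if parent is not None:
            grouped[parent].append(score)

    # Aggregate per parent, then rank by score (ties broken by name).
    reduce_fn = reducers[mode]
    ranked = [(parent, reduce_fn(values)) for parent, values in grouped.items()]
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))
```

For example, if the sublevel entities are movies and the parents are their directors, `aggregate_scores(movie_scores, director_of, "MAX")` would rank each director by the best-matching movie.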

SimProjection: This part of a SKOSRec query refers to the list of generated suggestions. It defines the recommendation variable to which all similar LOD resources are mapped. In case pre- and postfilter conditions are set in the other sections, the compiler checks whether the variable (Var) appears in these sections as well. Otherwise, the required join operations cannot be executed. It is specified how many recommendations should be displayed (Integer) and whether the request will be issued against a default repository or as a cross-repository request (ServiceIntegration).

ServiceIntegration: This section indicates that a user intends to receive recommendations from a different SPARQL endpoint than the default endpoint that is stated in the standard configuration of the SKOSRec engine. It specifies the target SPARQL endpoint (IRIref) from which a user wants to receive recommendations. Upon extraction of SKOS annotations from the default source endpoint, a subsequent request is sent to the specified target endpoint.

ItemPart: This section of a SKOSRec query represents a single preference in the user profile. However, recommendation queries can contain several preference statements. A preference is either expressed as a LOD resource (IRIref) or as a variable which is bound to a preference query (VarPart). In the latter case, the resources are obtained by matching the graph pattern in the VarPart with the triple statements in the repository. The ItemPart statement can be additionally supplemented with a concept-to-concept similarity threshold (Sim) that triggers concept expansion with the approach of flexible similarity detection, which is extensively described in one of our previous works [64].

VarPart: The construct initiates preference querying. Individual likings are expressed as a graph pattern in a RecWhereClause. The variable (Var) is a placeholder for the LOD resources that will be used to set up the user profile. The compiler has to make sure that the specified variable occurs in the RecWhereClause as well. Otherwise, the extraction of LOD resources cannot be carried out correctly.

Sim: The Sim part of a SKOSRec query denotes the threshold of concept-to-concept scores for flexible similarity detection. Even though knowledge-based similarity values usually range between 0 and 1, the specification allows a broader range of digits in the Decimal datatype to enable application of other similarity metrics in case they are needed.

Relation: This section specifies how concept-to-concept similarity scores should be compared with the threshold. In line with the Relation rule of the grammar, users can state whether scores should be "larger than", "larger than or equal to", or "equal to" the threshold value.
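
As a rough illustration of how the Sim threshold and the Relation operator interact, the following Python sketch filters candidate concepts by their concept-to-concept score. The names `RELATIONS` and `expand_concepts` as well as the dictionary input are hypothetical. How the scores themselves are computed is described in [64] and is not reproduced here.

```python
import operator

# The three comparison operators of the grammar's Relation rule.
RELATIONS = {">": operator.gt, ">=": operator.ge, "=": operator.eq}


def expand_concepts(concept_scores, relation, threshold):
    """Return the concepts whose score satisfies `relation` w.r.t. `threshold`.

    concept_scores -- hypothetical mapping: concept IRI -> similarity score
    relation       -- one of ">", ">=", "="
    threshold      -- decimal threshold taken from the Sim part of the query
    """
    if relation not in RELATIONS:
        raise ValueError("unknown relation: " + relation)
    compare = RELATIONS[relation]
    # Sorted output keeps the result deterministic for display and testing.
    return sorted(c for c, s in concept_scores.items() if compare(s, threshold))
```

With scores of 0.9, 0.5 and 0.2 for three concepts, a statement such as `SIM >= 0.5` would keep the first two concepts for expansion, while `SIM > 0.5` would keep only the first.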

RecWhereClause: This clause can occur in three different sections of a SKOSRec query, i.e., either in the prefilter (ItemPart), the postfilter (SelectPart) or the preference filter section (VarPart). The RecWhereClause closely resembles the "WhereClause" of the SPARQL 1.1 specification with all its subsequent parts [21]. Hence, it enables different combinations of graph pattern matching. However, the RecWhereClause of the SKOSRec language allows fewer pattern matching expressions than the "WhereClause" of the regular SPARQL syntax specification [21]. For instance, it is neither permitted to formulate subqueries nor to direct filter requests to more than one repository (see RecGraphPatternNotTriples for further explanations). This modification has been made to simplify similarity calculation. After parsing this section, the compiler makes sure that recommendation, preference or postfilter variables occur at least once in the RecWhereClause to guarantee that subsequent steps of the recommendation workflow can be applied on existing LOD resources. In case the RecWhereClause comprises a RecMinusGraphPattern, it also has to be checked that the variable is not only contained in the MINUS part of the group.
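
The variable check described for the RecWhereClause can be sketched in a few lines of Python. The representation of a parsed clause as a list of `(kind, variables)` pairs and the function names are assumptions made for illustration. The actual compiler works on a parsed query and may be organised differently.

```python
def variable_is_usable(var, groups):
    """Check that `var` occurs in at least one pattern outside a MINUS part.

    groups -- hypothetical flattened view of a parsed RecWhereClause:
              a list of (kind, variables) pairs, where kind is e.g.
              "TRIPLES", "OPTIONAL", "UNION" or "MINUS".
    """
    return any(var in variables for kind, variables in groups if kind != "MINUS")


def check_where_clause(var, groups):
    """Raise an error if the variable cannot be joined with later steps."""
    if not variable_is_usable(var, groups):
        raise ValueError(
            var + " must occur in the clause outside of a MINUS pattern"
        )
```

A clause that mentions the recommendation variable only inside MINUS would be rejected, because excluded solutions cannot provide bindings for the subsequent join.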

RecGroupGraphPattern: A RecGroupGraphPattern can contain one to many basic graph patterns, which can be differently combined according to RecGroupGraphPatternSub.

RecGroupGraphPatternSub: This section defines the graph patterns that are applied to identify suitable LOD resources. User filters can be either expressed as basic graph patterns (TriplesBlock) or as combinations of them (RecGraphPatternNotTriples). The sequence of different kinds of patterns can be flexibly specified.

RecGraphPatternNotTriples: This query part closely resembles the similarly named "GraphPatternNotTriples" of the SPARQL 1.1 specification [21]. The only difference to SPARQL is that the RecGraphPatternNotTriples section is a little more restricted in terms of admissible graph patterns than the "GraphPatternNotTriples" section of the regular SPARQL specification. For instance, it does not allow retrieving LOD resources other than from the default graph that is specified in the configuration of the SKOSRec engine, since filter patterns have to be applied to datasets that contain SKOS annotations. Hence, they need to be specified before runtime execution to ensure that similarity calculation can be carried out correctly. If a user was able to formulate filter conditions that referred to different RDF graphs and endpoints accordingly, this would not be possible.

RecGroupOrUnionGraphPattern: This query section marks an alternative graph pattern in the style of the "GroupOrUnionGraphPattern" of the SPARQL syntax specification [21]. However, it neither allows SPARQL-like subqueries nor federated queries for filtered resources, because this would complicate similarity calculation.

RecOptionalGraphPattern: In this part of a SKOSRecQuery, users are enabled to specify additional graph patterns which might extend the query solution, but do not necessarily have to match the data.

RecMinusGraphPattern: This section handles exclusion of LOD resources for cases when users want to omit certain triple statements from the solution.

References

[1] G. Adomavicius, A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. In IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.

[2] G. Adomavicius, A. Tuzhilin, R. Zheng. REQUEST: A query language for customizing recommendations. In Information Systems Research, vol. 22, no. 1, pp. 99–117, 2011.

[3] C. Aggarwal. Recommender systems. Springer, 2016.

[4] M. Arenas, B. Cuenca Grau, E. Kharlamov, S. Marciuska, D. Zheleznyakov. Faceted search over ontology-enhanced RDF data. In Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 939–948, 2014.

[5] V. Ayala, M. Przyjaciel-Zablocki, T. Hornung, A. Schätzle, G. Lausen. Extending SPARQL for recommendations. In Semantic Web Information Management, 2004.

[6] J. Beel, C. Breitinger, S. Langer, A. Lommatzsch, B. Gipp. Towards reproducibility in recommender systems research. In User Modeling and User-adapted Interaction, vol. 26, no. 1, pp. 69–101, 2016.

[7] C. Boutilier, R. Brafman, C. Domshlak, H. Hoos, D. Poole. CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements. In Journal of Artificial Intelligence Research (JAIR), vol. 21, pp. 135–191, 2004.

[8] P. Castells, S. Vargas, J. Wang. Novelty and diversity metrics for recommender systems: choice, discovery and relevance. 2011.

[9] G. Cheng, W. Ge, Y. Qu. Falcons: searching and browsing entities on the Semantic Web. In Proceedings of the 17th ACM International Conference on World Wide Web, pp. 1101–1102, 2008.

[10] J. Davies, R. Weeks. QuizRDF: Search technology for the Semantic Web. In Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004.

[11] DBpedia. URL: https://wiki.dbpedia.org, 2017.

[12] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito. Exploiting the web of data in model-based recommender systems. In Proceedings of the 6th ACM Conference on Recommender Systems (RecSys), pp. 253–256, 2012.

[13] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, M. Zanker. Linked Open Data to support content-based recommender systems. In Proceedings of the 8th International Conference on Semantic Systems, 2012.

[14] D. Dillman, D. Robert, D. Bowker. Principles for constructing web surveys. In Joint Meetings of the American Statistical Association, 1998.

[15] DCMI metadata terms. URL: http://dublincore.org/documents/dcmi-terms, 2004.

[16] I. Fernandez-Tobias, I. Cantador, M. Kaminskas, F. Ricci. A generic semantic-based framework for cross-domain recommendation. In Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pp. 25–32, 2011.

[17] A. Freitas, J. Oliveira, S. O'Riain, E. Curry, J. Da Silva, C. Pereira. Querying Linked Data using semantic relatedness: a vocabulary independent approach. In Natural Language Processing and Information Systems, pp. 40–51, 2011.

[18] T. Grainger, T. Potter, Y. Seeley. Solr in Action. Cherry Hill: Manning, 2014.

[19] A. Gunawardana, G. Shani. Evaluating recommender systems. In Recommender Systems Handbook, Springer, pp. 265–308, 2015.

[20] R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Bürgle, H. Düwiger, U. Scheel. Faceted Wikipedia search. In International Conference on Business Information Systems, pp. 1–11, 2010.

[21] S. Harris, A. Seaborne, E. Prud'hommeaux. SPARQL 1.1 query language. In W3C recommendation, 2013.

[22] B. Heitmann, C. Hayes. Using Linked Data to build open, collaborative recommender systems. In AAAI Spring Symposium: Linked Data Meets Artificial Intelligence, pp. 76–81, 2010.

[23] J. Herlocker, J. Konstan, L. Terveen, J. Riedl. Evaluating collaborative filtering recommender systems. In ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.

[24] Y. Hu, Z. Wang, W. Wu, J. Guo, M. Zhang. Recommendation for movies and stars using YAGO and IMDB. In 12th International Asia-Pacific Web Conference (APWEB), pp. 123–129, 2010.

[25] D. Huynh, D. Karger. Parallax and companion: Set-based browsing for the data web. In Proceedings of the 18th ACM International Conference on World Wide Web, 2009.

[26] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich. Recommender systems: an introduction. Cambridge University Press, 2010.

[27] Y. Kabutoya, R. Sumi, T. Iwata, T. Uchiyama, T. Uchiyama. In IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 625–630, 2012.

[28] M. Kaminskas, I. Fernandez-Tobias, F. Ricci, I. Cantador. Knowledge-based music retrieval for places of interest. In Proceedings of the 2nd International ACM Workshop on Music Information Retrieval with User-centered and Multimodal Strategies, pp. 19–24, 2012.

[29] H. Khrouf, R. Troncy. Hybrid event recommendation using Linked Data and user diversity. In Proceedings of the 7th ACM Conference on Recommender Systems, pp. 185–192, 2010.

[30] P. Kirk. Experimental design. John Wiley & Sons, 1982.

[31] B. Knijnenburg, M. Willemsen, Z. Gantner, H. Soncu, C. Newell. Explaining the user experience of recommender systems. In User Modeling and User-Adapted Interaction, vol. 22, no. 4-5, pp. 441–504, 2012.

[32] G. Koutrika, B. Bercovitz, H. Garcia-Molina. FlexRecs: expressing and combining flexible recommendations. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 2009.

[33] M. Lee, W. Kim. Semantic association search and rank method based on spreading activation for the Semantic Web. In IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1523–1527, 2009.

[34] J. Lehmann, L. Bühmann. Autosparql: Let users query your knowledge base. In 8th Extended Semantic Web Conference (ESWC), 2011.

[35] N. Marie, F. Gandon, M. Ribiere, F. Rodio. Discovery hub: on-the-fly Linked Data exploratory search. In Proceedings of the 9th International Conference on Semantic Systems, pp. 17–24, 2013.

[36] S. McNee, J. Riedl, J. Konstan. Being accurate is not enough: how accuracy metrics have hurt recommender systems. In CHI'06 Extended Abstracts on Human Factors in Computing Systems, 2006.

[37] R. Meymandpour, J. Davis. Recommendations using Linked Data. In Proceedings of the 5th PhD Workshop on Information and Knowledge, 2012.

[38] A. Miles, S. Bechhofer. SKOS Simple Knowledge Organization System Reference. In W3C Recommendation, 2009.

[39] B. Mobasher, X. Jin, Y. Zhou. Semantically enhanced collaborative filtering on the web. In Web Mining: From Web to Semantic Web, pp. 57–76, 2004.

[40] A. Musetti, A. Nuzzolese, F. Draicchio, V. Presutti, E. Blomqvist, A. Gangemi, P. Ciancarini. Aemoo: Exploratory search based on knowledge patterns over the Semantic Web. In Semantic Web Challenge, 2012.

[41] T. Neumann, G. Weikum. The RDF-3X engine for scalable management of RDF data. In The VLDB Journal, vol. 19, no. 1, pp. 91–113, 2010.

[42] J. Pérez, M. Arenas, C. Gutierrez. Semantics and Complexity of SPARQL. In 5th International Semantic Web Conference, pp. 30–43, 2006.

[43] S. Policarpio, S. Brunk, G. Tummarello. Implementation of a SPARQL integrated recommendation engine for Linked Data with hybrid capabilities. In AImWD, 2012.

[44] P. Pu, L. Chen, R. Hu. A user-centric evaluation framework for recommender systems. In Proceedings of the 5th ACM Conference on Recommender Systems, 2011.

[45] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, E. Di Sciascio. Sound and music recommendation with knowledge graphs. In ACM Transactions on Intelligent Systems and Technology (TIST), 8(2), 21, 2017.

[46] E. Oren, R. Delbru, S. Decker. Extending faceted navigation for RDF data. In 5th International Semantic Web Conference, pp. 559–572, 2006.

[47] G. Pirro. Explaining and suggesting relatedness in knowledge graphs. In International Semantic Web Conference, pp. 622–639, 2015.

[48] G. Rizzo, R. Troncy. NERD: A framework for evaluating named entity recognition tools in the Web of data. In 10th International Semantic Web Conference (ISWC'11), Demo Session, Bonn, Germany, 2011.

[49] J. Rosati, T. Di Noia, T. Lukasiewicz, R. De Leone, A. Maurino. Preference queries with Ceteris Paribus semantics for Linked Data. In OTM Confederated International Conferences, pp. 423–442, 2015.

[50] T. Ruotsalo, K. Haav, A. Stoyanov, S. Roche, E. Fani, R. Deliai, E. Mäkelä, T. Kauppinen, E. Hyvönen. SMARTMUSEUM: A mobile recommender system for the Web of Data. In Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 50–67, 2013.

[51] I. Ruthven, M. Lalmas, K. van Rijsbergen. Incorporating user search behavior into relevance feedback. In Journal of the Association for Information Science and Technology, vol. 54, no. 6, pp. 529–549, 2003.

[52] B. Sarwar, G. Karypis, J. Konstan, J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, 2001.

[53] M. Schmachtenberg, C. Bizer, H. Paulheim. Adoption of the Linked Data best practices in different topical domains. In International Semantic Web Conference (ISWC), pp. 245–260, 2014.

[54] H. Schütze, C. Manning, P. Raghavan. Introduction to information retrieval. Cambridge University Press, 2008.

[55] SparqlImplementations. URL: https://www.w3.org/wiki/SparqlImplementations, 2016.

[56] M. Stankovic, W. Breitfuss, P. Laublet. Discovering relevant topics using DBPedia. In Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 219–222, 2011.

[57] K. Swearingen, R. Sinha. Beyond algorithms: An HCI perspective on recommender systems. In ACM SIGIR 2001 Workshop on Recommender Systems, vol. 13, no. 5-6, 2001.

[58] D. Tunkelang. Faceted search. In Synthesis Lectures on Information Concepts, Retrieval, and Services, pp. 1–80, 2009.

[59] C. Unger, L. Bühmann, J. Lehmann, A. Ngonga Ngomo, D. Gerber, P. Cimiano. Template-based question answering over RDF data. In Proceedings of the 21st ACM International Conference on World Wide Web, 2012.

[60] C. Varnhagen, M. Gushta, J. Daniels, T. Peters, N. Parmar, D. Law, T. Johnson. How informed is online informed consent? In Ethics & Behavior, vol. 15, no. 1, pp. 37–48, 2005.

[61] J. Waitelonis, H. Sack. Towards exploratory video search using Linked Data. In 11th IEEE International Symposium on Multimedia, pp. 540–545, 2009.

[62] L. Wenige, J. Ruhland. Flexible on-the-fly recommendations from Linked Open Data repositories. In International Conference on Business Information Systems, pp. 43–54, 2016.

[63] L. Wenige. Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations.

[64] L. Wenige, J. Ruhland. Retrieval by recommendation: using LOD technologies to improve digital library search. In International Journal on Digital Libraries, 2017.

[65] F. Zarrinkalam, M. Kahani. A multi-criteria hybrid citation recommendation system based on Linked Data. In 2nd International EConference on Computer and Knowledge Engineering (ICCKE), pp. 283–288, 2012.

[66] C. Ziegler, S. McNee, J. Konstan, G. Lausen. Improving recommendation lists through topic diversification. In Proceedings of the 14th International Conference on World Wide Web, pp. 22–32, 2005.

[67] L. Zou, R. Huang, H. Wang, J. Yu, W. He, D. Zhao. Natural language question answering over RDF: a graph data driven approach. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 313–324, 2014.


vides the SPARQL Protocol and RDF Query Language (SPARQL) and suitable query engines that enable fast retrieval [55].

However, graph-based queries alone are not yet sufficient to generate personalized suggestions, since they only return results for graph patterns that exist in the repository. Consider the following example: Consumers may like to use recommendation engines that work on multiple domains simultaneously. For instance, a user who has stated that he likes certain items from one domain may like to receive suggestions from another domain as well (i.e., for cross-domain retrieval). In case a user profile contains feedback information for books, it would be desirable if the system could generate movie suggestions based on this data. While the approach of querying RDF graphs can be useful for this recommendation scenario (since it can identify matching objects based on entity type declarations or typed link information), it may also be the case that there are no direct links from a preferred book to a suitable movie in the RDF graph.

Therefore, RS designers should not rely too heavily on graph-based queries, but also explore possibilities of similarity-based retrieval. A possible solution would be to identify books that are similar to the ones in the user profile. In a subsequent processing step and through a graph-based query, the engine could explore whether the similar books have any connections to movie items in the repository. This approach can reveal implicit connections in the data and might help to return fewer empty result sets to the user. Therefore, the data should be processed in an ad-hoc fashion. This is because one processing step (identification of suitable movies through graph pattern matching) relies on the outcome of a preceding one (computation of similar book items). For this purpose, it is important that recommendations can be quickly generated on-the-fly without further preprocessing. Thus, graph matching and similarity-based calculations can be merged. This also has to be accompanied by a query language which facilitates flexible combinations of the two retrieval paradigms. By these means, users can specify at each key step of a recommendation workflow (i.e., user profile generation and item selection) if either graph- or similarity-based processing operations should be triggered.

Almost none of the existing Linked Data recommender systems (LDRS) addresses these issues, which prevents efficient usage of available knowledge sources for retrieval tasks. Instead, current LDRS rely on a fixed set of item features, which have to be extracted from the LOD cloud before suggestions can be generated [12, 13, 22, 24, 29, 50, 65]. However, the ad-hoc retrieval and processing of item features for similarity computation in combination with graph-based queries can be facilitated by concepts from Simple Knowledge Organization System (SKOS) vocabularies [38]. SKOS annotations occur in more than half of the repositories in the LOD cloud and often describe real-world entities [53]. For instance, in the DBpedia repository, an LOD resource representing a music act is characterized by SKOS-based subject descriptors (e.g., stating the genre or the label of the music act). The descriptors are part of the DBpedia category graph [11]. A recommendation framework that utilizes SKOS annotations for item-to-item similarity calculation is applicable to a wide range of data collections and domains. Additionally, SKOS subjects are uniquely identified URI resources. Therefore, they can be conveniently processed for ad-hoc querying without further preprocessing [38].

This paper presents novel retrieval strategies and a SKOS-based recommendation engine called SKOSRecommender (SKOSRec) that can flexibly combine on-the-fly similarity calculation and SPARQL-like techniques to leverage the full potential of LOD repositories. Their effectiveness has been demonstrated in a series of user studies. In summary, the paper addresses the following key points:

– Comprehensive literature review of the state-of-the-art in LOD-enabled Information Retrieval (IR) and recommender systems (Sect. 2).

– Development of search strategies that are applicable to numerous LOD collections. The novel retrieval methods are facilitated by:

  • Syntax and processing units for a SPARQL-based recommendation query language (Sect. 3).

  • Flexible combinations of similar resource retrieval and graph pattern matching at different stages of the recommendation workflow (e.g., by being able to apply graph-based filter conditions on recommendation results or by utilizing recommendations as part of a SPARQL-based subquery) (Subsects. 3.2–3.5).

– Extensive evaluation of the novel retrieval approaches in a series of web-based user experiments, which have proven the effectiveness of the developed methods, especially in terms of diversity and recall (Sect. 4).
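
The two-step cross-domain idea from the book-to-movie example above (first find similar books, then follow graph links to movies) can be sketched in Python on toy in-memory data. This is only an illustration under stated assumptions: the Jaccard overlap of annotation sets stands in for the engine's similarity measure, which is not specified in this section, and the `links` dictionary stands in for a graph pattern match against a SPARQL endpoint. All function names and identifiers are hypothetical.

```python
def jaccard(a, b):
    """Share of common concepts between two annotation sets (assumed metric)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_items(profile, annotations, top_n):
    """Rank resources outside the profile by their best match with a liked item."""
    scores = {}
    for candidate, concepts in annotations.items():
        if candidate in profile:
            continue
        best = max(
            (jaccard(concepts, annotations.get(liked, set())) for liked in profile),
            default=0.0,
        )
        if best > 0:
            scores[candidate] = best
    return sorted(scores, key=lambda c: (-scores[c], c))[:top_n]


def cross_domain(profile, annotations, links, top_n=2):
    """Similarity step first, then the graph step on its outcome."""
    movies = []
    for book in similar_items(profile, annotations, top_n):
        # Stand-in for a graph pattern that connects a book to movies.
        for movie in links.get(book, []):
            if movie not in movies:
                movies.append(movie)
    return movies


# Toy data: concept annotations per book and book-to-movie links.
annotations = {
    "book:A": {"c1", "c2"},
    "book:B": {"c1", "c2", "c3"},
    "book:C": {"c4"},
    "book:D": {"c2"},
}
links = {"book:B": ["movie:X"]}
print(cross_domain(["book:A"], annotations, links))  # prints ['movie:X']
```

The preferred book itself has no link to a movie in the toy data, so a purely graph-based request would return nothing. The suggestion only appears because the graph step runs on the similar books.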

2. Related Work

Effective answering of retrieval requests has been investigated by researchers for many years. Even before the evolution of the LOD cloud, scientists developed strategies for extracting relevant information from text data. The most prevalent IR systems perform relevance rankings for unstructured resources and offer means to state filter conditions that complement queries in a faceted search interface. The approaches can combine query constraints with similarity-based retrieval. In these systems, user requests are quickly answered due to extensive preprocessing and indexing of resources before runtime execution [58]. However, low latency of search results comes at the cost of a hard-wired data model, which prevents flexible customizations and causes limitations in terms of ad-hoc retrieval [18].

Other researchers have developed query-based RS for table data. For instance, Adomavicius et al. [2] propose the REQUEST query language that generates suggestions from a relational database, which contains information on both user profiles and items. Thus, personalized recommendation queries can be formulated. Another interesting approach in the category of query-based RS is the FlexRecs system by Koutrika et al. [32]. The key advantage of their course RS is that it enables highly individual requests in which different query parts (i.e., profile generation, filtering of recommendations or like-minded peers) can be flexibly combined into so-called recommendation workflows. Hence, highly individual recommendation queries can be formulated with FlexRecs.

Despite the strengths and weaknesses of the discussed approaches, it has to be taken into account that they are not designed to consume LOD. LOD-enabled systems, on the other hand, use RDF data for retrieval. There exist many index-based Linked Data IR systems [4, 9, 10, 20, 25, 40, 46, 47, 61]. However, they index RDF resources before runtime retrieval and can therefore neither facilitate on-the-fly similarity calculation nor provide extensive SPARQL-based query options.

Another interesting engine category are on-the-fly Linked Data IR systems. These systems process information from the LOD cloud through spreading activation algorithms. While they enable fast retrieval, the downside of these approaches is that graph-based query constraints are only possible to a limited extent due to the applied probabilistic path traversal techniques [16, 17, 28, 33, 35, 56].

Existing LDRS, on the other hand, mostly perform offline computation of item-to-item similarities from LOD metadata descriptions [13, 22, 24, 29, 50, 65]. While this approach offers more control over the retrieval process, it prevents individual customizations and runtime responses.

To date, there are only a few query-based LDRS [5, 43, 49]. Most of these systems rely on the assumption that user preferences and item metadata reside in the same repository. But this goes against the notion of the LOD cloud as a decentralized and openly accessible data space, which can be queried through API-like interfaces. Additionally, while query-based LDRS provide SPARQL-based query options to filter recommendation results, they are not capable of flexibly combining similar resource retrieval with graph filters (e.g., subquerying with recommendation results).

In summary, existing non-Linked Data as well as Linked Data-enabled retrieval and RS engines already provide useful features for personalized resource access. Nevertheless, the full potential of LOD for recommendation and search tasks has yet to be realized (see Table 1). The novel query strategies of the SKOSRec engine are attempts to tackle the previously outlined research gaps, to improve existing RS, as well as to offer new search strategies for LOD repositories that are guided by personal preferences.

3 The SKOS Recommender

31 System Overview

The SKOSRecommender is designed to consumerecommendation requests and was developed with thehelp of the Apache Jena framework1 A SKOSRecquery triggers a recommendation workflow in whichsimilarity-based retrieval steps can be joined withSPARQL-like filter expressions Listing 1 shows thesyntax of the SKOSRec query language that needs tobe applied to formulate recommendation requests

In the given grammar underlined parts representSPARQL syntax elements that have been directly takenfrom the latest W3C specification [21] Words in cap-ital letters denote query keywords which are alsoSPARQL expressions except for the ones listed in thelanguage parts SimProjection and ServiceIntegrationAs a regular SPARQL SELECT query a SKOSRecquery always generates a table-like result set that con-

1httpsjenaapacheorg

4 L Wenige et al Similarity-based Graph Queries

Table 1Summary of existing retrieval and recommendation approaches

System Type LOD-enabled A SPARQL-based queries B On-the-fly similarity C Flexible combination ofA and B

IR systems x x x x

Query-based RS x x X (X)

Index-based LinkedData IR systems

X x x x

On-the-fly LinkedData IR systems

X x X x

LDRS X x x x

Query-based LDRS X X x x

Listing 1 Grammar of the SKOSRec query languageSKOSRecQuery = Prologue S e l e c t P a r t S i m P r o j e c t i o n PREF I t e m P a r t + RecWhereClause S e l e c t P a r t = SELECT ( DISTINCT | REDUCED) Var+ RecWhereClause

LimitClause A g g r e g a t i o n A g g r e g a t i o n = AGG IRIref Var (SUM | MAX | AVG )S i m P r o j e c t i o n = RECOMMEND Var TOP INTEGER S e r v i c e I n t e g r a t i o n S e r v i c e I n t e g r a t i o n = FROM SERVICE IRIREFI t e m P a r t = ( V a r P a r t | IRI ) SimV a r P a r t = [ Var RecWhereClause ]Sim = SIM R e l a t i o n DECIMALR e l a t i o n = ( gt | gt= | = )RecWhereClause = WHERE RecGroupGraphPa t t e rnRecGroupGraphPa t t e rn = rsquo rsquo RecGroupGraphPa t t e rnSub rsquo rsquoRecGroupGraphPa t t e rnSub = TriplesBlock ( R e c G r a p h P a t t e r n N o t T r i p l e s rsquo rsquo TriplesBlock)lowastR e c G r a p h P a t t e r n N o t T r i p l e s = RecGroupOrUnionGraphPat te rn | R e c O p t i o n a l G r a p h P a t t e r n |

RecMinusGraphPa t t e rn | Filter | Bind | InlineDataRecGroupOrUnionGraphPa t te rn = RecGroupGraphPa t t e rn ( rsquoUNIONrsquo RecGroupGraphPa t t e rn )lowastR e c O p t i o n a l G r a p h P a t t e r n = rsquoOPTIONALrsquo RecGroupGraphPa t t e rnRecMinusGraphPa t t e rn = rsquoMINUSrsquo RecGroupGraphPa t t e rn

tains at least a mapping to a single variable A detailedexplanation of each part of the SKOSRec grammar canbe found in the Appendix A of this paperFig 1 depicts the overall architecture of the systemwith all its components The most important parts arethe SKOSRec query processing units (ie the parserand the compiler) They check the syntax of recom-mendation requests and regulate the correct execu-tion of critical operations of the engine (eg graph-based filtering result set joins similarity calculation)The architecture does not dictate a particular SPARQLserver implementation for LOD repositories since thesystem is decoupled from the endpoint that is accessedover HTTP during retrieval However the prototypi-cal implementation of the SKOSRec engine utilizes theOpenLink Virtuoso server2

Fig. 1 also shows the sequential order of a typical recommendation workflow for a complex request, which consists of the following retrieval steps:

1. Issuing a SKOSRec query
2. Parsing the SKOSRec query
3. Compiling the SKOSRec query
4. Loading information on the LOD repository/SPARQL endpoint to be queried from the configuration
5. Retrieval of preferred items and setting up of the user profile (see Subsect. 3.3, Preference Querying)
6. Retrieval and processing of relevant LOD resources upon application of prefilter conditions (see Subsect. 3.4, Prefiltering)
7. Optimizing the retrieval process through identifying the resources that can be omitted for the final ranking³
8. Determination of similar LOD resources based on the user profile (see Subsect. 3.2, On-the-Fly Recommendations)
9. Ranking of LOD resources
10. Sending of recommendation results to the compiler and joining them with postfilter constraints
11. Retrieval of the final result set (see Subsect. 3.5, Postfiltering & Combinations)
12. Output to the user

Fig. 1. Architecture and workflow of the SKOSRec engine

² https://virtuoso.openlinksw.com
³ For further information on optimized retrieval, please refer to [62].

3.2. On-the-Fly Recommendations

A simple on-the-fly request is the most basic form of a SKOSRec query. It is based on the engine's ability to generate ad-hoc recommendations from a user profile containing preference statements for LOD resources. Listing 2 depicts an example SKOSRec on-the-fly request that obtains suggestions based on a user's preference for the western movie "They Call Me Trinity". In the given query, the preferred movie is represented by the DBpedia resource dbr:They_Call_Me_Trinity. The query also contains a prefix with a namespace declaration and a specification of the number of suggestions the engine should generate.

Listing 2. A simple recommendation query for the movie domain (Q1)

PREFIX dbr: <http://dbpedia.org/resource/>

RECOMMEND ?movie TOP 5
PREF dbr:They_Call_Me_Trinity

In this basic form, the query contains neither pre- or postfilter conditions nor a preference query statement. Thus, the engine simply identifies all movies that are similar to the preference in the profile. When executed over DBpedia, the query returns the solution set that is depicted in Table 2.

Table 2. Result set for Q1

movie
dbr:God_Forgives'_I_Don't
dbr:Ace_High_(1968_film)
dbr:Boot_Hill_(film)
dbr:Trinity_Is_Still_My_Name
dbr:Troublemakers_(1994_film)

For determining recommendations in an ad-hoc fashion, the system identifies SKOS annotations by matching a suitable property (i.e., an annotation property) in an RDF dataset (Definition 1).


Definition 1 (Annotation property). An annotation property is an IRI (Listing 3) that is defined in the Dublin Core Metadata Initiative (DCMI) specification for subject annotations. It can occur in conjunction with one of the two namespaces that are stated by the standard [15].

Listing 3. Annotation property

<ANNOTPROP> = dc:subject | dct:subject

These properties help to retrieve SKOS annotation triples for items preferred by a user (Definition 2).

Definition 2 (SKOS annotation triple). A SKOS annotation triple ($st$) is a triple that is comprised of an IRI in the subject position, an annotation property (ANNOTPROP) as predicate, and a concept $c$ in the object position. The concept is part of a knowledge organization system $S$ in SKOS format ($C = \{c \mid c \in S\}$) (Eq. 1).

$$st \in I \times \langle\mathrm{ANNOTPROP}\rangle \times C \tag{1}$$

The identification of SKOS annotation triples facilitates a fast retrieval of potentially similar items. Therefore, the engine evaluates shared features between resources from the SKOS annotation dataset (Definition 3).

Definition 3 (SKOS annotation dataset). A SKOS annotation dataset ($AD$) is a subset of an RDF dataset that contains SKOS annotation triples (Eq. 2).

$$AD \subset I \times \langle\mathrm{ANNOTPROP}\rangle \times C \tag{2}$$

Before the similarity calculation process starts, the engine determines SKOS annotations for each LOD resource ($r$) from the user profile by issuing a subsequent SPARQL query to the LOD repository from which the user intends to receive suggestions. The notion of SKOS annotations is specified in Definition 4.

Definition 4 (SKOS annotations). In the annotation dataset, LOD resources directly link to concepts of a SKOS system via a predefined annotation property. SKOS annotations for an input resource $r$ are defined as in Equation 3.

$$Annot(r) = \{c \in S \mid \exists\, (r, \langle\mathrm{ANNOTPROP}\rangle, c) \in AD\} \tag{3}$$
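Definition 4 can be illustrated with a small in-memory sketch. The triples, the prefixed names and the helper `annot` below are hypothetical stand-ins chosen for illustration; the engine itself obtains this set by querying the SPARQL endpoint rather than by scanning a Python list.

```python
# Hypothetical toy RDF dataset as (subject, property, object) triples.
# All resource and concept names are invented for illustration.
ANNOT_PROPS = {"dc:subject", "dct:subject"}

DATASET = [
    ("ex:Movie_A", "dct:subject", "ex:Western_films"),
    ("ex:Movie_A", "dct:subject", "ex:Films_of_1970"),
    ("ex:Movie_B", "dc:subject", "ex:Western_films"),
    ("ex:Movie_A", "dbo:director", "ex:Some_Director"),  # not an annotation triple
]


def annot(r, dataset):
    """Return Annot(r): the concepts linked to r via an annotation property."""
    return {o for (s, p, o) in dataset if s == r and p in ANNOT_PROPS}


print(sorted(annot("ex:Movie_A", DATASET)))
```

Only triples whose predicate is one of the two annotation properties contribute to the result, so the `dbo:director` triple is ignored.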

Afterwards, the engine identifies which of the other resources in the dataset share annotations with it. Similarities only have to be computed for items with matching concepts [13, 52]. Therefore, triples can be efficiently joined along item features, since native RDF storage systems usually index triples rather than columns [41]. Thus, the quick identification of mutual annotations through SPARQL requests facilitates ad-hoc similarity calculation. Definition 5 specifies the notion of relevant resources and their annotations that are needed to identify similar items. The applied syntax follows Pérez et al. [42].

Definition 5 (Relevant resources and their annotations). The conjunctive graph pattern $P_r$ matches all items and subjects that are potentially relevant for recommendation retrieval (Eq. 4).

$$P_r = (r, \langle\mathrm{ANNOTPROP}\rangle, c) \ \mathrm{AND} \ (q, \langle\mathrm{ANNOTPROP}\rangle, c) \tag{4}$$

The mapping $\Omega_r$ of relevant resources and their SKOS annotations is obtained by retrieving all resources $q$ that share at least one SKOS concept $c$ with resource $r$ from $AD$ (Eq. 5). The corresponding result set contains mappings for all variables (i.e., $c$ and $q$) in the triple statements of $P_r$.

$$\Omega_r = [[P_r]]_{AD} \tag{5}$$
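The evaluation of the pattern in Definition 5 can be sketched over a precomputed annotation table. The dictionary `ANNOT`, its invented entries and the function `relevant_mappings` are assumptions made for this example. As in the pattern itself, the input resource matches itself; profile items are only removed later, at ranking time.

```python
# Hypothetical annotation table: resource -> set of SKOS concepts (names invented).
ANNOT = {
    "ex:Movie_A": {"ex:C1", "ex:C2"},
    "ex:Movie_B": {"ex:C1"},
    "ex:Movie_C": {"ex:C2", "ex:C3"},
    "ex:Movie_D": {"ex:C3"},
}


def relevant_mappings(r, annot):
    """Return one mapping {c, q} for every concept c that q shares with r."""
    mappings = []
    for q in sorted(annot):
        for c in sorted(annot[q] & annot[r]):
            mappings.append({"c": c, "q": q})
    return mappings


for mapping in relevant_mappings("ex:Movie_A", ANNOT):
    print(mapping)
```

Resources without a shared concept (here `ex:Movie_D`) never enter the result, which is why similarities only have to be computed for a small candidate set.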

Upon extraction of relevant resources and annotations, the process of similarity calculation can start. The engine determines the information content (IC) of the shared SKOS annotations of two resources. This approach is based on the hybrid similarity metric proposed by Meymandpour and Davis [37] for LOD-enabled RS. However, while their measure considers all types of IRI resources for the retrieval, our similarity metric relies purely on SKOS annotations (Definition 6).

Definition 6 (SKOS similarity). Let $Annot(r)$ be the set of SKOS features of resource $r$ and $Annot(q)$ the set of SKOS features of resource $q$, with $q \in \{\mu(q) \mid \mu \in \Omega_r\}$. Then their similarity can be derived from the IC of their shared concepts $C_{shared} = Annot(r) \cap Annot(q)$.

$$sim(r, q) = IC(C_{shared}) \tag{6}$$


The IC of a set of SKOS concepts is measured by the sum of the negative logarithms of each concept's frequency in relation to the maximum frequency of occurrences among relevant resources (Definition 7).

Definition 7 (SKOS Information Content). The SKOS IC is defined as an aggregation of the individual IC values of each concept that is contained in the set of shared features ($c \in C_{shared}$), where $freq(c)$ is the frequency of occurrences of $c$ among all relevant resources and $n$ is the maximum frequency of occurrences among these resources. The final similarity score of two resources $r$ and $q$ is determined by summing the IC values of each concept of the shared feature set.

$$IC(C_{shared}) = -\sum_{c \in C_{shared}} \log\left(\frac{freq(c)}{n}\right) \tag{7}$$
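A minimal numeric sketch of Equation 7 follows. The annotation table is invented, and the natural logarithm is an assumption, since the text does not state the base. Note that the most frequent concept has $freq(c) = n$ and therefore contributes an IC of zero.

```python
import math
from collections import Counter

# Hypothetical annotations of the relevant resources (names invented).
RELEVANT = {
    "ex:Q1": {"ex:C1", "ex:C2"},
    "ex:Q2": {"ex:C1"},
    "ex:Q3": {"ex:C1", "ex:C3"},
    "ex:Q4": {"ex:C1", "ex:C2"},
}

# freq(c): number of relevant resources annotated with concept c.
FREQ = Counter(c for concepts in RELEVANT.values() for c in concepts)


def information_content(shared, freq):
    """Sum of -log(freq(c) / n) over the shared concepts (natural log assumed)."""
    n = max(freq.values())  # maximum frequency among relevant resources
    return sum(-math.log(freq[c] / n) for c in shared)


print(information_content({"ex:C2", "ex:C3"}, FREQ))
```

Rare concepts dominate the score: in this toy data `ex:C3` occurs once and weighs twice as much as `ex:C2`, which occurs twice.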

After having obtained resource annotations as well as IC values, similarity scores between the input resource $r$ and each potentially relevant resource $q$ can be calculated. When a user profile contains more than a single item, similarity values are aggregated through summation to determine the final recommendation score (Definition 8).

Definition 8 (Recommendation score). The recommendation score of a potentially relevant resource $q$ is quantified by the sum of similarities with each resource $r$ that can be found in the profile ($Pr$).

$$score(Pr, q) = \sum_{r \in Pr} sim(r, q) \tag{8}$$
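Equation 8 and the subsequent top-k ranking can be sketched end to end as follows. The data and the function names `sim` and `recommend` are hypothetical, the natural logarithm is assumed, and, as a simplification, concept frequencies are counted over all resources of the toy table rather than over the relevant resources only.

```python
import math
from collections import Counter

# Hypothetical annotation table; ex:P1 and ex:P2 form the user profile.
ANNOT = {
    "ex:P1": {"ex:C1", "ex:C2", "ex:C5"},
    "ex:P2": {"ex:C2", "ex:C3"},
    "ex:Q1": {"ex:C1", "ex:C2", "ex:C5"},
    "ex:Q2": {"ex:C2"},
    "ex:Q3": {"ex:C3", "ex:C4"},
    "ex:Q4": {"ex:C4"},
}


def sim(r, q, annot, freq, n):
    """SKOS similarity: IC of the concepts shared by r and q."""
    return sum(-math.log(freq[c] / n) for c in annot[r] & annot[q])


def recommend(profile, annot, k):
    """Rank candidates by the summed similarity to all profile items."""
    freq = Counter(c for concepts in annot.values() for c in concepts)
    n = max(freq.values())
    # Candidates share at least one concept with the profile and are not part of it.
    candidates = [
        q for q in annot
        if q not in profile and any(annot[q] & annot[r] for r in profile)
    ]
    scores = {q: sum(sim(r, q, annot, freq, n) for r in profile) for q in candidates}
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


for resource, score in recommend({"ex:P1", "ex:P2"}, ANNOT, k=2):
    print(resource, round(score, 3))
```

Ties are broken alphabetically here purely to make the output deterministic; the text does not specify a tie-breaking rule.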

Upon calculation of similarity values for each potentially relevant item $q$, the engine ranks the retrieved resources and returns the top $k$, where the limit $k$ is specified by the user in the SKOSRec query.

With the presented methods of on-the-fly retrieval, recommendations can be generated from LOD repositories through an API without requiring local metadata or excessive preprocessing operations other than the mapping of profile items to LOD resources.⁴

⁴ Di Noia et al. describe how a mapping procedure can be conducted for a real-world implementation. For their LDRS, they matched movie titles with the corresponding LOD resources in DBpedia. The authors extracted IRIs, labels and publication dates for all movie resources. They applied the Levenshtein distance on these resources to identify matching movie items [13].

3.3. Preference Querying

In the previous subsection, it was assumed that users express their tastes as preference statements for certain items, each given as an IRI referring to a LOD resource. However, likings may also be only vaguely known, for instance when users prefer products with specific features but are not able or willing to specify the actual items. The engine obtains such a profile by issuing a subsequent SPARQL query that contains the preference graph pattern ($P_{pref}$) in the WHERE condition. Suppose a user is interested in Quentin Tarantino movies (dbr:Quentin_Tarantino) but does not specify the precise items she likes. She could send a preference query to the SKOSRec engine. The graph pattern in the request would match all movies that are connected to dbr:Quentin_Tarantino with the properties dbo:director or dbo:producer (see Listing 4).

Listing 4. SKOSRec query to generate recommendations based on a preference query in the movie domain (Q2)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>

RECOMMEND ?movie TOP 5
PREF [ ?prefMovie
WHERE {
    { ?prefMovie dbo:director dbr:Quentin_Tarantino }
    UNION
    { ?prefMovie dbo:producer dbr:Quentin_Tarantino }
} ]

Table 3 shows the corresponding solution set to the preference query, containing films from DBpedia that are similar to Quentin Tarantino movies.

Table 3. Result set for Q2

movie
dbr:The_Godfather_Part_II
dbr:The_Prestige_(film)
dbr:The_Dark_Knight
dbr:Planet_Terror
dbr:Clerks

The notion of a queried profile is specified in Definition 9.

Definition 9 (Queried profile). A queried profile is a user profile that has been compiled by retrieving all LOD resources $r$ that are part of an annotation graph $AD$ and are contained in the solution mappings resulting from the evaluation of a specified graph pattern ($P_{pref}$) over a dataset $D$ (Eq. 9).

$$Pr = \{\mu(r) \mid \mu \in [[P_{pref}]]_D,\ r \in dom(\mu),\ r \in AD\} \tag{9}$$

3.4. Prefiltering

To facilitate exploratory search in LOD repositories, users should also have the option to formulate conditions in their queries. One way to enable filters during retrieval is to apply them before similarity calculation. Thus, recommendation lists can be adjusted to individual needs.

Due to the expressiveness of RDF data, filter conditions can not only be applied on attributes that are directly connected to potentially relevant items, but can also be declared as a graph pattern. By this means, it is possible to retrieve LOD resources that are both similar to the user profile and fulfill an advanced condition.

The strengths of the approach will be illustrated with a travel recommendation request that was executed over DBpedia. Suppose a customer expressed a preference for the Lake Baikal region (e.g., as indicated by a positive rating on a travel community site). With simple on-the-fly retrieval, she would receive a recommendation list that primarily consists of travel destinations located in the proximity of this geographical area. A graph-based filter, on the other hand, facilitates advanced retrieval options. In the given example, it enables the user to obtain suggestions for Southeast Asian travel destinations that have features similar to those of the Lake Baikal region, even though the resource dbr:Lake_Baikal is not directly connected to any destination with the annotation dbc:Southeast_Asia. Such destinations can only be identified by exploring the surrounding graph structure of the filter attribute.

The advanced constraint ensures that the generated result set is not empty. A simple attribute-level filter, which is commonly utilized in faceted search systems, would not generate any search hits. By these means, graph-based filter patterns can help users to better explore knowledge graphs, thereby overcoming data quality issues in LOD repositories. The corresponding SKOSRec query for this example is shown in Listing 5.

Listing 5. SKOSRec query with an expressive prefilter condition to retrieve Southeast Asian destinations based on a preference for the Lake Baikal region (Q3)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbc: <http://dbpedia.org/resource/Category:>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

RECOMMEND ?destination TOP 5
PREF dbr:Lake_Baikal
[ WHERE {
    ?destination dbo:country ?country .
    ?country dct:subject ?countrySubject .
    ?countrySubject skos:broader dbc:Southeast_Asia .
    ?destination rdf:type dbo:Place .
} ]

Table 4 shows the solution to this query. It contains travel destinations and regions of interest that are both similar to the previously liked destination dbr:Lake_Baikal and located in Southeast Asia. Ecoregions around lakes, rivers or mountains from Southeast Asian countries, such as Cambodia, Vietnam or Malaysia, are presented to the user.

Table 4. Result set of Q3

destination
dbr:Tonle_Sap
dbr:Mekong
dbr:Mount_Kinabalu
dbr:Laguna_de_Bay
dbr:Lake_Toba

The technical details for this type of query are given in the following definitions. Before the extraction of relevant resources and annotations, the recommendation engine identifies the resources that satisfy the filter (Definition 10).

Definition 10 (Prefiltered resources). The mapping of filtered resources $\Omega_{prefilter}$ is obtained by matching a specified graph pattern ($P$) to the RDF graph of a dataset $D$ (Eq. 10).

$$\Omega_{prefilter} = \{\mu|_{q} \mid \mu \in [[P]]_D\} \tag{10}$$

After preselection, resources are brought together with user profile data to retrieve the final set of recommendations (Definition 11).

Definition 11 (Annotations and relevant resources (prefiltered retrieval)). The solution mappings of annotations and relevant resources in prefiltered retrieval mode are obtained by joining the set of prefiltered resources with the mappings $\Omega_r$ of all relevant resources and annotations as determined from the user profile.

$$\Omega_{pre} = \Omega_{prefilter} \bowtie \Omega_r \tag{11}$$
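The join in Equation 11 can be sketched with solution mappings represented as dictionaries. The function `join` and the sample mappings are illustrative assumptions: the two destination names are taken from Table 4, whereas `ex:Other_Lake` and the concept names are invented.

```python
def join(left, right):
    """Join two lists of solution mappings on their shared variables."""
    joined = []
    for m1 in left:
        for m2 in right:
            shared = m1.keys() & m2.keys()
            # Two mappings are compatible if they agree on every shared variable.
            if all(m1[v] == m2[v] for v in shared):
                joined.append({**m1, **m2})
    return joined


# Resources q that satisfy the prefilter pattern.
OMEGA_PREFILTER = [{"q": "dbr:Tonle_Sap"}, {"q": "dbr:Mekong"}]

# Relevant resources and shared concepts as determined from the user profile.
OMEGA_R = [
    {"c": "ex:Lakes", "q": "dbr:Tonle_Sap"},
    {"c": "ex:Lakes", "q": "ex:Other_Lake"},
    {"c": "ex:Rivers", "q": "dbr:Mekong"},
]

OMEGA_PRE = join(OMEGA_PREFILTER, OMEGA_R)
print(OMEGA_PRE)
```

Because `ex:Other_Lake` does not satisfy the filter, its mapping is dropped before any similarity value is computed, so IC values are later derived from the filtered set only.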

After retrieving annotations and relevant resources in prefiltering mode, the system generates constraint-based recommendations. In this context, the ranking procedure is adapted to the restricted set of relevant resources and annotations. Hence, IC values of matching annotations and respective SKOS similarities are quantified according to the user filter. By this means, recommendation scores reflect the similarity of items for the actual set of filtered LOD resources. Except for this restriction, the system carries out the calculations (i.e., determination of similarity values, the ranking of LOD resources) in the same way as in non-filtering mode.

3.5. Postfiltering & Combinations

The SKOSRec engine can not only combine similarity calculation and graph pattern matching to prefilter relevant LOD resources, it can also apply user constraints after similar items have already been determined. By this means, recommendations are filtered ex-post to the similarity detection process. This feature enables novel retrieval patterns in which suggestions are part of a subquery. As an example scenario for a postfilter recommendation task, consider the following SKOSRec query, which obtains suggestions for movies and directors based on a user's preference for the director dbr:Quentin_Tarantino (see Listing 6). For the given request, the engine would identify similar directors based on Quentin Tarantino's characteristic features (e.g., geographic origin or awards won) and would determine the movies these similar directors have shot. Thus, users may receive interesting movie recommendations.

Table 5 shows the solution the SKOSRec system returns when Q4 is issued against DBpedia. The ellipses in the displayed tables indicate records that have been excluded for brevity.

However, this approach can still be improved upon. For instance, the engine only considers the metadata descriptions of the director, whereas related movies have to be retrieved anew upon execution of the postfilter section. Thus, they are not ranked according to similarity in the final output table.

Another weakness is the biased semantics of the query.

Listing 6. SKOSRec query to generate simple postfiltered recommendations in the movie domain (Q4)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX yago: <http://dbpedia.org/class/yago/>

SELECT ?director ?movie
WHERE {
    ?movie dbo:director ?director .
    ?director rdf:type yago:Director110014939 .
}
LIMIT 50
RECOMMEND ?director TOP 5
PREF dbr:Quentin_Tarantino

Table 5. Result set of Q4

director             movie
dbr:Kirk_Douglas     dbr:Scalawag_(film)
dbr:Kirk_Douglas     dbr:Posse_(1975_film)
dbr:Kevin_Costner    dbr:The_Postman_(film)
dbr:Kevin_Costner    dbr:Dances_with_Wolves
...                  ...
dbr:Gene_Kelly       dbr:Hello_Dolly_(film)
dbr:Gene_Kelly       dbr:Invitation_to_the_Dance_(film)
...                  ...
dbr:Alan_Alda        dbr:Goodbye_Farewell_and_Amen
dbr:Alan_Alda        dbr:Margaret'_s_Engagement
...                  ...
dbr:Steve_Buscemi    dbr:Interview_(2007_film)
dbr:Steve_Buscemi    dbr:Animal_Factory

Usually, when a person favors a movie director, this preference refers to the movies this director has shot and not to the personal features of the person. Two directors might be similar according to certain features, such as birth year, nationality or received awards, but the style and genre of the movies they have created can still be different. A third factor is that recommendation retrieval for movie directors should also take into account that a film artist is likely more relevant the more movies she has shot that are in line with the preferred movies of the user. Hence, the SKOSRec engine enables another postfiltering strategy, which is called aggregation-based retrieval. This strategy can be helpful whenever an entity type is linked to several LOD resources in a dataset, such that similarity scores of related entities (e.g., movies) can be aggregated to an overall recommendation score for a certain item (e.g., for a movie director).

Aggregation-based requests can be well combined with other retrieval patterns, such as preference queries. In the case of the movie recommendation scenario, a user could state that she enjoyed Quentin Tarantino movies in the past and would like to receive suggestions for directors who shot similar films. This kind of question can typically only be answered by another human being, e.g., in a personal interaction among friends or in an online forum. The fact that the language facilitates these kinds of queries is one of its key advantages.

Listing 7. SKOSRec query to generate postfiltered recommendations based on a preference query in the movie domain (Q5)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX yago: <http://dbpedia.org/class/yago/>

SELECT ?director ?movie
WHERE {
    ?movie dbo:director ?director .
    ?director rdf:type yago:Director110014939 .
}
LIMIT 50
AGG dbr:Quentin_Tarantino ?director SUM
RECOMMEND ?movie TOP 100
PREF [ ?prefMovie
WHERE { ?prefMovie dbo:director dbr:Quentin_Tarantino } ]

Table 6 depicts the results of Q5 as retrieved from DBpedia. When comparing the solution sets for Q4 and Q5, it becomes clear that they differ entirely. While Q5 generated a list of directors who have mostly created action and thriller movies with a twist (i.e., Tarantino's unique filmmaking style⁵), Q4 shows US-American actor-directors whose movies range from musical films (dbr:Hello_Dolly_(film)) to western movies (dbr:Dances_with_Wolves) or drama films (dbr:Interview_(2007_film)). Thus, the directors in the solution table of Q4 (Table 5) seem rather arbitrary and not particularly helpful concerning the initial user request.

We define this kind of aggregation-based request as a so-called roll-up query. This definition was chosen because the final suggestions are based on summarized (i.e., rolled up) preferences for sublevel entities. Another example of a roll-up request stems from the application scenario of travel search. This recommendation query derives suggestions for city trip destinations from the points of interest (POI) a user has visited and liked in another city (e.g., London). Listing 8 shows the corresponding SKOSRec query for this scenario.

Table 6. Result set of Q5

director                     movie
dbr:Steven_Spielberg         dbr:Indiana_Jones
dbr:Steven_Spielberg         dbr:Jurassic_Park_(film)
...                          ...
dbr:Francis_Ford_Coppola     dbr:The_Godfather_Part_II
dbr:Francis_Ford_Coppola     dbr:The_Godfather_Part_III
...                          ...
dbr:Christopher_Nolan        dbr:The_Prestige_(film)
dbr:Christopher_Nolan        dbr:The_Dark_Knight
...                          ...
dbr:Robert_Rodriguez         dbr:Planet_Terror
dbr:Robert_Rodriguez         dbr:Machete_(film)
...                          ...
dbr:Kevin_Smith              dbr:Clerks
dbr:Kevin_Smith              dbr:Clerks_II

⁵ https://en.wikipedia.org/wiki/Quentin_Tarantino

Listing 8. SKOSRec query to generate postfiltered recommendations (aggregation-based) for a preference profile containing the city of London (Q6)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dul: <http://www.ontologydesignpatterns.org>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?city
WHERE {
    ?poi ?property ?city .
    ?city rdf:type dbo:City .
    ?property rdfs:subPropertyOf dul:hasLocation .
}
LIMIT 3
AGG dbr:London ?city SUM
RECOMMEND ?poi TOP 100
PREF dbr:Oxford_Street
     dbr:Notting_Hill
     dbr:South_Bank

The SKOSRec request contains subquery commands that refer to the user's preferred POIs. It specifies how similarity scores of the generated suggestions should be aggregated (i.e., sum-based). The retrieval is based on the 100 POIs that are most similar to the ones in the preference section. Additionally, the subquery part refers to the DBpedia resource that represents London (AGG dbr:London ?city), thus indicating that the list of recommendations should not contain any POIs in this city. Upon processing the depicted query over the DBpedia repository, the engine displays the three cities for which the highest aggregated scores have been detected (see Table 7).

Table 7. Result set of Q6

city
dbr:Oxford
dbr:Abingdon-on-Thames
dbr:Wallingford_Oxford

The motivation for formulating such a query is that, when searching for cities, preferences might be better represented through the sights a user has visited elsewhere than through a simple on-the-fly request solely containing a preference statement for a city in the user profile section.

Besides the roll-up retrieval patterns presented above, the SKOSRec grammar may facilitate the formulation of additional types of advanced recommendation queries. For instance, cross-domain queries are an interesting additional pattern to be explored for personalized retrieval. Listing 9 shows an example request that generates movie suggestions based on the preference for a music band (i.e., dbr:The_Jackson_5).

Listing 9. SKOSRec query to generate cross-domain recommendations (Q7)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?movie ?participated ?musicalAct
WHERE {
    ?movie rdf:type dbo:Film .
    VALUES ?participated { dbo:basedOn dbo:musicComposer dbo:genre dbo:starring }
    { ?musicalAct ?participated ?movie }
    UNION
    { ?movie ?participated ?musicalAct }
}
LIMIT 10
AGG dbr:The_Jackson_5 ?movie MAX
RECOMMEND ?musicalAct TOP 100
PREF dbr:The_Jackson_5
WHERE {
    VALUES ?actType { dbo:MusicalArtist dbo:Band }
    ?musicalAct rdf:type ?actType .
}

Table 8. Result set of Q7

movie                           musicalAct
dbr:Moonwalker                  dbr:Michael_Jackson
...                             ...
dbr:The_Wiz_(film)              dbr:Ashford_&_Simpson
dbr:Garfield_Gets_a_Life        dbr:The_Temptations
dbr:Pipe_Dreams_(1976_films)    dbr:Gladys_Knight_&_the_Pips
dbr:The_Jacksons                dbr:Jermaine_Jackson

The query contains a prefilter condition that triggers retrieval of music artists and bands. Afterwards, the scores of the top 100 suggestions are summarized with maximum-based aggregation. Thus, the highest similarity value among the set of music acts determines the ranking score of the film. Also note that, because of the aggregation-based retrieval procedure, connections between movies and the LOD resource dbr:The_Jackson_5 are excluded from similarity calculation. The scenario exemplifies how semantic relationships can help to generate suggestions for a target domain (i.e., movies) that are based on a profile containing items from a different source domain (i.e., music acts). While regular SPARQL queries perform exact matching of triple statements, the SKOSRec syntax facilitates the integration of similarity- and graph-based retrieval to enrich result lists with other potentially relevant items.

As a point of reference, consider the SPARQL query for the given cross-domain scenario (Listing 10).

Listing 10. SPARQL query to generate cross-domain recommendations (Q8)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?movie ?musicalAct
WHERE {
    ?movie rdf:type dbo:Film .
    VALUES ?participated { dbo:basedOn dbo:musicComposer dbo:genre dbo:starring }
    { dbr:The_Jackson_5 ?participated ?movie }
    UNION
    { ?movie ?participated dbr:The_Jackson_5 }
}
LIMIT 5

The query omits the similarity calculation part. Thus, it only retrieves movies linked to the resource dbr:The_Jackson_5. By this means, the SPARQL results have a better chance of being related to the initial user profile. On the other hand, empty result lists can occur when the preferred LOD resource does not have any direct connections to recommendable items. This assumption is backed up by the result lists of the two queries (Tables 8 and 9). The SKOSRec query produces a more comprehensive solution than a regular SPARQL query.


Table 9. Result set of Q8

movie               musicalAct
dbr:The_Jacksons    dbr:The_Jackson_5

The following definitions formalize the postfiltered and aggregation-based retrieval forms. Definition 12 specifies the notion of SKOSRec recommendations that may potentially undergo postprocessing.

Definition 12 (SKOSRec recommendations). SKOSRec recommendations ($R(Pr)$) are the set of suggestions generated by processing the solution mappings ($\Omega$) obtained from recommendation retrieval. They can be a result of simple on-the-fly retrieval, prefiltering, or a combination of these techniques. $Pr$ represents the input profile that contains at least one preference for an item. The cardinality of $R(Pr)$ is the intended number of recommendations ($k$) that is specified by the user, based on which the engine selects the resources with the highest recommendation scores that are not contained in the profile (Eq. 12).

$$R(Pr) = \{x \mid x \in \{q \mid score(Pr, q) > \operatorname{kmax}_{q \in \Omega} score(Pr, q),\ q \notin Pr\}\} \tag{12}$$

The approach of subquerying with recommendation results builds on the idea that SKOSRec suggestions $R(Pr)$ can be joined with any graph pattern $P_{post}$ after the process of similarity calculation (Definition 13). It does not make a difference how the engine obtained these recommendations.

Definition 13 (Postfiltered recommendation mapping). The postfiltered recommendation mapping is obtained by joining solution mappings generated from SKOSRec recommendations with the results of matching a postfilter graph pattern ($\Omega_{post}$) (Eqs. 13 and 14).

$$\Omega_{postfilter} = \{\mu \mid \mu \in [[P_{post}]]_D,\ x \in dom(\mu)\} \tag{13}$$

$$\Omega_{post} = \{\mu|_{x} \mid x \in R(Pr)\} \bowtie \Omega_{postfilter} \tag{14}$$
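Since the left operand in Equation 14 binds only the variable $x$, the join amounts to keeping those postfilter mappings whose binding for $x$ is a recommended resource. The following sketch shows this under that reading; the function `postfilter` and all resource names are invented for illustration.

```python
# Hypothetical recommendations R(Pr), e.g. similar directors (names invented).
RECOMMENDED = {"ex:Director_A", "ex:Director_B"}

# Hypothetical results of matching the postfilter pattern over the dataset.
OMEGA_POSTFILTER = [
    {"x": "ex:Director_A", "movie": "ex:Movie_1"},
    {"x": "ex:Director_A", "movie": "ex:Movie_2"},
    {"x": "ex:Director_C", "movie": "ex:Movie_3"},
]


def postfilter(recommended, mappings, var="x"):
    """Keep only the mappings whose binding for var is a recommended resource."""
    return [m for m in mappings if m.get(var) in recommended]


OMEGA_POST = postfilter(RECOMMENDED, OMEGA_POSTFILTER)
print(OMEGA_POST)
```

A recommended resource without any postfilter match (here `ex:Director_B`) simply produces no row, just as an inner join would.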

Like the prefilter, $P_{post}$ can be comprised of triple statements that are eligible in the RecGroupGraphPattern part of the SKOSRec grammar. Aggregation-based queries have to be treated differently than a regular postfilter request. Apart from excluding sublevel items of the profile from the set of similar resources, the engine should also omit any connections between the preferred superordinate item ($a$) and similar sublevel entities. Additionally, the user has to specify the variable (e.g., $y$) that represents the items within the postfilter graph pattern $P_{post}$. By this means, aggregation-based recommendations can be generated (Definition 14).

Definition 14 (Resources for aggregation-based retrieval). The set of resources for aggregation-based retrieval is obtained by retrieving LOD resources that have a connection to a previously generated suggestion, thereby excluding all resources that are linked to the specified superordinate item $a$ in the profile (Eq. 15).

$$R_{agg}(a) = \{\mu(y) \mid \mu \in \Omega_{post},\ y \in dom(\mu),\ a \notin \mu(y)\} \tag{15}$$

For each potential aggregation-based suggestion ($y$), similarity scores of connected entities ($x$) have to be considered for the final ranking. In this context, different approaches to aggregation can be applied. In cases when both similar items and aggregation-based recommendations are entities on comparable levels of granularity, choosing the maximum similarity score might be the best way to represent the relatedness of the resource to the user profile. On the other hand, when postfiltered recommendations are based on similarity scores of their sublevel entities (e.g., as in the movie recommendation example), the scores of related suggestions should be aggregated by the sum or the average of individual scores.

Definition 15 (Aggregation-based ranking score). The ranking score of an LOD resource $y$ that is connected to a set of recommended items is determined by aggregating scores of connected entities either by the maximum (Eq. 16), the sum (Eq. 17) or the average (Eq. 18) of individual scores.

$$score_{aggMAX}(y) = \max_{x \in \mu(x, y)} score(Pr, x) \tag{16}$$

$$score_{aggSUM}(y) = \sum_{x \in \mu(x, y)} score(Pr, x) \tag{17}$$

$$score_{aggAVG}(y) = \frac{\sum_{x \in \mu(x, y)} score(Pr, x)}{|\{x \in \mu(x, y)\}|} \tag{18}$$
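The three aggregation modes of Definition 15 can be compared on a toy example. The scores, the mappings and the function `aggregate` are hypothetical; the `exclude` parameter is a simplified stand-in for the exclusion of the superordinate item $a$ from Definition 14.

```python
from collections import defaultdict

# Hypothetical recommendation scores score(Pr, x) of similar movies x.
SCORES = {"ex:Movie_1": 3.0, "ex:Movie_2": 1.0, "ex:Movie_3": 2.0, "ex:Movie_4": 5.0}

# Hypothetical postfiltered mappings linking each movie x to a director y.
OMEGA_POST = [
    {"x": "ex:Movie_1", "y": "ex:Director_A"},
    {"x": "ex:Movie_2", "y": "ex:Director_A"},
    {"x": "ex:Movie_3", "y": "ex:Director_B"},
    {"x": "ex:Movie_4", "y": "ex:Preferred_Director"},
]


def aggregate(mappings, scores, mode, exclude=None):
    """Aggregate the scores of connected entities x for every resource y."""
    grouped = defaultdict(list)
    for m in mappings:
        if m["y"] != exclude:  # skip the superordinate item from the profile
            grouped[m["y"]].append(scores[m["x"]])
    functions = {"MAX": max, "SUM": sum, "AVG": lambda v: sum(v) / len(v)}
    return {y: functions[mode](values) for y, values in grouped.items()}


for mode in ("MAX", "SUM", "AVG"):
    print(mode, aggregate(OMEGA_POST, SCORES, mode, exclude="ex:Preferred_Director"))
```

In this data, sum-based aggregation favors `ex:Director_A`, who is linked to two similar movies, whereas the average places both directors on the same level.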

4. Evaluation of the SKOS Recommender

4.1. Experimental Setup

The novel recommendation strategies of the SKOSRec engine were evaluated in the context of web-based experiments for the usage scenarios of travel destination search and multimedia RS. The online approach helped to reach out to more participants with diverse demographic backgrounds, which would not have been possible in a laboratory experiment. Another positive side effect of the web-based setting was that participants could test and rate recommendations without being directly observed by an experimenter, which might have caused biased results otherwise. Hence, the online approach mitigated common limitations of laboratory studies [19].

The experiments were carried out in the defined usage scenarios with metadata descriptions from DBpedia. The database containing the local mirror of DBpedia ran on a virtual server. Besides the triple store, a web application was hosted on the same machine. The website ran on an Apache Tomcat 7 application server and was implemented with the help of Java servlet technology.⁶ ⁷

We conducted the online experiments between April 2016 and January 2017. Upon setting up the web application, a pretest was carried out to control for potential pitfalls such as misleading navigation. Two test users ran step by step through the web interface and provided their feedback. Because of their insights, we slightly modified the interface (e.g., through rephrasing user instructions) to increase the chances of successful completion. In case the SKOSRec engine is applied in a live setting, the design of the user interface will have to be tested with more users. Our experimental series focused on algorithmic performance, which was tested with numerous participants.

Subjects were recruited through the clickworker.com platform.⁸ Participants received a compensation fee of 1.00 € or 1.50 € for a completed survey. 103 participants took part in the study on travel destination search. In the multimedia domains, a total of 154 subjects evaluated the SKOSRec engine. For each part of the evaluation, it was ensured that participants had not just run through the evaluation screen. Whenever there appeared to be irregularities in the log files (e.g., the default settings of the screen were not changed), these assessments were excluded from further analysis.

The survey template started with a consent form. It informed participants about the context and the procedure of the study. After having filled in the consent form, participants were navigated to the first part of the experiment. It collected data on subjects' demographics and their consumption and search habits. After the preparation phase, participants worked on the first test case (TC1) in each usage scenario. In this test case, the performance of the SKOSRec engine was assessed in the simple execution mode of on-the-fly recommendations (see Subsect. 3.2). This was done to gather data on the usefulness of suggestions resulting from a simple content-based approach (i.e., the baseline method), with which more advanced recommendation strategies were compared at a later stage. In cases where users did not state any items, the application showed an error page. While setting up a profile required some effort on the side of the user, it was a necessary step for the experiment. Since we did not have access to the log files of a live application, profile generation had to be simulated. The process was facilitated by a specifically tailored interface (see Fig. 2), where subjects could search for items through an autocomplete field that applied AJAX technology.⁹

⁶ http://tomcat.apache.org/
⁷ http://www.oracle.com/technetwork/java/javaee/servlet/index.html
⁸ https://www.clickworker.de/

It enabled users to type in keywords (e.g., the name of a music act or the author of a book) for which matches were instantaneously displayed without reloading the site. Hence, participants could quickly find and select their items of interest. Whenever a user entered a query through the AJAX interface, the system matched the query keywords against an Apache Solr/Lucene search index.¹⁰ The index was built with item metadata, which had been extracted from DBpedia prior to setting up the website. The index comprised the names and abstracts of items. Whenever available, information on thumbnail pictures was saved in the index to be ready for display in the AJAX interface. In case the Apache Solr/Lucene

⁹ http://api.jquery.com/jquery.ajax
¹⁰ http://lucene.apache.org/solr


Fig. 2. User profile generation, TC1 (music domain).

search engine found index keywords that matched the user query, metadata information for these items was immediately shown on the screen. This information was displayed to help participants make an informed decision on whether the AJAX result list contained suitable items for profile generation.

Aside from the AJAX feature, the appeal of the interface was increased by a progress bar at the top of the webpage (see Fig. 2). Progress bars are commonly utilized tools for web surveys. In conventional paper-based studies, subjects usually see how far along they are, while in a web survey they cannot check their progress immediately. This is especially true for studies that apply a screen-by-screen navigation, which was the chosen approach for the conducted web experiments. Progress bars prevent users from tiring of the questionnaire and withdrawing from the study altogether by showing how close participants are to completing the experiment [14].

Upon profile generation, the SKOSRec engine calculated on-the-fly recommendations for a simple query (similar to Q1, Sect. 3), which were shown to users in a separate screen. Suggestions were displayed with a sufficient amount of information (e.g., thumbnails, labels or a short description) to help participants assess the quality of the recommendation. In addition to this information, users could also access the respective Wikipedia article by following the hyperlink on the label. Relevance sliders were used to assess the utility of each recommendation. They are often applied in evaluations of IR systems [51, 54]. Scores were handled as points on a 0 to 100 scale of the relevance slider (outer left side: lowest relevance; outer right side: highest relevance; see Fig. 3).

This approach facilitated the calculation of an accurate precision score and the application of tests with higher statistical power than an ordinal 5-star scale would allow. The score, called mean relevance score (mrs), was measured with Equation 19, where k denotes the number of rated recommendations in the list.

\[
\mathit{mrs} = \frac{\sum_{i=1}^{k} \mathit{score}_i}{k} \tag{19}
\]
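As an illustration of Equation 19, the following minimal sketch averages the slider scores of one recommendation list. The function name `mean_relevance` and the example scores are hypothetical and not taken from the SKOSRec implementation; the range check merely mirrors the 0 to 100 slider scale described above.

```python
def mean_relevance(scores):
    """Mean relevance score (Eq. 19): average of the k slider scores of one list."""
    if not scores:
        # Assumption: mrs is treated as undefined for an empty list.
        raise ValueError("mrs is undefined for an empty recommendation list")
    for s in scores:
        if not 0 <= s <= 100:
            raise ValueError(f"slider score out of range: {s}")
    return sum(scores) / len(scores)


# Hypothetical slider scores for a list of k = 5 recommendations.
example_scores = [80, 65, 40, 90, 25]
print(mean_relevance(example_scores))  # 60.0
```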

In addition to the precision score, the web application also determined the size of the recommendation list produced by the engine as an indicator for recall. However, recommendation list size can only be utilized for this purpose when mrs scores reach a reasonable level.

While accurate recommendations are useful because they build trust in a system [57], they only capture a narrow aspect of performance [36]. For instance, an online retailer can profit tremendously from an RS that provides both relevant and unobvious recommendations, since it enables users to explore the product catalog more thoroughly. On the other hand, if an RS only suggests popular products, it is likely that the user already knows them [3, 23]. A purely accuracy-based evaluation disregards these aspects, which is why other performance indicators, such as novelty, were measured as well. Thus, the evaluation interface also contained a section where subjects had to tick a radio button that indicated whether an item was new. In the backend of the web application, novelty was measured as the ratio of relevant items not


Fig. 3. Recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain).

familiar to the user (|nov|) among the total number of relevant items (|tp|) (Eq. 20).

\[
\mathit{nv} = \frac{|\mathit{nov}|}{|\mathit{tp}|} \tag{20}
\]
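Equation 20 can be sketched as follows. The data layout (a list of dictionaries with the keys `score` and `is_new`) and the function name `novelty` are assumptions made for illustration; the relevance threshold of 50 follows the definition of a relevant item in the text. How the original backend treated lists without any relevant item is not stated, so the sketch returns `None` in that case.

```python
def novelty(ratings, threshold=50):
    """Novelty (Eq. 20): share of relevant items that were new to the user.

    ratings: list of dicts with 'score' (0-100 slider value) and
             'is_new' (True if the participant marked the item as new).
    """
    relevant = [r for r in ratings if r["score"] >= threshold]
    if not relevant:
        # Assumption: novelty is left undefined when no item is relevant.
        return None
    novel = [r for r in relevant if r["is_new"]]
    return len(novel) / len(relevant)


# Hypothetical assessments of one recommendation list.
example_ratings = [
    {"score": 80, "is_new": True},
    {"score": 65, "is_new": False},
    {"score": 40, "is_new": True},   # not relevant, ignored
    {"score": 90, "is_new": True},
    {"score": 25, "is_new": False},  # not relevant, ignored
]
print(novelty(example_ratings))  # two of three relevant items are new
```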

An item was considered relevant when it achieved an mrs score of 50 or higher. In addition to accuracy and novelty, the web application also tracked the diversity of recommendation lists, since a topically diversified result set has been found to increase overall user satisfaction [31, 66]. In the RS literature, the most widely explored diversification metrics are the ones that measure the similarity between each item pair in the recommendation list. The more the items are alike, the less diverse the result set is [19]. The study series of this paper applied the metric by Castells et al. [8]. It determines item-to-item similarity values with the cosine similarity measure. Scores are then converted to distance values (Eq. 21) and averaged over the recommendation list (Eq. 22).

\[
d(i_l, i_m) = 1 - \cos(i_l, i_m) \tag{21}
\]


\[
\mathit{divC} = \frac{2}{k(k-1)} \sum_{l<m} d(i_l, i_m) \tag{22}
\]
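Equations 21 and 22 can be sketched as below. The representation of an item as a dictionary that maps annotation identifiers to weights is an assumption for illustration, as are the names `cosine` and `div_c` and the example concepts; the actual feature vectors of the SKOSRec engine may be built differently.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity of two sparse vectors given as {feature: weight} dicts."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: items without annotations share nothing
    return dot / (norm_a * norm_b)


def div_c(items):
    """Content-based diversity (Eq. 22): mean pairwise distance 1 - cos (Eq. 21)."""
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are needed for a pairwise metric")
    total = sum(1.0 - cosine(a, b) for a, b in combinations(items, 2))
    return 2.0 * total / (k * (k - 1))


# Hypothetical annotation vectors: two identical items and one unrelated item.
example_items = [
    {"concept:A": 1.0, "concept:B": 1.0},
    {"concept:A": 1.0, "concept:B": 1.0},
    {"concept:C": 1.0},
]
print(round(div_c(example_items), 4))  # 0.6667
```

In this toy list, one of the three item pairs is identical (distance 0) and the other two pairs share no annotation (distance 1), which yields a mean pairwise distance of 2/3.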

The content-based diversity scores (divC) were calculated in the backend of the web application and saved in a log file for each recommendation list that the SKOSRec engine generated at runtime. In addition to the measurement of divC scores, we asked participants directly about their opinions on recommendation list diversity. Following the RS evaluation framework by Pu et al. [44], two Likert items were posed for this purpose:

– The recommendations of list [...] are diverse.
– The recommendations of list [...] are similar to each other.

Besides the above-listed algorithmic performance metrics, the authors gathered assessments from users regarding the overall usefulness of the suggestions. This approach is also in line with the empirically verified evaluation framework by Pu et al. [44], which proposes to measure the psychometric construct of Perceived Usefulness to get a comprehensive understanding of user opinions on recommendation results. Thus, participants were instructed to state their agreement with positive statements regarding the relevance of the suggestions. The statements were adapted to the specificities of each usage scenario. Fig. 4 shows an example collection of Likert questions for the domain of music RS.

After participants had provided the required information, their answers were saved in a log file and they were navigated to test case 2 (TC2), which evaluated the SKOSRec engine's performance for constraint-based queries (similar to Q3, Sect. 3). We tried to find out whether the engine was able to improve user satisfaction through filter conditions. Even though IR systems already successfully apply facet filters on result sets, the experiments still had to show whether LOD-enabled constraints have the same effect on the Perceived Usefulness of recommendations. For the comparisons of non-filtered and filtered suggestions, we applied a within-subjects design, in which participants had to rate two recommendation lists resulting from regular (non-filtered) and filtered requests in TC2. We favored this setup over a between-subjects design, where subjects would have been assigned to only one result list. While the between-subjects design more closely resembles a real-world setting, since it does not require participants to interact with more than one approach at a time, differences among subjects (e.g., regarding demographics or topic expertise) can potentially have a considerable impact on quality judgments [19]. Hence, when applying a between-subjects design, one does not know whether differences in performance are caused by the approaches or by the variability among participants. Therefore, researchers need to conduct between-subjects experiments with a sufficient number of study subjects to level out these differences. Within-subjects studies, on the other hand, require fewer participants while still achieving the same level of statistical power, since between-subject variability is not an issue [30, 31]. Beel et al. have shown that even small variations in an RS test setting could cause substantial differences in performance outcomes [6]. To keep the extent of variations even lower, comparisons between non-filtered and constraint-based recommendations were based on the same profile for each user in TC2. Therefore, the user profile from TC1 served as a starting point for filtered retrieval. The web application showed the recently generated profiles to users and asked them to provide an additional filter condition. Fig. 5 depicts an example web form for a constraint-based recommendation request in the music domain. It comprises the user profile and an additional text field where users stated their constraint.

Afterwards, the engine applied this filter condition to the set of potential recommendation results before similarity calculation started. For each usage scenario, different types of constraints were available. Regarding suitable filter options, the authors sought to achieve a balance between assisting users in finding filter conditions that would adequately represent their information needs and keeping the variability among users as low as possible. Therefore, the specificities of each scenario and the availability of the respective data sources were determined. For instance, for the multimedia domains (movie, music and books), it was assumed that users would like to filter recommendations according to the specific genre of the item. Fortunately, the DBpedia dataset contains this kind of information for the music domain. In this domain, the genre can be retrieved through the property dbo:genre. However, in the movie and book domains, DBpedia does not provide sufficient information. Many LOD resources that represent books were not assigned a genre. In the movie domain, a genre property is missing entirely, and genre-specific information can often only be found in the subject categories describing the movie. Hence, in contrast to the experiment on


Fig. 4. Evaluation screen for recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain).

music recommendations, participants in the domains of books and films did not filter their recommendation lists by genre. On the other hand, the property dc:subject was applied as a recommendation filter in each multimedia domain. Since the subject property often links to many informative features of multimedia items (e.g., release period, geographic information or content information), the authors decided that this feature represents a powerful filter dimension.

In the experiment on travel destination search, the web interface offered two filters: a subject filter and a location-specific filter. The subject filter enabled users to state their preferences regarding the characteristics of a location (e.g., being a nature reserve). The location-specific filter, on the other hand, gave participants the option to specify where the desired destination should be located.¹¹

The second fundamental research issue of TC2 concerned the question of whether expressive constraints can boost recommendation quality even further. Hence,

¹¹ http://dbpedia.org

apart from executing a constraint-based workflow with a simple filter (i.e., a direct attribute of an item), the SKOSRec engine generated suggestions with the help of an expressive filter as well. This second procedure, while still applying the same user constraint as the simple filter, utilized an expressive graph pattern to include additional LOD resources in the result set. For instance, in case a subject filter was selected, result set expansion was facilitated by exploring the wider semantic space of the DBpedia category graph through a property path declaration in the graph pattern (e.g., skos:broader[2]). In the travel experiment, an additional expressive filter retrieved place-related information (dul:hasLocation) for location-specific subproperty relations that are not materialized in DBpedia. Hence, the expressive graph pattern expanded the filter such that additional properties were matched (e.g., dbo:country, dbo:city or dbo:state) while still applying the specified user constraint to the set of LOD resources. Upon execution of constraint-based recommendation requests, users received two recommendation lists (resulting from simple and expressive filtering, respectively) in a


Fig. 5. Web form for a constraint-based recommendation request (music domain).

separate evaluation screen. Participants evaluated result sets in the same manner as in TC1. For each result set, the screen contained an additional Likert item asking subjects to state their agreement with the statement "The filter has improved the recommendation results of the list". The sequence of suggestions resulting from the different methods was randomly assigned in each user session to avoid order effects.

After study subjects had completed TC2, they were navigated to test case 3 (TC3). In the TC3 section of the travel experiment, participants could choose between three options for rollup query patterns (similar to Q6, Sect. 3). Users were able to obtain recommendations for travel destinations based on POIs they had visited during a stay in another destination. They could select either a city (option 1), a region (option 2) or a country (option 3) as destination type. Upon entity type selection, the SKOSRec engine generated appropriate suggestions. Fig. 6 shows the web form from the travel experiment that facilitated the formulation of advanced rollup requests in TC3.

When users had selected an entity type and stated their favorite travel destination and three associated POIs, the application generated a SKOSRec query from these parameters. A simple on-the-fly request was sent to the engine as well to retrieve baseline recommendations, with which the suggestions resulting from the advanced request were compared at a later stage of the experiment. The simple query only contained the user's favorite travel destination and the selected entity type option, thus omitting information on the POIs. During the travel experiment, the engine processed both the simple and the advanced query and produced two recommendation lists. As in the previous parts of the experiment, participants assessed the quality of the result sets, which appeared in random order on the screen.

The multimedia experiments also contained a section on rollup retrieval (TC3). Like the advanced travel queries, the multimedia requests aggregated similarity scores of sublevel entities through summation, which the engine joined with a postfilter. In addition to these features, the pattern contained a preference query part. This section identified a set of preferred items through a single user statement. Participants either specified their favorite director/actor (movie domain), music act


Fig. 6. Travel - User profile generation, TC3.

(music domain) or author (book domain). The SKOSRec engine then generated recommendations based on the features of the creative works of the favored artist (similar to Q5, Sect. 3). Participants also received suggestions from a simple on-the-fly query, which computed results according to the artist's characteristics. As in the travel experiment, recommendations from the advanced rollup request were compared with those from the on-the-fly query. Participants were not told which approach they were assessing, and result set positions were randomly assigned.

After participants of the multimedia experiments had evaluated the results of TC3, they were guided to test case 4 (TC4), which evaluated cross-domain queries. The experiment on travel RS ended with TC3 because no suitable graph patterns could be identified to facilitate these kinds of suggestions for the travel usage scenario. In the multimedia domains, however, the data from DBpedia was sufficient for this retrieval task. The entity type of the study (i.e., movie, music act or book) defined the source domain, based on which the engine generated suggestions for items from another multimedia target domain (similar to Q7, Sect. 3). Fig. 7 depicts the cross-domain web form of the music experiment.

The web form assisted subjects in formulating a request, which then obtained movie suggestions based on the favorite music act of the user. Parameters from users were entered in the placeholders of the query pattern in the backend of the application. We formulated the query templates in such a way that the query engine matched any properties or graph patterns that could potentially connect two items from the specified target and source domains. For instance, such matches can occur when a user states a favorite movie that was written by a particular author. In case the same author has also written some books, they might be of interest to the user as well. Another example stems from the movie domain. Suppose a consumer has declared in his/her profile a preference for a movie that links to the corresponding soundtrack. It might well be the case that the user likes the soundtrack and the music acts that were involved in creating it just as much as s/he likes the movie. The practice of distinguishing homonymous LOD resources through the property dbo:wikiPageDisambiguates is another interesting linking pattern from DBpedia that was exploited for cross-domain requests in TC4, since it connects similar entities (i.e., the book edition of a certain movie).

Cross-domain SPARQL queries (similar to Q8, Sect. 3) were posed as baseline requests in this test case. While the SPARQL query only matched cross-domain relations for a single LOD resource, the SKOSRec request expanded the search space by considering links from more than one item. As in the previous test cases, TC4 ended with an evaluation screen where users assessed the SPARQL-based as well as the SKOSRec suggestions in random order without knowing which recommendation list belonged to which method. Table 10 gives an overview of the test cases and the methods that were applied in the experiments in each usage scenario.

4.2. Results

In total, 257 people took part in the study. The gathered samples from the experiments contained answers


Fig. 7. Music - User profile generation, TC4 (page 9).

Table 10. Test cases in the web experiments.

| Test Case | Evaluation Purpose | Query Type | Travel | Movie | Music | Book |
|---|---|---|---|---|---|---|
| TC1 | On-the-fly Recommendations (Baseline) | Q1 | X | X | X | X |
| TC2 | Constraint-based Recommendations | Q3 | X | X | X | X |
| TC3 | Advanced Queries (Rollup) | Q5, Q6 | X | X | X | X |
| TC4 | Advanced Queries (Cross-Domain) | Q7 | - | X | X | X |

from both genders, with a slight overrepresentation of male participants (53.7%). The majority of study subjects were between 20 and 40 years of age (65.8%). The evaluation of the demographics section also revealed that the majority of participants already perceived the search in their domain of interest as either "manageable" (38.5%) or even as "easy" or "very easy" (44.0%).

Prior to analyzing the performance results of the novel retrieval approaches of the SKOSRec engine, it was checked whether the respective Likert items for Perceived Usefulness and user-based diversity (divU) reached acceptable reliability levels. For this purpose, Cronbach's α values were determined. Table 11 shows the outcome of this analysis for the Perceived Usefulness construct. Given the acceptable values throughout the domains (Cronbach's α > 0.7), it could be treated as an interval-scaled variable in the statistical tests.

Table 11. Cronbach's α values for the concept of Perceived Usefulness.

| Domain | Cronbach's α |
|---|---|
| Travel | 0.7002 |
| Movie | 0.8579 |
| Music | 0.8515 |
| Book | 0.8949 |
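For readers who want to reproduce such a reliability check, the standard formula is α = k/(k−1) · (1 − Σ s_i² / s_t²), where k is the number of Likert items, s_i² the variance of item i and s_t² the variance of the summed scale. The sketch below is a minimal implementation of this textbook formula under the assumption of complete responses (no missing values); the function name `cronbach_alpha` and the example answers are hypothetical and do not stem from the study data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    if len(responses) < 2:
        raise ValueError("at least two respondents are required")
    k = len(responses[0])
    if k < 2:
        raise ValueError("at least two items are required")
    # Sample variance of each item (column) across respondents.
    item_vars = [variance(col) for col in zip(*responses)]
    # Sample variance of the summed scale score per respondent.
    total_var = variance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("scale totals do not vary; alpha is undefined")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical answers of five participants to three 5-point Likert items.
example_responses = [
    [4, 5, 4],
    [3, 3, 2],
    [5, 4, 5],
    [2, 2, 3],
    [4, 4, 4],
]
alpha = cronbach_alpha(example_responses)
print(f"alpha = {alpha:.3f}")
```

Values above 0.7, the threshold used in the text, are conventionally read as acceptable internal consistency.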

Besides usefulness, user-based diversity scores were also measured by multiple Likert items. However, Cronbach's α values for this concept did not reach acceptable levels. To make sure that valuable user assessments did not remain unused, agreement with the statement "The recommendations of list [...] are diverse" was taken into account as an ordinal variable (divU). Based on these results, subsequent tests could be carried out. The first part of the analysis concerned the performance of the on-the-fly retrieval method (baseline) of the SKOSRec engine in TC1. The outcome of this analysis confirms that the on-the-fly approach represents a viable recommendation strategy. Table 12 shows the mean (M),¹² standard deviation (SD) and number of participants (N) for each performance dimension and metric. It can be seen that users assessed the engine's performance to be above a mediocre level in almost every quality dimension. Aside from the positive user evaluations with regard to Perceived Usefulness, related quality aspects such as Accuracy, Novelty and Diversity received good assessments as well. For instance, the mean relevance score (mrs) in each domain was higher than 50 score points. This means that participants were often convinced of the relevance of the results. Given these positive user assessments with regard to item relevance, it seems justified to consider the mean of result list sizes as an indicator of recall. The engine was able to generate the required number of recommendations in almost every test case (see Table

¹² In the case of the user-based diversity score (divU), M denotes the median because of the ordinal scale of the variable.


12). The on-the-fly retrieval approach was also tested with regard to the Novelty dimension. Table 12 shows the mean novelty scores (nv) for the baseline recommendation approach. Novelty scores indicate that the majority of relevant items in a result list were deemed to be new in the travel and book RS, whereas in the experiments on movie and music RS only about one third of the relevant recommendations were unfamiliar to users. In contrast, content-based Diversity scores (divC) were rather high in each domain. Hence, it can be concluded that items in a result set usually have a low rate of matching SKOS annotations. In addition to content-based scores, user assessments of recommendation list Diversity (divU) were also taken into account. The median level of agreement with the Diversity-related Likert item stating that recommendations are diverse was 3 ("neutral") in the multimedia domains and 4 ("agree") in the travel domain.

In summary, the baseline approach achieved fairly good performance results in TC1. The findings demonstrate that LOD-enabled recommendations can potentially enhance a user's search experience. They also allow for subsequent tests of advanced retrieval techniques, which can be compared to this standard method.

In the analysis of performance results from TC2 (constraint-based recommendations), particular attention was paid to participants' agreement with the statement "The filter has improved the recommendation results of the list". Fig. 8 shows the distribution of domain-wise agreement ratios. On average, they were fairly high. Throughout the domains, the vast majority of participants selected either 5 ("strongly agree"), 4 ("agree") or 3 ("neutral"). Hence, a filter often had a positive impact on results.

The authors gathered responses to the question on improvement for recommendation lists resulting from both the simple and the expressive filtering approaches. Hence, they give clues about the general performance of constraint-based retrieval. However, since this section of the study revealed the experimental condition to participants (i.e., application of a filter), answers to the Likert statement have to be interpreted cautiously. Results may be slightly biased, as participants were able to guess the underlying agenda. These limitations do not exist for comparisons between simple and expressive filters. Since the web application randomly assigned the list order, participants did not know which approach they were assessing. This enables us to take a closer look at the differences between the two filtering approaches. It was

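Table 12. Performance results in TC1 (M = mean, SD = standard deviation, N = number of participants).

| Dimension (Metric) [Max. Score] | Domain | M | SD | N |
|---|---|---|---|---|
| Accuracy (size) [10] | Travel | 10.00 | 0.00 | 103 |
| | Movie | 10.00 | 0.00 | 50 |
| | Music | 10.00 | 0.00 | 53 |
| | Book | 9.80 | 1.40 | 51 |
| Accuracy (mrs) [1] | Travel | 56.49 | 0.00 | 103 |
| | Movie | 62.25 | 19.15 | 50 |
| | Music | 55.70 | 0.10 | 53 |
| | Book | 57.47 | 15.55 | 50 |
| Novelty (nv) [1] | Travel | 0.60 | 0.34 | 101 |
| | Movie | 0.36 | 0.31 | 50 |
| | Music | 0.36 | 0.39 | 52 |
| | Book | 0.61 | 0.29 | 47 |
| Diversity (divU) [5] | Travel | 4 | - | 102 |
| | Movie | 3 | - | 50 |
| | Music | 3 | - | 53 |
| | Book | 3 | - | 50 |
| Diversity (divC) [1] | Travel | 0.80 | 0.16 | 103 |
| | Movie | 0.87 | 0.12 | 50 |
| | Music | 0.86 | 0.10 | 50 |
| | Book | 0.92 | 0.08 | 50 |
| Usefulness [5] | Travel | 3.34 | 0.85 | 103 |
| | Movie | 3.55 | 0.73 | 50 |
| | Music | 3.18 | 0.85 | 53 |
| | Book | 3.43 | 0.81 | 50 |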

hypothesized that advanced filters improve recommendation quality, as they may identify more relevant resources, thereby overcoming problems of data quality or data sparsity in LOD repositories. Table 13 lists the response rates of the two methods. The rate measures the ratio of non-zero result sets among all result sets. Throughout the domains, recommendation lists resulting from expressive filtering achieved higher response rates.

Table 13. Response rates (TC2).

| Domain | Constr.-based (Regular) [%] | Constr.-based (Expanded) [%] | N |
|---|---|---|---|
| Travel | 23 | 57 | 103 |
| Movie | 86 | 88 | 50 |
| Music | 72 | 75 | 53 |
| Book | 73 | 76 | 51 |
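The response rate reported here is the share of sessions in which a method returned a non-empty recommendation list. A minimal sketch of this definition is given below; the function name `response_rate` and the example list sizes are hypothetical and not taken from the experiment logs.

```python
def response_rate(list_sizes):
    """Ratio of non-empty result sets among all result sets of one method."""
    if not list_sizes:
        raise ValueError("no result sets given")
    non_empty = sum(1 for size in list_sizes if size > 0)
    return non_empty / len(list_sizes)


# Hypothetical list sizes logged for four user sessions.
example_sizes = [0, 10, 3, 0]
print(response_rate(example_sizes))  # 0.5
```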


[Figure 8 plot: "The filter has improved the recommendations of the list". x-axis: Domain (Digital Library, Travel, Movie, Music, Book); y-axis: Percentage of Agreement (0.00 to 1.00); legend: Degree of Agreement (1 to 5).]

Fig. 8. Participants' agreement with the statement that the filter has improved the result list (domain-wise) (TC2).

In addition to response rates, user assessments and scores in the defined quality dimensions were compared. Table 14 depicts the outcome of this analysis. Significantly higher scores are marked in bold. In case statistical tests identified no meaningful variations, the results of the better performing approach are underlined. α-levels were adjusted with the false discovery rate (FDR) in order to decrease the probability of making a type I error due to multiple testing.
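The text does not state which FDR procedure was applied. A common choice is the Benjamini-Hochberg step-up procedure, so the following sketch uses it purely as an assumption; the function name `benjamini_hochberg` and the example p-values are hypothetical. It returns adjusted p-values, which can be compared against the nominal α-level and are equivalent to adjusting the α-levels themselves.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value and enforce monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from four pairwise comparisons.
example_p = [0.01, 0.04, 0.03, 0.005]
for raw, adj in zip(example_p, benjamini_hochberg(example_p)):
    print(f"raw p = {raw:.3f}, adjusted p = {adj:.3f}")
```

A comparison would then count as significant if its adjusted p-value stays below the chosen α (e.g., 0.05).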

According to mean scores, expressive requests generated more comprehensive recommendation lists (size). This finding is in line with the increased response rates shown in Table 13. However, subsequently conducted t-tests confirmed significant differences only for the travel experiment (t(102) = −7.89, p < 0.001). In contrast, precision scores were higher in most of the domains (travel, movie, music) when the SKOSRec engine processed a simple constraint-based query. However, statistical tests did not confirm any systematic differences. The authors applied Wilcoxon tests for mrs scores and the remaining performance metrics because of the small sample sizes resulting from the low response rates of the simple constraint-based approach.

Regarding Novelty, none of the approaches was superior. In the Diversity dimension, expressive filters generated suggestions that were at least as topically diversified as the results from simple constraint-based retrieval. Even though no significant differences were identified between the two approaches, increased mean scores (divU in the travel domain and divC in each domain) indicate a slight superiority of expanded filter requests in this quality dimension. The same applies to usefulness scores (mean values were higher in 3 out of 4 domains when expressive filtering was applied). However, these statements are speculative at best because the statistical tests did not yield significant results.

In summary, it can be concluded that expressive filters improve recall, potentially leading to a slight loss in precision scores (see Table 14). It may also be the case that expanded user constraints diversify recommendation lists as well as increase their usefulness. However, these claims are unverified. In contrast, it is safe to say that expressive query patterns generate results of at least the same quality as simple patterns while increasing the number of relevant suggestions. The few differences may be explained by the high Jaccard indices (JI) of the recommendation lists (see Table 15). Throughout the domains, result sets were much alike. Given these similarities and the increases in list sizes for expressive filters, it is supposed that an expanded constraint produces an enhanced version of the recommendation list resulting from simple filtering. Therefore, expanded filters should be the default setting in a constraint-based retrieval context.

In TC3 (i.e., evaluation of roll-up query patterns), the SKOSRec engine almost always generated non-empty recommendation lists (see Table 16). In the


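Table 14. Performance results in TC2 (M = mean, SD = standard deviation, N = number of participants).

| Dimension (Metric) [Max. Score] | Domain | Simple M | Simple SD | Expressive M | Expressive SD | N |
|---|---|---|---|---|---|---|
| Accuracy (size) [10] | Travel | 1.17 | 2.70 | 4.52 | 4.62 | 103 |
| | Movie | 6.72 | 4.08 | 7.20 | 3.92 | 50 |
| | Music | 5.96 | 4.64 | 6.36 | 4.51 | 53 |
| | Book | 4.59 | 4.50 | 5.00 | 4.51 | 51 |
| Accuracy (mrs) [1] | Travel | 66.73 | 20.51 | 65.34 | 22.36 | 23 |
| | Movie | 60.95 | 22.98 | 60.29 | 22.79 | 43 |
| | Music | 63.77 | 16.05 | 62.96 | 16.44 | 34 |
| | Book | 56.17 | 18.18 | 56.81 | 16.81 | 37 |
| Novelty (nv) [1] | Travel | 0.48 | 0.41 | 0.61 | 0.49 | 21 |
| | Movie | 0.44 | 0.36 | 0.55 | 0.80 | 42 |
| | Music | 0.55 | 0.39 | 0.52 | 0.37 | 38 |
| | Book | 0.77 | 0.34 | 0.76 | 0.33 | 33 |
| Diversity (divU) [5] | Travel | 3 | - | 3.5 | - | 24 |
| | Movie | 4 | - | 4 | - | 43 |
| | Music | 4 | - | 4 | - | 38 |
| | Book | 3 | - | 3 | - | 37 |
| Diversity (divC) [1] | Travel | 0.68 | 0.18 | 0.72 | 0.21 | 19 |
| | Movie | 0.70 | 0.24 | 0.73 | 0.23 | 41 |
| | Music | 0.86 | 0.13 | 0.87 | 0.12 | 34 |
| | Book | 0.84 | 0.24 | 0.95 | 0.63 | 29 |
| Usefulness [5] | Travel | 3.33 | 0.93 | 3.50 | 0.77 | 24 |
| | Movie | 3.32 | 0.85 | 3.31 | 0.86 | 43 |
| | Music | 3.55 | 0.81 | 3.64 | 0.78 | 38 |
| | Book | 3.39 | 0.83 | 3.62 | 1.21 | 37 |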

Table 15. Jaccard indices (TC2).

| Domain | Mean Jaccard Index (JI) | SD | N |
|---|---|---|---|
| Travel | 0.57 | 0.47 | 25 |
| Movie | 0.71 | 0.38 | 43 |
| Music | 0.82 | 0.35 | 37 |
| Book | 0.92 | 0.26 | 35 |
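The Jaccard index of two recommendation lists is the size of their intersection divided by the size of their union. The sketch below illustrates this for one user session; the function name `jaccard_index` and the item identifiers are hypothetical. How sessions with two empty lists were handled in the study is not stated, so returning `None` in that case is an assumption.

```python
def jaccard_index(list_a, list_b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two recommendation lists."""
    a, b = set(list_a), set(list_b)
    if not a and not b:
        # Assumption: the index is left undefined when both lists are empty.
        return None
    return len(a & b) / len(a | b)


# Hypothetical result lists of one session: the expressive filter returns
# everything the simple filter found plus one additional resource.
simple_list = ["dbr:Item_A", "dbr:Item_B", "dbr:Item_C"]
expressive_list = ["dbr:Item_A", "dbr:Item_B", "dbr:Item_C", "dbr:Item_D"]
print(jaccard_index(simple_list, expressive_list))  # 0.75
```

A high index combined with a longer expressive list corresponds to the case described in the text, in which the expanded constraint yields an enlarged version of the list from simple filtering.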

travel and the book experiments, the response rate was higher for aggregation-based queries, whereas in the movie domain the engine more often provided recommendations for regular requests. However, these differences are only marginal and do not indicate a clear superiority of one method.

Fig. 9 depicts participants' general agreement with Likert statements concerning the Perceived Usefulness of the approaches in TC3. The diagram shows similar results for regular as well as advanced requests. On average, satisfaction scores reached levels that lay slightly above neutral agreement. A subsequent t-test

Table 16. Response rates (TC3).

| Domain | Regular [%] | Aggr.-based [%] | N |
|---|---|---|---|
| Travel | 99 | 100 | 103 |
| Movie | 96 | 92 | 50 |
| Music | 100 | 100 | 53 |
| Book | 96 | 100 | 51 |

confirmed no significant differences between the two methods. In fact, the approach with the highest average score of Perceived Usefulness differed between experiments. While mean values were higher for regular requests in the studies on travel and music RS, advanced queries produced higher scores in the movie and book experiments.

Content-based Diversity (divC) was the only metric with systematic differences. Here, scores went up for advanced retrieval patterns in the travel (t(98) = −5.50, p < 0.001), the music (t(52) = −4.51, p < 0.001) and the book domain (t(47) = −4.90, p <


[Figure 9 plot: "Perceived Usefulness (Regular vs. Roll-up Queries)". x-axis: Domain (Travel, Movie, Music, Book); y-axis: Agreement (0 to 3); legend: Method (Regular, Roll-up).]

Fig. 9. Participants' overall satisfaction with regular and roll-up recommendation retrieval (TC3).

0.001) (see Table 17). This finding is not surprising, given that the aggregation-based approach explores the wider semantic network of each item in the profile.

Hence, it can be assumed that both approaches produce recommendation lists of similar quality and can provide benefits to potential consumers. It cannot be clearly stated which approach is superior because neither of them outperformed the other in each domain. While the regular retrieval strategy was better in terms of usefulness in the travel and the music domains, the aggregation-based approach achieved higher scores in the domains of movie and book recommendations.

This conclusion is especially notable given the low mean Jaccard scores for recommendation lists resulting from regular and aggregation-based retrieval (see Table 18). The mean score never exceeded 0.1 throughout the domains, which indicates a low concordance between result sets. It is remarkable that participants perceived the recommendations as equally good given the high dissimilarity of the lists. Thus, advanced queries provide added value, as they yield additional interesting results. They can help to explore LOD repositories in different ways than a regular retrieval approach. Therefore, it is worthwhile to offer aggregation-based query patterns as an alternative retrieval strategy when the regular approach has not produced any helpful recommendations.

In addition to TC3, user assessments for TC4 (cross-domain recommendations) were also evaluated. However, since the authors conducted experiments for TC4 only in the multimedia domains, the samples were considerably smaller. Nevertheless, it is assumed that the number of completed user sessions is just enough to conduct further statistical analyses. Table 19 shows the response rates for the two retrieval methods. Cross-domain patterns had a higher success rate of generating non-zero result sets throughout the domains. Participants almost always received a recommendation for this query type, whereas a regular SPARQL query often did not return a single suggestion.

Table 17. Performance results in TC3 (M = Mean, SD = Standard Deviation, N = No. of participants)

| Dimension (Metric) [Max. Score] | Domain | Regular M | Regular SD | Aggr.-based M | Aggr.-based SD | N |
|---|---|---|---|---|---|---|
| Accuracy (size) [10] | Travel | 9.90 | 0.99 | 9.42 | 2.07 | 103 |
| | Movie | 2.88 | 0.59 | 2.76 | 0.82 | 50 |
| | Music | 10.00 | 0.00 | 10.00 | 0.00 | 53 |
| | Book | 2.88 | 0.59 | 3.00 | 0.00 | 51 |
| Accuracy (mrs) [1] | Travel | 48.21 | 24.98 | 44.93 | 25.59 | 102 |
| | Movie | 54.58 | 22.88 | 56.69 | 25.42 | 44 |
| | Music | 53.95 | 18.31 | 58.24 | 16.19 | 52 |
| | Book | 51.53 | 23.90 | 49.25 | 22.30 | 49 |
| Novelty (nv) [1] | Travel | 0.43 | 0.39 | 0.38 | 0.40 | 98 |
| | Movie | 0.35 | 0.38 | 0.28 | 0.41 | 41 |
| | Music | 0.49 | 0.36 | 0.46 | 0.39 | 50 |
| | Book | 0.61 | 0.42 | 0.62 | 0.44 | 47 |
| Diversity (divU) [5] | Travel | 3.5 | - | 3 | - | 100 |
| | Movie | 4 | - | 3 | - | 43 |
| | Music | 4 | - | 3 | - | 50 |
| | Book | 3 | - | 3 | - | 48 |
| Diversity (divC) [1] | Travel | 0.79 | 0.17 | 0.89 | 0.13 | 99 |
| | Movie | 0.75 | 0.20 | 0.66 | 0.27 | 44 |
| | Music | 0.83 | 0.17 | 0.93 | 0.07 | 53 |
| | Book | 0.74 | 0.27 | 0.93 | 0.06 | 48 |
| Usefulness [5] | Travel | 3.36 | 0.77 | 3.22 | 0.74 | 101 |
| | Movie | 3.20 | 0.92 | 3.26 | 0.94 | 44 |
| | Music | 3.45 | 0.84 | 3.44 | 0.85 | 51 |
| | Book | 2.95 | 0.87 | 3.03 | 0.96 | 48 |

Table 18. Jaccard Indices (TC3)

| Domain | Mean Jaccard Index (JI) | SD | N |
|---|---|---|---|
| Travel | 0.07 | 0.11 | 101 |
| Movie | 0.03 | 0.07 | 44 |
| Music | 0.02 | 0.05 | 53 |
| Book | 0.02 | 0.06 | 50 |

Table 19. Response rates (TC4)

| Domain | Regular (SPARQL) | SKOSRec | N |
|---|---|---|---|
| Movie | 32 | 96 | 50 |
| Music | 62 | 98 | 53 |
| Book | 53 | 100 | 51 |

We applied non-parametric tests for the performance metrics mrs, nv, divU, divC and Perceived Usefulness due to the low response rates of the SPARQL-based approach, which led to a correspondingly low number of comparable data points. Overall, the FDR-adjusted pairwise Wilcoxon tests detected no significant differences except for user-based Diversity scores (divU) in the movie (Z = −2.69, p < 0.05) and the book domain (Z = −2.56, p < 0.05). The results for the content-based Diversity (divC) metric seem to point in the same direction because of the high mean values for cross-domain queries in each domain, but the significance of these results was not verified. The mrs scores indicate a better performance of regular SPARQL queries, but the within-subjects test did not confirm this assumption statistically (see Table 20).
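The text reports FDR-adjusted p-values but does not name the adjustment procedure. As an illustration only, the following sketch assumes the widely used Benjamini-Hochberg step-up procedure; the function name and the raw p-values are hypothetical and do not stem from the experiments.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value so that the adjusted
    # values stay monotone (an adjusted value never exceeds a later one).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values of four pairwise comparisons.
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
```

With such an adjustment, a comparison counts as significant only if its adjusted value stays below the chosen level (e.g., 0.05), which is stricter than testing each raw p-value separately.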

However, the most important finding of the analysis was that SKOSRec cross-domain queries generated significantly more suggestions throughout the domains (movie: t(49) = −17.92, p < 0.001; music: t(52) = −25.73, p < 0.001; book: t(50) = −12.75, p < 0.001) (see Table 20). Fig. 10 illustrates this graphically.
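The reported statistics have the form t(N − 1), which is consistent with a paired (within-subjects) comparison of list sizes per participant. The following sketch shows how such a statistic can be computed. It assumes that differences are taken as regular minus cross-domain, which would explain the negative sign; the list sizes are invented for illustration.

```python
import math


def paired_t(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for a paired-samples t-test."""
    if len(sample_a) != len(sample_b) or len(sample_a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(sample_a, sample_b)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    # Unbiased sample variance of the pairwise differences.
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    if var_d == 0:
        raise ValueError("t is undefined when all differences are identical")
    return mean_d / math.sqrt(var_d / n), n - 1


# Hypothetical list sizes for five participants (not the study data).
regular_sizes = [0, 2, 1, 0, 3]
cross_domain_sizes = [9, 10, 8, 10, 10]
t_stat, df = paired_t(regular_sizes, cross_domain_sizes)
```

A strongly negative t value, as in this toy example, indicates that the second method returned consistently longer lists than the first one.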

Whenever the SPARQL request provided recommendations, which was the case in less than half of the user sessions (see Table 19), the SKOSRec suggestions were assessed to be of almost equal quality as the SPARQL recommendations (see Table 20, Usefulness). Hence, in the remaining test cases, in which SPARQL querying did not produce any results, the SKOSRec approach may still have provided useful suggestions.

Therefore, it is reasonable to assume that the capability of the SKOSRec engine to flexibly switch between processing steps of similarity calculation and pattern matching can facilitate an improved exploration of LOD repositories. Table 21 also shows that the items in the two sets often did not match, as is indicated by low Jaccard indices. This finding further demonstrates that the execution of an additional step of similarity calculation can help to retrieve other relevant items.

Table 20. Performance results in TC4 (M = Mean, SD = Standard Deviation, N = No. of participants)

| Dimension (Metric) [Max. Score] | Domain | Regular (SPARQL) M | Regular (SPARQL) SD | Cross-Domain M | Cross-Domain SD | N |
|---|---|---|---|---|---|---|
| Accuracy (size) [10] | Movie | 0.82 | 2.09 | 8.66 | 2.73 | 50 |
| | Music | 0.89 | 1.59 | 9.30 | 2.03 | 53 |
| | Book | 2.90 | 3.98 | 10.00 | 0.00 | 51 |
| Accuracy (mrs) [1] | Movie | 67.41 | 27.62 | 53.65 | 14.71 | 16 |
| | Music | 62.36 | 31.76 | 48.32 | 22.49 | 20 |
| | Book | 66.42 | 21.28 | 55.59 | 16.84 | 27 |
| Novelty (nv) [1] | Movie | 0.59 | 0.49 | 0.41 | 0.34 | 16 |
| | Music | 0.72 | 0.44 | 0.59 | 0.40 | 19 |
| | Book | 0.54 | 0.41 | 0.63 | 0.30 | 27 |
| Diversity (divU) [5] | Movie | 3 | - | 4 | - | 16 |
| | Music | 3 | - | 3 | - | 20 |
| | Book | 3 | - | 4 | - | 27 |
| Diversity (divC) [1] | Movie | (0.64) | (0.35) | (0.99) | (0.02) | (6) |
| | Music | (0.90) | (0.13) | (0.98) | (0.03) | (10) |
| | Book | (0.84) | (0.12) | (0.87) | (0.1) | (8) |
| Usefulness [5] | Movie | 3.57 | 0.83 | 3.35 | 1.57 | 15 |
| | Music | 3.53 | 1.17 | 3.11 | 0.98 | 20 |
| | Book | 3.41 | 1.04 | 3.41 | 0.94 | 27 |

Fig. 10. Mean recommendation list sizes for SPARQL and SKOSRec cross-domain requests (TC4). [Bar chart "Recommendation List Size": no. of items (axis from 0.0 to 10.0) per domain (Movie, Music, Book) for the methods Regular and Cross-Domain.]

Table 21. Jaccard Indices (TC4)

| Domain | Mean Jaccard Index (JI) | SD | N |
|---|---|---|---|
| Movie | 0.03 | 0.07 | 44 |
| Music | 0.01 | 0.02 | 21 |
| Book | 0.16 | 0.27 | 33 |

5. Future Work & Conclusion

This paper has presented the query-based SKOSRec engine that facilitates new types of recommendation requests for LOD repositories. The evaluations have shown that the SKOSRec system can often generate relevant and useful suggestions when certain query templates are utilized. In many cases, the application of expressive or advanced query patterns (i.e., in the context of constraint-based retrieval or cross-domain requests) helped to significantly improve recall values while still providing the same level of quality in the remaining performance dimensions. Additionally, advanced recommendation requests can be used to generate diversified recommendation lists in case users are not satisfied with the results of regular recommendation requests (i.e., roll-up or cross-domain retrieval vs. regular requests). While not being superior in each domain and test case, it is at least safe to say that the novel recommendation approaches of the SKOSRec engine considerably extend common retrieval methods and provide an alternative search strategy when conventional methods fail to provide useful results. Additionally, the increase in diversity and recall is in line with findings from previous research on LOD-enabled RS [45].

It is a strength of the developed approaches that they facilitate combinations of graph-based and similarity-based retrieval at different stages of the recommendation workflow. This feature extends general search capabilities for semantic networks. Future research needs to investigate whether, in addition to SKOS, further RDF vocabularies can be used for ad-hoc item-to-item similarity computation to increase the range of potential usage scenarios, since not every LOD dataset contains SKOS annotations [53]. The application of additional properties could also boost the recommendation quality of some query patterns (e.g., the query types used in TC3). In this context, it also needs to be determined whether the representation of the user profile in the SKOSRec query language can be substituted with free-text expressions to enable search-like functionalities. LOD researchers have already developed engines that can process natural language queries over RDF data [59], [34], [67]. It will have to be investigated how the SKOSRec engine profitably fits into this landscape of existing retrieval tools.

Another open research question concerns the aspect of interfaces to other applications. In its standard configuration, the system is not an end-user retrieval engine, but a backend application with which an administrator who has domain knowledge (e.g., of RDF vocabularies and data models of a particular LOD repository and usage scenario) creates the respective query patterns. Although the websites of the online experiments represent possible implementations of suitable end-user applications for the SKOSRec engine, there are still many other possibilities and extensions to design such a user interface (UI). For instance, the selection of LOD items for the profile was made possible by previously storing the data in an index that could then be accessed and searched before issuing a recommendation query. However, this runs against the idea of ad-hoc retrieval. In this context, future research will have to determine whether it is feasible to generate an on-the-fly mapping from items to the respective LOD resources. In the case of a commercial application, the engine needs to match items with the corresponding entries in the product catalog. Even if it is not yet clear which UI components can make the best possible use of the SKOSRec engine's features, the results from the user experiments can give at least a few suggestions.

The following retrieval options should be the default setting in an interface that contains a LOD-enabled RS component to assist users in finding interesting items. Since the regular approach of on-the-fly retrieval achieved good results throughout the domains, the display of simple recommendations generated from previously stated user preferences represents a suitable starting point for retrieval. On-the-fly queries could be refined by providing filter options. Since the evaluations have shown that expressive constraints often lead to increased recall values, the application of such a filter might be the best option for assisted retrieval. As in the preparations for the experiments, appropriate graph-based query patterns for the usage scenario in question would also have to be determined by a domain expert before setting up the application. The same applies to advanced retrieval patterns, which should be available to the end user to refine the query when both the simple and the constraint-based approach have failed to provide relevant items. This suggestion is made because roll-up queries were competitive with regular recommendations in the web-based experiments. Therefore, they represent a viable alternative retrieval strategy. For some domains (e.g., for multimedia retrieval), it is also possible to use an optional selection field to enable the formulation of cross-domain requests. Evaluations for this query type suggest that users might receive interesting results (i.e., diversified and comprehensive recommendation lists) from such a request.

In summary, the contributions of the paper are as follows:

– Development and implementation of novel retrieval strategies in the SKOSRec system, which facilitate personalized requests and leverage the potential of knowledge graphs in order to address the research gap in state-of-the-art recommendation methods with regard to customized queries.

– Realization of web-based user experiments in four usage scenarios with the following outcomes:

  • Successful demonstration that knowledge graph data is applicable in the context of live recommendation scenarios.

  • Creation of suitable query patterns that can be applied to represent individual information needs in different application contexts. The set of query patterns can be extended with the help of the SKOSRec query language to meet the requirements of additional usage scenarios.

  • Proof that the novel retrieval strategies can be successfully applied across various application contexts and recommendation tasks and that they outperform conventional content-based retrieval approaches, especially in the performance dimensions of diversity and recall.

The prototypical implementation and the evaluation of the SKOSRec engine in different usage scenarios have demonstrated that the freely available knowledge sources on the LOD cloud can be successfully applied to advance state-of-the-art methods in IR and recommender systems. The proposed personalized and customized retrieval strategies can leverage the potential of knowledge graphs by better matching information needs of users with the help of semantic search options.

Appendix A. The SKOSRecommender Query Syntax

SKOSRecQuery: A SKOSRec query can comprise up to six elements, of which the parts SimProjection and ItemPart are obligatory. Hence, users need to state at least their preferences and how many recommendations they would like to receive. Advanced retrieval patterns can be formulated as well. For instance, similar resource retrieval can be combined with prefiltering (RecWhereClause) and/or postfiltering (SelectPart) of LOD resources. As in SPARQL, the query can start with a prologue that abbreviates IRIs with namespace declarations [21].

SelectPart: The SelectPart enables subquerying with recommendation results. The surrounding query is a SELECT query with a slightly simplified WHERE clause (i.e., RecWhereClause) with which postfilter conditions can be formulated. In this section, users have the option to formulate aggregation-based retrieval requests (Aggregation).

Aggregation: This query part handles aggregation-based postfiltering. Users specify the IRI (IRIref) of the resource for which similarity scores of sublevel entities are summarized. Additionally, it is stated how the data should be aggregated (i.e., SUM, MAX or AVG) as well as which variable (Var) represents the aggregation-based recommendations in the postfilter section. The compiler makes sure that this variable is actually contained in the RecWhereClause of the SelectPart.

SimProjection: This part of a SKOSRec query refers to the list of generated suggestions. It defines the recommendation variable to which all similar LOD resources are mapped. In case pre- and postfilter conditions are set in the other sections, the compiler checks whether the variable (Var) appears in these sections as well. Otherwise, the required join operations cannot be executed. It is specified how many recommendations should be displayed (Integer) and whether the request will be issued against a default repository or as a cross-repository request (ServiceIntegration).

ServiceIntegration: This section indicates that a user intends to receive recommendations from a different SPARQL endpoint than the default endpoint that is stated in the standard configuration of the SKOSRec engine. It specifies the target SPARQL endpoint (IRIref) from which a user wants to receive recommendations. Upon extraction of SKOS annotations from the default source endpoint, a subsequent request is sent to the specified target endpoint.

ItemPart: This section of a SKOSRec query represents a single preference in the user profile. However, recommendation queries can contain several preference statements. A preference is either expressed as a LOD resource (IRIref) or as a variable which is bound to a preference query (VarPart). In the latter case, the resources are obtained by matching the graph pattern in the VarPart with the triple statements in the repository. The ItemPart statement can be additionally supplemented with a concept-to-concept similarity threshold (Sim) that triggers concept expansion with the approach of flexible similarity detection, which is extensively described in one of our previous works [64].

VarPart: The construct initiates preference querying. Individual likings are expressed as a graph pattern in a RecWhereClause. The variable (Var) is a placeholder for the LOD resources that will be used to set up the user profile. The compiler has to make sure that the specified variable occurs in the RecWhereClause as well. Otherwise, the extraction of LOD resources cannot be carried out correctly.

Sim: The Sim part of a SKOSRec query denotes the threshold of concept-to-concept scores for flexible similarity detection. Even though knowledge-based similarity values usually range between 0 and 1, the specification allows a broader range of digits in the Decimal datatype to enable the application of other similarity metrics in case they are needed.

Relation: This section specifies how concept-to-concept similarity scores should be determined. Users can state whether scores should be “larger”, “larger than” or “equal to” the threshold value.

RecWhereClause: This clause can occur in three different sections of a SKOSRec query, i.e., either in the prefilter (ItemPart), the postfilter (SelectPart) or the preference filter section (VarPart). The RecWhereClause closely resembles the “WhereClause” of the SPARQL 1.1 specification with all its subsequent parts [21]. Hence, it enables different combinations of graph pattern matching. However, the RecWhereClause of the SKOSRec language allows fewer pattern matching expressions than the “WhereClause” of the regular SPARQL syntax specification [21]. For instance, it is neither permitted to formulate subqueries nor to direct filter requests to more than one repository (see RecGraphPatternNotTriples for further explanations). This modification has been made to simplify similarity calculation. After parsing this section, the compiler makes sure that recommendation, preference or postfilter variables occur at least once in the RecWhereClause to guarantee that subsequent steps of the recommendation workflow can be applied to existing LOD resources. In case the RecWhereClause comprises a RecMinusGraphPattern, it also has to be checked that the variable is not only contained in the MINUS part of the group.

RecGroupGraphPattern: A RecGroupGraphPattern can contain one to many basic graph patterns, which can be combined in different ways according to RecGroupGraphPatternSub.

RecGroupGraphPatternSub: This section defines the graph patterns that are applied to identify suitable LOD resources. User filters can be expressed either as basic graph patterns (TriplesBlock) or as combinations of them (RecGraphPatternNotTriples). The sequence of different kinds of patterns can be flexibly specified.

RecGraphPatternNotTriples: This query part closely resembles the similarly named “GraphPatternNotTriples” of the SPARQL 1.1 specification [21]. The only difference to SPARQL is that the RecGraphPatternNotTriples section is a little more restricted in terms of admissible graph patterns than the “GraphPatternNotTriples” section of the regular SPARQL specification. For instance, it does not allow the retrieval of LOD resources from graphs other than the default graph that is specified in the configuration of the SKOSRec engine, since filter patterns have to be applied to datasets that contain SKOS annotations. Hence, these datasets need to be specified before runtime execution to ensure that similarity calculation can be carried out correctly. If a user was able to formulate filter conditions that referred to different RDF graphs and endpoints, this would not be possible.

RecGroupOrUnionGraphPattern: This query section marks an alternative graph pattern in the style of the “GroupOrUnionGraphPattern” of the SPARQL syntax specification [21]. However, it allows neither SPARQL-like subqueries nor federated queries for filtered resources, because this would complicate similarity calculation.

RecOptionalGraphPattern: In this part of a SKOSRecQuery, users can specify additional graph patterns which might extend the query solution, but do not necessarily have to match the data.

RecMinusGraphPattern: This section handles the exclusion of LOD resources for cases in which users want to omit certain triple statements from the solution.
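Two of the rules described in the entries above can be illustrated with a short Python sketch: the aggregation of sublevel similarity scores (Aggregation) and the compiler check that a variable does not occur only in a MINUS group (RecWhereClause). The data structures, function names and example values are assumptions made purely for illustration. They do not reproduce the actual SKOSRec compiler, which is built on Apache Jena.

```python
from collections import defaultdict


def roll_up(scores, parent_of, how="SUM"):
    """Aggregate similarity scores of sublevel entities per parent resource.

    scores: hypothetical mapping of item -> similarity score.
    parent_of: hypothetical mapping of item -> parent (e.g., movie -> director).
    """
    reducers = {"SUM": sum, "MAX": max, "AVG": lambda xs: sum(xs) / len(xs)}
    if how not in reducers:
        raise ValueError("how must be SUM, MAX or AVG")
    grouped = defaultdict(list)
    for item, score in scores.items():
        parent = parent_of.get(item)
        if parent is not None:
            grouped[parent].append(score)
    ranked = {parent: reducers[how](vals) for parent, vals in grouped.items()}
    # Highest aggregated score first.
    return sorted(ranked.items(), key=lambda kv: kv[1], reverse=True)


def variable_is_usable(var, patterns):
    """Check that var occurs in at least one pattern that is not a MINUS group.

    patterns: hypothetical list of (kind, set_of_variables) tuples, where
    kind is e.g. "TRIPLES", "OPTIONAL", "UNION" or "MINUS".
    """
    return any(var in variables
               for kind, variables in patterns if kind != "MINUS")


# Illustrative data: movie scores rolled up to their directors.
scores = {"movieA": 0.5, "movieB": 0.25, "movieC": 0.5}
parent_of = {"movieA": "director1", "movieB": "director1",
             "movieC": "director2"}
by_sum = roll_up(scores, parent_of, "SUM")
by_avg = roll_up(scores, parent_of, "AVG")

clause = [("TRIPLES", {"?movie", "?genre"}), ("MINUS", {"?movie", "?rec"})]
movie_ok = variable_is_usable("?movie", clause)
rec_ok = variable_is_usable("?rec", clause)
```

In this toy example, the choice of SUM or AVG changes which director is ranked first, which is why the aggregation function is part of the query. The variable `?rec` is rejected because it appears only inside the MINUS group and would therefore never be bound to a resource.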

References

[1] G. Adomavicius, A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. In: IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.

[2] G. Adomavicius, A. Tuzhilin, R. Zheng. REQUEST: A query language for customizing recommendations. In: Information Systems Research, vol. 22, no. 1, pp. 99–117, 2011.

[3] C. Aggarwal. Recommender systems. Springer, 2016.

[4] M. Arenas, B. Cuenca Grau, E. Kharlamov, S. Marciuska, D. Zheleznyakov. Faceted search over ontology-enhanced RDF data. In: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 939–948, 2014.

[5] V. Ayala, M. Przyjaciel-Zablocki, T. Hornung, A. Schätzle, G. Lausen. Extending SPARQL for recommendations. In: Semantic Web Information Management, 2004.

[6] J. Beel, C. Breitinger, S. Langer, A. Lommatzsch, B. Gipp. Towards reproducibility in recommender systems research. In: User Modeling and User-adapted Interaction, vol. 26, no. 1, pp. 69–101, 2016.

[7] C. Boutilier, R. Brafman, C. Domshlak, H. Hoos, D. Poole. CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements. In: Journal of Artificial Intelligence Research (JAIR), vol. 21, pp. 135–191, 2004.

[8] P. Castells, S. Vargas, J. Wang. Novelty and diversity metrics for recommender systems: choice, discovery and relevance. 2011.

[9] G. Cheng, W. Ge, Y. Qu. Falcons: searching and browsing entities on the Semantic Web. In: Proceedings of the 17th ACM International Conference on World Wide Web, pp. 1101–1102, 2008.

[10] J. Davies, R. Weeks. QuizRDF: Search technology for the Semantic Web. In: Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004.

[11] DBpedia. URL: https://wiki.dbpedia.org, 2017.

[12] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito. Exploiting the web of data in model-based recommender systems. In: Proceedings of the 6th ACM Conference on Recommender Systems (RecSys), pp. 253–256, 2012.

[13] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, M. Zanker. Linked Open Data to support content-based recommender systems. In: Proceedings of the 8th International Conference on Semantic Systems, 2012.

[14] D. Dillman, D. Robert, D. Bowker. Principles for constructing web surveys. In: Joint Meetings of the American Statistical Association, 1998.

[15] DCMI metadata terms. URL: http://dublincore.org/documents/dcmi-terms, 2004.

[16] I. Fernandez-Tobias, I. Cantador, M. Kaminskas, F. Ricci. A generic semantic-based framework for cross-domain recommendation. In: Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pp. 25–32, 2011.

[17] A. Freitas, J. Oliveira, S. O'Riain, E. Curry, J. Da Silva, C. Pereira. Querying Linked Data using semantic relatedness: a vocabulary independent approach. In: Natural Language Processing and Information Systems, pp. 40–51, 2011.

[18] T. Grainger, T. Potter, Y. Seeley. Solr in action. Cherry Hill: Manning, 2014.

[19] A. Gunawardana, G. Shani. Evaluating recommender systems. In: Recommender Systems Handbook, Springer, pp. 265–308, 2015.

[20] R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Bürgle, H. Düwiger, U. Scheel. Faceted Wikipedia search. In: International Conference on Business Information Systems, pp. 1–11, 2010.

[21] S. Harris, A. Seaborne, E. Prud'hommeaux. SPARQL 1.1 query language. In: W3C recommendation, 2013.

[22] B. Heitmann, C. Hayes. Using Linked Data to build open, collaborative recommender systems. In: AAAI Spring Symposium: Linked Data Meets Artificial Intelligence, pp. 76–81, 2010.

[23] J. Herlocker, J. Konstan, L. Terveen, J. Riedl. Evaluating collaborative filtering recommender systems. In: ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.

[24] Y. Hu, Z. Wang, W. Wu, J. Guo, M. Zhang. Recommendation for movies and stars using YAGO and IMDB. In: 12th International Asia-Pacific Web Conference (APWEB), pp. 123–129, 2010.

[25] D. Huynh, D. Karger. Parallax and companion: Set-based browsing for the data web. In: Proceedings of the 18th ACM International Conference on World Wide Web, 2009.

[26] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich. Recommender systems: an introduction. Cambridge University Press, 2010.

[27] Y. Kabutoya, R. Sumi, T. Iwata, T. Uchiyama, T. Uchiyama. In: IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 625–630, 2012.

[28] M. Kaminskas, I. Fernandez-Tobias, F. Ricci, I. Cantador. Knowledge-based music retrieval for places of interest. In: Proceedings of the 2nd International ACM Workshop on Music Information Retrieval with User-centered and Multimodal Strategies, pp. 19–24, 2012.

[29] H. Khrouf, R. Troncy. Hybrid event recommendation using Linked Data and user diversity. In: Proceedings of the 7th ACM Conference on Recommender Systems, pp. 185–192, 2010.

[30] P. Kirk. Experimental design. John Wiley & Sons, 1982.

[31] B. Knijnenburg, M. Willemsen, Z. Gantner, H. Soncu, C. Newell. Explaining the user experience of recommender systems. In: User Modeling and User-Adapted Interaction, vol. 22, no. 4-5, pp. 441–504, 2012.

[32] G. Koutrika, B. Bercovitz, H. Garcia-Molina. FlexRecs: expressing and combining flexible recommendations. In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 2009.

[33] M. Lee, W. Kim. Semantic association search and rank method based on spreading activation for the Semantic Web. In: IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1523–1527, 2009.

[34] J. Lehmann, L. Bühmann. AutoSPARQL: Let users query your knowledge base. In: 8th Extended Semantic Web Conference (ESWC), 2011.

[35] N. Marie, F. Gandon, M. Ribiere, F. Rodio. Discovery hub: on-the-fly Linked Data exploratory search. In: Proceedings of the 9th International Conference on Semantic Systems, pp. 17–24, 2013.

[36] S. McNee, J. Riedl, J. Konstan. Being accurate is not enough: how accuracy metrics have hurt recommender systems. In: CHI'06 Extended Abstracts on Human Factors in Computing Systems, 2006.

[37] R. Meymandpour, J. Davis. Recommendations using Linked Data. In: Proceedings of the 5th PhD Workshop on Information and Knowledge, 2012.

[38] A. Miles, S. Bechhofer. SKOS Simple Knowledge Organization System Reference. In: W3C Recommendation, 2009.

[39] B. Mobasher, X. Jin, Y. Zhou. Semantically enhanced collaborative filtering on the web. In: Web Mining: From Web to Semantic Web, pp. 57–76, 2004.

[40] A. Musetti, A. Nuzzolese, F. Draicchio, V. Presutti, E. Blomqvist, A. Gangemi, P. Ciancarini. Aemoo: Exploratory search based on knowledge patterns over the Semantic Web. In: Semantic Web Challenge, 2012.

[41] T. Neumann, G. Weikum. The RDF-3X engine for scalable management of RDF data. In: The VLDB Journal, vol. 19, no. 1, pp. 91–113, 2010.

[42] J. Pérez, M. Arenas, C. Gutierrez. Semantics and Complexity of SPARQL. In: 5th International Semantic Web Conference, pp. 30–43, 2006.

[43] S. Policarpio, S. Brunk, G. Tummarello. Implementation of a SPARQL integrated recommendation engine for Linked Data with hybrid capabilities. In: AImWD, 2012.

[44] P. Pu, L. Chen, R. Hu. A user-centric evaluation framework for recommender systems. In: Proceedings of the 5th ACM Conference on Recommender Systems, 2011.

[45] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, E. Di Sciascio. Sound and music recommendation with knowledge graphs. In: ACM Transactions on Intelligent Systems and Technology (TIST), 8(2), 21, 2017.

[46] E. Oren, R. Delbru, S. Decker. Extending faceted navigation for RDF data. In: 5th International Semantic Web Conference, pp. 559–572, 2006.

[47] G. Pirro. Explaining and suggesting relatedness in knowledge graphs. In: International Semantic Web Conference, pp. 622–639, 2015.

[48] G. Rizzo, R. Troncy. NERD: A framework for evaluating named entity recognition tools in the Web of data. In: 10th International Semantic Web Conference (ISWC'11), Demo Session, Bonn, Germany, 2011.

[49] J. Rosati, T. Di Noia, T. Lukasiewicz, R. De Leone, A. Maurino. Preference queries with Ceteris Paribus semantics for Linked Data. In: OTM Confederated International Conferences, pp. 423–442, 2015.

[50] T. Ruotsalo, K. Haav, A. Stoyanov, S. Roche, E. Fani, R. Deliai, E. Mäkelä, T. Kauppinen, E. Hyvönen. SMARTMUSEUM: A mobile recommender system for the Web of Data. In: Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 50–67, 2013.

[51] I. Ruthven, M. Lalmas, K. van Rijsbergen. Incorporating user search behavior into relevance feedback. In: Journal of the Association for Information Science and Technology, vol. 54, no. 6, pp. 529–549, 2003.

[52] B. Sarwar, G. Karypis, J. Konstan, J. Riedl. Item-based collaborative filtering recommendation algorithms. In: Proceedings of the 10th International Conference on World Wide Web, 2001.

[53] M. Schmachtenberg, C. Bizer, H. Paulheim. Adoption of the Linked Data best practices in different topical domains. In: International Semantic Web Conference (ISWC), pp. 245–260, 2014.

[54] H. Schütze, C. Manning, P. Raghavan. Introduction to information retrieval. Cambridge University Press, 2008.

[55] SparqlImplementations. URL: https://www.w3.org/wiki/SparqlImplementations, 2016.

[56] M. Stankovic, W. Breitfuss, P. Laublet. Discovering relevant topics using DBPedia. In: Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 219–222, 2011.

[57] K. Swearingen, R. Sinha. Beyond algorithms: An HCI perspective on recommender systems. In: ACM SIGIR 2001 Workshop on Recommender Systems, vol. 13, no. 5-6, 2001.

[58] D. Tunkelang. Faceted search. In: Synthesis Lectures on Information Concepts, Retrieval, and Services, pp. 1–80, 2009.

[59] C. Unger, L. Bühmann, J. Lehmann, A. Ngonga Ngomo, D. Gerber, P. Cimiano. Template-based question answering over RDF data. In: Proceedings of the 21st ACM International Conference on World Wide Web, 2012.

[60] C. Varnhagen, M. Gushta, J. Daniels, T. Peters, N. Parmar, D. Law, T. Johnson. How informed is online informed consent? In: Ethics & Behavior, vol. 15, no. 1, pp. 37–48, 2005.

[61] J. Waitelonis, H. Sack. Towards exploratory video search using Linked Data. In: 11th IEEE International Symposium on Multimedia, pp. 540–545, 2009.

[62] L. Wenige, J. Ruhland. Flexible on-the-fly recommendations from Linked Open Data repositories. In: International Conference on Business Information Systems, pp. 43–54, 2016.

[63] L. Wenige. Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations.

[64] L. Wenige, J. Ruhland. Retrieval by recommendation: using LOD technologies to improve digital library search. In: International Journal on Digital Libraries, 2017.

[65] F. Zarrinkalam, M. Kahani. A multi-criteria hybrid citation recommendation system based on Linked Data. In: 2nd International EConference on Computer and Knowledge Engineering (ICCKE), pp. 283–288, 2012.

[66] C. Ziegler, S. McNee, J. Konstan, G. Lausen. Improving recommendation lists through topic diversification. In: Proceedings of the 14th International Conference on World Wide Web, pp. 22–32, 2005.

[67] L. Zou, R. Huang, H. Wang, J. Yu, W. He, D. Zhao. Natural language question answering over RDF: a graph data driven approach. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 313–324, 2014.


2. Related Work

Effective answering of retrieval requests has been investigated by researchers for many years. Even before the evolution of the LOD cloud, scientists developed strategies for extracting relevant information from text data. The most prevalent IR systems perform relevance rankings for unstructured resources and offer means to state filter conditions that complement queries in a faceted search interface. These approaches can combine query constraints with similarity-based retrieval. In these systems, user requests are answered quickly due to extensive preprocessing and indexing of resources before runtime execution [58]. However, the low latency of search results comes at the cost of a hard-wired data model, which prevents flexible customizations and limits ad-hoc retrieval [18].

Other researchers have developed query-based RS for table data. For instance, Adomavicius et al. [2] propose the REQUEST query language, which generates suggestions from a relational database that contains information on both user profiles and items. Thus, personalized recommendation queries can be formulated. Another interesting approach in the category of query-based RS is the FlexRecs system by Koutrika et al. [32]. The key advantage of their course RS is that it enables highly individual requests in which different query parts (i.e., profile generation, filtering of recommendations or like-minded peers) can be flexibly combined into so-called recommendation workflows.

Despite the strengths and weaknesses of the discussed approaches, it has to be taken into account that they are not designed to consume LOD. LOD-enabled systems, on the other hand, use RDF data for retrieval. There exist many index-based Linked Data IR systems [4, 9, 10, 20, 25, 40, 46, 47, 61]. However, they index RDF resources before runtime retrieval and can therefore neither facilitate on-the-fly similarity calculation nor provide extensive SPARQL-based query options.

Another interesting engine category are on-the-fly Linked Data IR systems. These systems process information from the LOD cloud through spreading activation algorithms. While they enable fast retrieval, the downside of these approaches is that graph-based query constraints are only possible to a limited extent due to the applied probabilistic path traversal techniques [16, 17, 28, 33, 35, 56].

Existing LDRS, on the other hand, mostly perform offline computation of item-to-item similarities from LOD metadata descriptions [13, 22, 24, 29, 50, 65]. While this approach offers more control over the retrieval process, it prevents individual customizations and runtime responses.

To date, there are only a few query-based LDRS [5, 43, 49]. Most of these systems rely on the assumption that user preferences and item metadata reside in the same repository. But this goes against the notion of the LOD cloud as a decentralized and openly accessible data space, which can be queried through API-like interfaces. Additionally, while query-based LDRS provide SPARQL-based query options to filter recommendation results, they are not capable of flexibly combining similar resource retrieval with graph filters (e.g., subquerying with recommendation results).

In summary, existing non-Linked Data as well as Linked Data-enabled retrieval and RS engines already provide useful features for personalized resource access. Nevertheless, the full potential of LOD for recommendation and search tasks has yet to be realized (see Table 1). The novel query strategies of the SKOSRec engine are attempts to tackle the previously outlined research gaps, to improve existing RS, and to offer new search strategies for LOD repositories that are guided by personal preferences.

3. The SKOS Recommender

3.1. System Overview

The SKOSRecommender is designed to consume recommendation requests and was developed with the help of the Apache Jena framework.¹ A SKOSRec query triggers a recommendation workflow in which similarity-based retrieval steps can be joined with SPARQL-like filter expressions. Listing 1 shows the syntax of the SKOSRec query language that needs to be applied to formulate recommendation requests.

¹ https://jena.apache.org/

Table 1. Summary of existing retrieval and recommendation approaches

    System type | LOD-enabled | (A) SPARQL-based queries | (B) On-the-fly similarity | (C) Flexible combination of A and B
    IR systems | ✗ | ✗ | ✗ | ✗
    Query-based RS | ✗ | ✗ | ✓ | (✓)
    Index-based Linked Data IR systems | ✓ | ✗ | ✗ | ✗
    On-the-fly Linked Data IR systems | ✓ | ✗ | ✓ | ✗
    LDRS | ✓ | ✗ | ✗ | ✗
    Query-based LDRS | ✓ | ✓ | ✗ | ✗

Listing 1. Grammar of the SKOSRec query language

    SKOSRecQuery = Prologue SelectPart? SimProjection PREF ItemPart+ RecWhereClause?
    SelectPart = SELECT (DISTINCT | REDUCED)? Var+ RecWhereClause LimitClause? Aggregation?
    Aggregation = AGG IRIref Var (SUM | MAX | AVG)
    SimProjection = RECOMMEND Var TOP INTEGER ServiceIntegration?
    ServiceIntegration = FROM SERVICE IRIREF
    ItemPart = (VarPart | IRI) Sim?
    VarPart = [ Var RecWhereClause ]
    Sim = SIM Relation DECIMAL
    Relation = ( > | >= | = )
    RecWhereClause = WHERE RecGroupGraphPattern
    RecGroupGraphPattern = '{' RecGroupGraphPatternSub '}'
    RecGroupGraphPatternSub = TriplesBlock? (RecGraphPatternNotTriples '.'? TriplesBlock?)*
    RecGraphPatternNotTriples = RecGroupOrUnionGraphPattern | RecOptionalGraphPattern |
                                RecMinusGraphPattern | Filter | Bind | InlineData
    RecGroupOrUnionGraphPattern = RecGroupGraphPattern ('UNION' RecGroupGraphPattern)*
    RecOptionalGraphPattern = 'OPTIONAL' RecGroupGraphPattern
    RecMinusGraphPattern = 'MINUS' RecGroupGraphPattern

In the given grammar, underlined parts represent SPARQL syntax elements that have been directly taken from the latest W3C specification [21]. Words in capital letters denote query keywords, which are also SPARQL expressions, except for the ones listed in the language parts SimProjection and ServiceIntegration. Like a regular SPARQL SELECT query, a SKOSRec query always generates a table-like result set that contains at least a mapping to a single variable. A detailed explanation of each part of the SKOSRec grammar can be found in Appendix A of this paper.

Fig. 1 depicts the overall architecture of the system with all its components. The most important parts are the SKOSRec query processing units (i.e., the parser and the compiler). They check the syntax of recommendation requests and regulate the correct execution of critical operations of the engine (e.g., graph-based filtering, result set joins, similarity calculation). The architecture does not dictate a particular SPARQL server implementation for LOD repositories, since the system is decoupled from the endpoint that is accessed over HTTP during retrieval. However, the prototypical implementation of the SKOSRec engine utilizes the OpenLink Virtuoso server.²

Fig. 1 also shows the sequential order of a typical recommendation workflow for a complex request that consists of the following retrieval steps:

1. Issuing a SKOSRec query
2. Parsing the SKOSRec query
3. Compiling the SKOSRec query
4. Loading information on the LOD repository/SPARQL endpoint to be queried from the configuration
5. Retrieval of preferred items and setting up of the user profile (see Subsect. 3.3, Preference Querying)
6. Retrieval and processing of relevant LOD resources upon application of prefilter conditions (see Subsect. 3.4, Prefiltering)
7. Optimizing the retrieval process through identifying the resources that can be omitted for the final ranking³
8. Determination of similar LOD resources based on the user profile (see Subsect. 3.2, On-the-Fly Recommendations)
9. Ranking of LOD resources
10. Sending of recommendation results to the compiler and joining them with postfilter constraints
11. Retrieval of the final result set (see Subsect. 3.5, Postfiltering & Combinations)
12. Output to the user

² https://virtuoso.openlinksw.com/

³ For further information on optimized retrieval, please refer to [62].

Fig. 1. Architecture and workflow of the SKOSRec engine

3.2. On-the-Fly Recommendations

A simple on-the-fly request is the most basic form of a SKOSRec query. It is based on the engine's ability to generate ad-hoc recommendations from a user profile containing preference statements for LOD resources. Listing 2 depicts an example SKOSRec on-the-fly request that obtains suggestions based on a user's preference for the western movie "They Call Me Trinity". In the given query, the preferred movie is represented by the DBpedia resource dbr:They_Call_Me_Trinity. The shown query also contains a prefix with a namespace declaration and a specification regarding the number of suggestions the engine should generate.

Listing 2. A simple recommendation query for the movie domain (Q1)

    PREFIX dbr: <http://dbpedia.org/resource/>

    RECOMMEND ?movie TOP 5
    PREF dbr:They_Call_Me_Trinity

In this basic form, the query contains neither pre- or postfilter conditions nor a preference query statement. Thus, the engine simply identifies all movies that are similar to the preference in the profile. When executed over DBpedia, the query returns the solution set that is depicted in Table 2.

Table 2. Result set for Q1

    movie
    dbr:God_Forgives'_I_Don't
    dbr:Ace_High_(1968_film)
    dbr:Boot_Hill_(film)
    dbr:Trinity_Is_Still_My_Name
    dbr:Troublemakers_(1994_film)

For determining recommendations in an ad-hoc fashion, the system identifies SKOS annotations by matching a suitable property (i.e., annotation property) in an RDF dataset (Definition 1).


Definition 1 (Annotation property). An annotation property is an IRI (Listing 3) that is defined in the Dublin Core Metadata Initiative (DCMI) specification for subject annotations. It can occur in conjunction with one of the two namespaces that are stated by the standard [15].

Listing 3. Annotation property

    <ANNOTPROP> = dc:subject | dct:subject

These properties help to retrieve SKOS annotation triples for items preferred by a user (Definition 2).

Definition 2 (SKOS annotation triple). A SKOS annotation triple ($st$) is a triple that is comprised of an IRI in the subject position, an annotation property (ANNOTPROP) as predicate and a concept $c$ in the object position. The concept is part of a knowledge organization system $S$ in SKOS format ($C = \{c \mid c \in S\}$) (Eq. 1).

$$st \in I \times \texttt{<ANNOTPROP>} \times C \tag{1}$$

The identification of SKOS annotation triples facilitates a fast retrieval of potentially similar items. To this end, the engine evaluates shared features between resources from the SKOS annotation dataset (Definition 3).

Definition 3 (SKOS annotation dataset). A SKOS annotation dataset ($AD$) is a subset of an RDF dataset that contains SKOS annotation triples (Eq. 2).

$$AD \subset I \times \texttt{<ANNOTPROP>} \times C \tag{2}$$

Before the similarity calculation process starts, the engine determines SKOS annotations for each LOD resource ($r$) from the user profile by issuing a subsequent SPARQL query to the LOD repository from which the user intends to receive suggestions. The notion of SKOS annotations is specified in Definition 4.

Definition 4 (SKOS annotations). In the annotation dataset, LOD resources directly link to concepts of a SKOS system via a predefined annotation property. SKOS annotations for an input resource $r$ are defined as in Equation 3.

$$\mathrm{Annot}(r) = \{c \in S \mid \exists\, (r, \texttt{<ANNOTPROP>}, c) \in AD\} \tag{3}$$
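Definitions 2 to 4 can be illustrated with a small in-memory sketch. It is not the engine's implementation: the triples, the `ex:` identifiers and the helper names `annotation_dataset` and `annot` are hypothetical, and a real deployment would obtain the triples from a SPARQL endpoint instead of a Python list.

```python
# Illustrative sketch of Definitions 2-4 on hypothetical in-memory triples.
ANNOT_PROPS = {"dc:subject", "dct:subject"}  # the two properties of Listing 3


def annotation_dataset(triples, concepts):
    """AD: triples whose predicate is an annotation property and whose
    object is a concept of the SKOS system (here: the set `concepts`)."""
    return {(s, p, o) for (s, p, o) in triples
            if p in ANNOT_PROPS and o in concepts}


def annot(resource, ad):
    """Annot(r): concepts directly linked to `resource` in AD (Eq. 3)."""
    return {o for (s, _, o) in ad if s == resource}


# Hypothetical example data (identifiers are made up for illustration).
triples = [
    ("ex:film_a", "dct:subject", "ex:Western"),
    ("ex:film_a", "dct:subject", "ex:Comedy"),
    ("ex:film_a", "ex:director", "ex:someone"),  # not an annotation triple
    ("ex:film_b", "dct:subject", "ex:Western"),
]
concepts = {"ex:Western", "ex:Comedy"}

ad = annotation_dataset(triples, concepts)
print(sorted(annot("ex:film_a", ad)))
```

Only triples with one of the two annotation properties survive in `ad`, so the `ex:director` statement plays no role in later similarity calculation.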

Afterwards, the engine identifies which of the other resources in the dataset share annotations with $r$. Similarities only have to be computed for items with matching concepts [13, 52]. Triples can be joined efficiently along item features, since native RDF storage systems usually index triples rather than columns [41]. Thus, the quick identification of mutual annotations through SPARQL requests facilitates ad-hoc similarity calculation. Definition 5 specifies the notion of relevant resources and their annotations that are needed to identify similar items. The applied syntax follows Pérez et al. [42].

Definition 5 (Relevant resources and their annotations). The conjunctive graph pattern $P_r$ matches all items and subjects that are potentially relevant for recommendation retrieval (Eq. 4).

$$P_r = (r, \texttt{<ANNOTPROP>}, c)\ \mathrm{AND}\ (q, \texttt{<ANNOTPROP>}, c) \tag{4}$$

The mapping $\Omega_r$ of relevant resources and their SKOS annotations is obtained by retrieving all resources $q$ that share at least one SKOS concept $c$ with resource $r$ from $AD$ (Eq. 5). The corresponding result set contains mappings for all variables (i.e., $c$ and $q$) in the triple statements of $P_r$.

$$\Omega_r = [\![P_r]\!]_{AD} \tag{5}$$
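The evaluation of the conjunctive pattern in Definition 5 amounts to a self-join of the annotation dataset on the concept position. The following sketch mimics this join in plain Python; the function name `relevant_mappings`, the dictionary representation of solution mappings and the `ex:` data are assumptions made for illustration only.

```python
# Illustrative sketch of Eq. 4 and Eq. 5: evaluate
# (r, ANNOTPROP, c) AND (q, ANNOTPROP, c) over a hypothetical AD.

def relevant_mappings(r, ad):
    """Return solution mappings {"c": concept, "q": resource} for every
    resource q that shares at least one concept c with r."""
    r_concepts = {c for (s, _, c) in ad if s == r}
    # sorted() only makes the output order deterministic
    return [{"c": c, "q": s} for (s, _, c) in sorted(ad) if c in r_concepts]


ad = {
    ("ex:a", "dct:subject", "ex:W"),
    ("ex:a", "dct:subject", "ex:Y"),
    ("ex:b", "dct:subject", "ex:W"),
    ("ex:c", "dct:subject", "ex:X"),
}

omega_r = relevant_mappings("ex:a", ad)
candidates = {m["q"] for m in omega_r} - {"ex:a"}
print(candidates)
```

Note that the pattern as written also matches $q = r$, so the input resource appears in its own mapping; the sketch removes it when collecting candidates.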

Upon extraction of relevant resources and annotations, the process of similarity calculation can start. The engine determines the information content (IC) of the shared SKOS annotations of two resources. This approach is based on the hybrid similarity metric proposed by Meymandpour and Davis [37] for LOD-enabled RS. However, while their measure considers all types of IRI resources for the retrieval, our similarity metric purely relies on SKOS annotations (Definition 6).

Definition 6 (SKOS similarity). Let $\mathrm{Annot}(r)$ be the set of SKOS features of resource $r$, $\mathrm{Annot}(q)$ the set of SKOS features of resource $q$, and $q \in \{\mu(q) \mid \mu \in \Omega_r\}$. Then their similarity can be derived from the IC of their shared concepts $C_{shared} = \mathrm{Annot}(r) \cap \mathrm{Annot}(q)$.

$$\mathrm{sim}(r, q) = \mathrm{IC}(C_{shared}) \tag{6}$$


The IC of a set of SKOS concepts is measured by the negative sum of the logarithms of each concept's frequency in relation to the maximum frequency of occurrences among relevant resources (Definition 7).

Definition 7 (SKOS Information Content). The SKOS IC is defined as an aggregation of the individual IC values of each concept that is contained in the set of shared features ($c \in C_{shared}$), where $\mathrm{freq}(c)$ is the frequency of occurrences of $c$ among all relevant resources and $n$ is the maximum frequency of occurrences among these resources. The final similarity score of two resources $r$ and $q$ is determined by summing the IC values of each concept of the shared feature set (Eq. 7).

$$\mathrm{IC}(C_{shared}) = -\sum_{c \in C_{shared}} \log\left(\frac{\mathrm{freq}(c)}{n}\right) \tag{7}$$
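A worked example may help to read Eq. 6 and Eq. 7. The sketch below assumes the natural logarithm (the base is not stated in the definition) and counts frequencies over all resources in a hypothetical `annotations` dictionary, including the input resource; both choices are assumptions, as are all function names.

```python
import math
from collections import Counter

# Illustrative sketch of Eq. 6 and Eq. 7 on hypothetical annotation sets.

def concept_frequencies(annotations):
    """freq(c): number of relevant resources annotated with concept c."""
    return Counter(c for concepts in annotations.values() for c in concepts)


def information_content(shared, freq, n):
    """Eq. 7: negative sum of log(freq(c) / n) over the shared concepts."""
    return -sum(math.log(freq[c] / n) for c in shared)


def skos_similarity(r, q, annotations):
    """Eq. 6: sim(r, q) = IC(Annot(r) ∩ Annot(q))."""
    freq = concept_frequencies(annotations)
    n = max(freq.values())  # maximum frequency among relevant resources
    return information_content(annotations[r] & annotations[q], freq, n)


annotations = {
    "r": {"A", "B"},
    "q1": {"A", "B"},
    "q2": {"A"},
    "q3": {"A"},
}
print(skos_similarity("r", "q1", annotations))
print(skos_similarity("r", "q2", annotations))
```

In this toy data, concept A occurs on every resource and therefore contributes nothing, whereas the rarer concept B contributes $-\log(2/4) = \log 2$. Sharing a frequent concept is thus weaker evidence of similarity than sharing a rare one.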

After having obtained resource annotations as well as IC values, similarity scores between the input resource $r$ and each potentially relevant resource $q$ can be calculated. When a user profile contains more than a single item, similarity values are aggregated through summation to determine the final recommendation score (Definition 8).

Definition 8 (Recommendation score). The recommendation score of a potentially relevant resource $q$ is quantified by the sum of similarities with each resource $r$ that can be found in the profile ($\mathit{Pr}$).

$$\mathrm{score}(\mathit{Pr}, q) = \sum_{r \in \mathit{Pr}} \mathrm{sim}(r, q) \tag{8}$$
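Eq. 8 and the subsequent top-k selection can be sketched as follows. The similarity values are assumed to be precomputed and are hypothetical; `recommendation_scores` and `top_k` are illustrative names, and the alphabetical tie-breaking rule is an assumption that the paper does not specify.

```python
# Illustrative sketch of Eq. 8 plus ranking; input values are hypothetical.

def recommendation_scores(profile, similarities):
    """score(Pr, q): sum of sim(r, q) over all profile items r.
    `similarities` maps (r, q) pairs to precomputed similarity values.
    Items that are already in the profile are not scored."""
    scores = {}
    for (r, q), value in similarities.items():
        if r in profile and q not in profile:
            scores[q] = scores.get(q, 0.0) + value
    return scores


def top_k(scores, k):
    """Rank by descending score; ties are broken alphabetically (assumption)."""
    return sorted(scores, key=lambda q: (-scores[q], q))[:k]


profile = {"r1", "r2"}
similarities = {
    ("r1", "a"): 0.5,
    ("r2", "a"): 0.25,
    ("r1", "b"): 0.5,
    ("r2", "r1"): 0.9,  # ignored: r1 is itself part of the profile
}
scores = recommendation_scores(profile, similarities)
print(top_k(scores, 1))
```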

Upon calculation of similarity values for each potentially relevant item $q$, the engine ranks the retrieved resources and returns the top entries according to the limit ($k$) that is specified by the user in the SKOSRec query.

With the presented methods of on-the-fly retrieval, recommendations can be generated from LOD repositories through an API without requiring local metadata or excessive preprocessing operations other than the mapping of profile items to LOD resources.⁴

⁴ Di Noia et al. describe how a mapping procedure can be conducted for a real-world implementation. For their LDRS, they matched movie titles with the corresponding LOD resources in DBpedia. The authors extracted IRIs, labels and publication dates for all movie resources and applied the Levenshtein distance to these resources to identify matching movie items [13].

3.3. Preference Querying

In the previous subsection, it was assumed that users express their tastes as preference statements for certain items in the form of IRIs referring to LOD resources. However, it may also be the case that likings are only vaguely known, for instance when users prefer products with specific features but are not able or willing to specify the actual items. The engine obtains such a profile by issuing a subsequent SPARQL query that contains the preference graph pattern ($P_{pref}$) in the WHERE condition. Suppose a user is interested in Quentin Tarantino movies (dbr:Quentin_Tarantino) but does not specify the precise items she likes. She could send a preference query to the SKOSRec engine. The graph pattern in the request would match all movies that are connected to dbr:Quentin_Tarantino with the properties dbo:director or dbo:producer (see Listing 4).

Listing 4. SKOSRec query to generate recommendations based on a preference query in the movie domain (Q2)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>

    RECOMMEND ?movie TOP 5
    PREF [ ?prefMovie
    WHERE {
      { ?prefMovie dbo:director dbr:Quentin_Tarantino . }
      UNION
      { ?prefMovie dbo:producer dbr:Quentin_Tarantino . }
    } ]

Table 3 shows the corresponding solution set to the preference query, containing films from DBpedia that are similar to Quentin Tarantino movies.

Table 3. Result set for Q2

    movie
    dbr:The_Godfather_Part_II
    dbr:The_Prestige_(film)
    dbr:The_Dark_Knight
    dbr:Planet_Terror
    dbr:Clerks

The notion of a queried profile is specified in Definition 9.

Definition 9 (Queried profile). A queried profile is a user profile that has been compiled by retrieving all LOD resources $r$ that are part of an annotation graph $AD$ and are contained in the solution mappings resulting from the evaluation of a specified graph pattern ($P_{pref}$) over a dataset $D$ (Eq. 9).

$$\mathit{Pr} = \{\mu(r) \mid \mu \in [\![P_{pref}]\!]_D,\ r \in \mathrm{dom}(\mu),\ r \in AD\} \tag{9}$$

3.4. Prefiltering

To facilitate exploratory search in LOD repositories, users should also have the option to formulate conditions in their queries. One way to enable filters during retrieval is to apply them before similarity calculation. Thus, recommendation lists can be adjusted to individual needs.

Due to the expressiveness of RDF data, filter conditions can not only be applied to attributes that are directly connected to potentially relevant items, but can also be declared as a graph pattern. By this means, it is possible to retrieve LOD resources that are both similar to the user profile and fulfill an advanced condition.

The strengths of the approach will be illustrated with a travel recommendation request that was executed over DBpedia. Suppose a customer expressed a preference for the Lake Baikal region (e.g., as indicated by a positive rating on a travel community site). With simple on-the-fly retrieval, she would receive a recommendation list that primarily consists of travel destinations located in the proximity of this geographical area. A graph-based filter, on the other hand, facilitates advanced retrieval options. In the given example, it enables the user to obtain suggestions for Southeast Asian travel destinations that have similar features to the Lake Baikal region, even though the resource dbr:Lake_Baikal is not directly connected to any destination with the annotation dbc:Southeast_Asia; such destinations can only be identified by exploring the surrounding graph structure of the filter attribute.

The advanced constraint ensures that the generated result set is not empty. A simple attribute-level filter, which is commonly utilized in faceted search systems, would not generate any search hits. By these means, graph-based filter patterns can help users to better explore knowledge graphs, thereby overcoming data quality issues in LOD repositories. The corresponding SKOSRec query for this example is shown in Listing 5.

Listing 5. SKOSRec query with an expressive prefilter condition to retrieve Southeast Asian destinations based on a preference for the Lake Baikal region (Q3)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbc: <http://dbpedia.org/resource/Category:>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX dct: <http://purl.org/dc/terms/>
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

    RECOMMEND ?destination TOP 5
    PREF dbr:Lake_Baikal
    WHERE {
      ?destination dbo:country ?country .
      ?country dct:subject ?countrySubject .
      ?countrySubject skos:broader dbc:Southeast_Asia .
      ?destination rdf:type dbo:Place .
    }

Table 4 shows the solution to this query. It contains travel destinations and regions of interest that are both similar to the previously liked destination dbr:Lake_Baikal and located in Southeast Asia. Ecoregions around lakes, rivers or mountains from Southeast Asian countries such as Cambodia, Vietnam or Malaysia are presented to the user.

Table 4. Result set of Q3

    destination
    dbr:Tonle_Sap
    dbr:Mekong
    dbr:Mount_Kinabalu
    dbr:Laguna_de_Bay
    dbr:Lake_Toba

The technical details for this type of query are given in the following definitions. Before the extraction of relevant resources and annotations, the recommendation engine identifies the resources that satisfy the filter (Definition 10).

Definition 10 (Prefiltered resources). The mapping of filtered resources $\Omega_{prefilter}$ is obtained by matching a specified graph pattern ($P$) to the RDF graph of a dataset $D$ (Eq. 10).

$$\Omega_{prefilter} = \{\mu|_q \mid \mu \in [\![P]\!]_D\} \tag{10}$$

After preselection, resources are brought together with user profile data to retrieve the final set of recommendations (Definition 11).

Definition 11 (Annotations and relevant resources (prefiltered retrieval)). The solution mappings of annotations and relevant resources in prefiltered retrieval mode are obtained by joining the set of prefiltered resources with the mappings $\Omega_r$ of all relevant resources and annotations as determined from the user profile (Eq. 11).

$$\Omega_{pre} = \Omega_{prefilter} \bowtie \Omega_r \tag{11}$$

After retrieving annotations and relevant resources in prefiltering mode, the system generates constraint-based recommendations. In this context, the ranking procedure is adapted to the restricted set of relevant resources and annotations. Hence, IC values of matching annotations and respective SKOS similarities are quantified according to the user filter. By this means, recommendation scores reflect the similarity of items for the actual set of filtered LOD resources. Except for this restriction, the system carries out the calculations (i.e., determination of similarity values, the ranking of LOD resources) in the same way as in non-filtering mode.
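The effect of prefiltering on concept frequencies can be made concrete with a short sketch. It represents solution mappings as dictionaries and reduces the join of Eq. 11 to a membership test on the variable $q$; this simplification, the function names and the example data are assumptions for illustration.

```python
from collections import Counter

# Illustrative sketch of Eq. 11 and its effect on freq(c); data is hypothetical.

def prefilter_join(omega_r, allowed):
    """Keep only those mappings whose resource q passed the prefilter."""
    return [m for m in omega_r if m["q"] in allowed]


def frequencies(mappings):
    """freq(c), counted over the given (possibly filtered) mappings."""
    return Counter(m["c"] for m in mappings)


omega_r = [
    {"c": "Lake", "q": "x"},
    {"c": "Lake", "q": "y"},
    {"c": "Lake", "q": "z"},
    {"c": "Ecoregion", "q": "y"},
]
allowed = {"y", "z"}  # resources that matched the prefilter graph pattern

omega_pre = prefilter_join(omega_r, allowed)
print(frequencies(omega_r)["Lake"], frequencies(omega_pre)["Lake"])
```

Because frequencies are recounted on `omega_pre`, the IC of a concept (Eq. 7) can differ between filtered and unfiltered retrieval, which is why the scores reflect the filtered set only.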

3.5. Postfiltering & Combinations

The SKOSRec engine can not only combine similarity calculation and graph pattern matching to prefilter relevant LOD resources; it can also apply user constraints after similar items have already been determined. By this means, recommendations are filtered ex post, after the similarity detection process. This feature enables novel retrieval patterns in which suggestions are part of a subquery. As an example scenario for a postfilter recommendation task, consider the following SKOSRec query, which obtains suggestions for movies and directors based on a user's preference for the director dbr:Quentin_Tarantino (see Listing 6). For the given request, the engine would identify similar directors based on Quentin Tarantino's characteristic features (e.g., geographic origin or awards won) and would determine the movies these similar directors have shot. Thus, users may receive interesting movie recommendations.

Table 5 shows the solution the SKOSRec system returns when Q4 is issued against DBpedia. The ellipses in the displayed tables indicate records that have been excluded for brevity.

However, this approach can still be improved upon. For instance, the engine only considers the metadata descriptions of the director, whereas related movies have to be retrieved anew upon execution of the postfilter section. Thus, they are not ranked according to similarity in the final output table.

Another weakness is the biased semantics of the query.

Listing 6. SKOSRec query to generate simple postfiltered recommendations in the movie domain (Q4)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX yago: <http://dbpedia.org/class/yago/>

    SELECT ?director ?movie
    WHERE {
      ?movie dbo:director ?director .
      ?director rdf:type yago:Director110014939 .
    } LIMIT 50
    RECOMMEND ?director TOP 5
    PREF dbr:Quentin_Tarantino

Table 5. Result set of Q4

    director | movie
    dbr:Kirk_Douglas | dbr:Scalawag_(film)
    dbr:Kirk_Douglas | dbr:Posse_(1975_film)
    dbr:Kevin_Costner | dbr:The_Postman_(film)
    dbr:Kevin_Costner | dbr:Dances_with_Wolves
    ... | ...
    dbr:Gene_Kelly | dbr:Hello_Dolly_(film)
    dbr:Gene_Kelly | dbr:Invitation_to_the_Dance_(film)
    ... | ...
    dbr:Alan_Alda | dbr:Goodbye_Farewell_and_Amen
    dbr:Alan_Alda | dbr:Margaret'_s_Engagement
    ... | ...
    dbr:Steve_Buscemi | dbr:Interview_(2007_film)
    dbr:Steve_Buscemi | dbr:Animal_Factory

Usually, when a person favors a movie director, this preference refers to the movies this director has shot and not to the personal features of the person. Two directors might be similar according to certain features, such as birth year, nationality or received awards, but the style and genre of the movies they have created can still be different. A third factor is that recommendation retrieval for movie directors should also take into account that a film artist is likely to be more relevant the more movies she has shot that are in line with the preferred movies of the user. Hence, the SKOSRec engine enables another postfiltering strategy, which is called aggregation-based retrieval. This strategy can be helpful whenever an entity type is linked to several LOD resources in a dataset, such that similarity scores of related entities (e.g., movies) can be aggregated to an overall recommendation score for a certain item (e.g., for a movie director).

Aggregation-based requests can be combined well with other retrieval patterns, such as preference queries.


In the case of the movie recommendation scenario, a user could state that she enjoyed Quentin Tarantino movies in the past and would like to receive suggestions for directors who shot similar films. Typically, this kind of question can only be answered by another human being, e.g., in a personal interaction among friends or in an online forum. The fact that the language facilitates these kinds of queries is one of its key advantages.

Listing 7. SKOSRec query to generate postfiltered recommendations based on a preference query in the movie domain (Q5)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX yago: <http://dbpedia.org/class/yago/>

    SELECT ?director ?movie
    WHERE {
      ?movie dbo:director ?director .
      ?director rdf:type yago:Director110014939 .
    } LIMIT 50
    AGG dbr:Quentin_Tarantino ?director SUM
    RECOMMEND ?movie TOP 100
    PREF [ ?prefMovie
    WHERE { ?prefMovie dbo:director dbr:Quentin_Tarantino . } ]

Table 6 depicts the results of Q5 as retrieved from DBpedia. When comparing the solution sets for Q4 and Q5, it becomes clear that they differ completely. While Q5 generated a list of directors who have mostly created action and thriller movies with a twist (i.e., Tarantino's unique filmmaking style⁵), Q4 shows US American actor-directors whose movies range from musical films (dbr:Hello_Dolly_(film)) to western movies (dbr:Dances_with_Wolves) or drama films (dbr:Interview_(2007_film)). Thus, the directors in the solution table of Q4 (Table 5) seem rather arbitrary and not particularly helpful concerning the initial user request.

⁵ https://en.wikipedia.org/wiki/Quentin_Tarantino

Table 6. Result set of Q5

    director | movie
    dbr:Steven_Spielberg | dbr:Indiana_Jones
    dbr:Steven_Spielberg | dbr:Jurassic_Park_(film)
    ... | ...
    dbr:Francis_Ford_Coppola | dbr:The_Godfather_Part_II
    dbr:Francis_Ford_Coppola | dbr:The_Godfather_Part_III
    ... | ...
    dbr:Christopher_Nolan | dbr:The_Prestige_(film)
    dbr:Christopher_Nolan | dbr:The_Dark_Knight
    ... | ...
    dbr:Robert_Rodriguez | dbr:Planet_Terror
    dbr:Robert_Rodriguez | dbr:Machete_(film)
    ... | ...
    dbr:Kevin_Smith | dbr:Clerks
    dbr:Kevin_Smith | dbr:Clerks_II

We define this kind of aggregation-based request as a so-called roll-up query. This definition was chosen because the final suggestions are based on summarized (i.e., rolled up) preferences for sublevel entities. Another example of a roll-up request stems from the application scenario of travel search. This recommendation query derives suggestions for city trip destinations from the points of interest (POI) a user has visited and liked in another city (e.g., London). Listing 8 shows the corresponding SKOSRec query for this scenario.

Listing 8. SKOSRec query to generate postfiltered recommendations (aggregation-based) for a preference profile containing the city of London (Q6)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX dul: <http://www.ontologydesignpatterns.org/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    SELECT ?city
    WHERE {
      ?poi ?property ?city .
      ?city rdf:type dbo:City .
      ?property rdfs:subPropertyOf dul:hasLocation .
    } LIMIT 3
    AGG dbr:London ?city SUM
    RECOMMEND ?poi TOP 100
    PREF dbr:Oxford_Street
         dbr:Notting_Hill
         dbr:South_Bank

The SKOSRec request contains subquery commands that refer to the user's preferred POIs. It specifies how similarity scores of the generated suggestions should be aggregated (i.e., sum-based). The retrieval is based on the 100 POIs that are most similar to the ones in the preference section. Additionally, the subquery part refers to the DBpedia resource that represents London (AGG dbr:London ?city), thus indicating that the list of recommendations should not contain any POIs in this city. Upon processing the depicted query over the DBpedia repository, the engine displays the three cities for which the highest aggregated scores have been detected (see Table 7).

Table 7. Result set of Q6

    city
    dbr:Oxford
    dbr:Abingdon-on-Thames
    dbr:Wallingford_Oxford

The motivation for formulating such a query is that, when searching for cities, preferences might be better represented through the sights a user has visited elsewhere than through a simple on-the-fly request solely containing a preference statement for a city in the user profile section.

Besides the roll-up retrieval patterns presented above, the SKOSRec grammar may facilitate the formulation of additional types of advanced recommendation queries. For instance, cross-domain queries are an interesting additional pattern to be explored for personalized retrieval. Listing 9 shows an example request that generates movie suggestions based on the preference for a music band (i.e., dbr:The_Jackson_5).

Listing 9. SKOSRec query to generate cross-domain recommendations (Q7)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

    SELECT ?movie ?participated ?musicalAct
    WHERE {
      ?movie rdf:type dbo:Film .
      VALUES ?participated { dbo:basedOn dbo:musicComposer dbo:genre dbo:starring }
      { ?musicalAct ?participated ?movie . }
      UNION
      { ?movie ?participated ?musicalAct . }
    } LIMIT 10
    AGG dbr:The_Jackson_5 ?movie MAX
    RECOMMEND ?musicalAct TOP 100
    PREF dbr:The_Jackson_5
    WHERE {
      VALUES ?actType { dbo:MusicalArtist dbo:Band }
      ?musicalAct rdf:type ?actType .
    }

Table 8. Result set of Q7

    movie | musicalAct
    dbr:Moonwalker | dbr:Michael_Jackson
    ... | ...
    dbr:The_Wiz_(film) | dbr:Ashford_&_Simpson
    dbr:Garfield_Gets_a_Life | dbr:The_Temptations
    dbr:Pipe_Dreams_(1976_film) | dbr:Gladys_Knight_&_the_Pips
    dbr:The_Jacksons | dbr:Jermaine_Jackson

The query contains a prefilter condition that triggers retrieval of music artists and bands. Afterwards, the scores of the top 100 suggestions are summarized with maximum-based aggregation. Thus, the highest similarity value among the set of music acts determines the ranking score of the film. Also note that, because of the aggregation-based retrieval procedure, connections between movies and the LOD resource dbr:The_Jackson_5 are excluded from similarity calculation. The scenario exemplifies how semantic relationships can help to generate suggestions for a target domain (i.e., movies) that are based on a profile containing items from a different source domain (i.e., music acts). While regular SPARQL queries perform exact matching of triple statements, the SKOSRec syntax facilitates the integration of similarity- and graph-based retrieval to enrich result lists with other potentially relevant items.

As a point of reference, consider the SPARQL query for the given cross-domain scenario (Listing 10).

Listing 10. SPARQL query to generate cross-domain recommendations (Q8)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

    SELECT ?movie ?musicalAct
    WHERE {
      ?movie rdf:type dbo:Film .
      VALUES ?participated { dbo:basedOn dbo:musicComposer dbo:genre dbo:starring }
      { dbr:The_Jackson_5 ?participated ?movie . }
      UNION
      { ?movie ?participated dbr:The_Jackson_5 . }
    } LIMIT 5

The query omits the similarity calculation part. Thus, it only retrieves movies linked to the resource dbr:The_Jackson_5. By this means, the SPARQL results have a better chance of being related to the initial user profile. On the other hand, empty result lists can occur when the preferred LOD resource does not have any direct connections to recommendable items. This assumption is backed up by the result lists of the two queries (Tables 8 and 9): the SKOSRec query produces a more comprehensive solution than the regular SPARQL query.


Table 9. Result set of Q8

    movie | musicalAct
    dbr:The_Jacksons | dbr:The_Jackson_5

The following definitions formalize the postfiltered and aggregation-based retrieval forms. Definition 12 specifies the notion of SKOSRec recommendations that may potentially undergo postprocessing.

Definition 12 (SKOSRec recommendations). SKOSRec recommendations ($R(\mathit{Pr})$) are the set of suggestions generated by processing the solution mappings ($\Omega$) obtained from recommendation retrieval. They can be a result of simple on-the-fly retrieval, prefiltering, or a combination of these techniques. $\mathit{Pr}$ represents the input profile that contains at least one preference for an item. The cardinality of $R(\mathit{Pr})$ is the intended number of recommendations ($k$) that is specified by the user, based on which the engine selects the resources with the highest recommendation scores that are not contained in the profile (Eq. 12).

$$R(\mathit{Pr}) = \{x \mid x \in \{q \mid \mathrm{score}(\mathit{Pr}, q) > k\max_{q \in \Omega} \mathrm{score}(\mathit{Pr}, q),\ q \notin \mathit{Pr}\}\} \tag{12}$$

The approach of subquerying with recommendation results builds on the idea that SKOSRec suggestions R(P_r) can be joined with any graph pattern P_post after the process of similarity calculation (Definition 13). It does not make a difference how the engine obtained these recommendations.

Definition 13 (Postfiltered recommendation mapping). The postfiltered recommendation mapping is obtained by joining solution mappings generated from SKOSRec recommendations with the results of matching a postfilter graph pattern (Ω_post) (Eqs. 13 and 14).

$$\Omega_{postfilter} = \{\, \mu \mid \mu \in [\![P_{post}]\!]_D \,\wedge\, x \in dom(\mu) \,\} \tag{13}$$

$$\Omega_{post} = \{\, \mu_{|x} \mid x \in R(P_r) \,\} \bowtie \Omega_{postfilter} \tag{14}$$
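The join in Eqs. 13 and 14 can be sketched in Python as follows. The sketch assumes that solution mappings are plain dictionaries from variable names to resources and that the recommended items are bound to a variable named `x`. Both the representation and the function name `postfilter_join` are illustrative assumptions.

```python
from typing import Dict, List

Mapping = Dict[str, str]  # variable name -> bound LOD resource


def postfilter_join(recommendations: List[str], postfilter: List[Mapping], var: str = "x") -> List[Mapping]:
    """Keep postfilter mappings whose binding for `var` is a recommended item."""
    allowed = set(recommendations)
    # A mapping takes part in the join only if `var` is in its domain (Eq. 13)
    # and its binding is compatible with a recommendation (Eq. 14).
    return [mu for mu in postfilter if var in mu and mu[var] in allowed]


# Hypothetical recommendations and postfilter results.
recs = ["dbr:B", "dbr:C"]
omega_postfilter = [
    {"x": "dbr:B", "y": "dbr:M1"},
    {"x": "dbr:D", "y": "dbr:M2"},
    {"y": "dbr:M3"},
]
print(postfilter_join(recs, omega_postfilter))
```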

Like the prefilter, P_post can be comprised of triple statements that are eligible in the RecGroupGraphPattern part of the SKOSRec grammar. Aggregation-based queries have to be treated differently than a regular postfilter request. Apart from excluding sublevel items of the profile from the set of similar resources, the engine should also omit any connections between the preferred superordinate item (a) and similar sublevel entities. Additionally, the user has to specify the variable (e.g., y) that represents the items within the postfilter graph pattern P_post. By this means, aggregation-based recommendations can be generated (Definition 14).

Definition 14 (Resources for aggregation-based retrieval). The set of resources for aggregation-based retrieval is obtained by retrieving LOD resources that have a connection to a previously generated suggestion, thereby excluding all resources that are linked to the specified superordinate item a in the profile (Eq. 15).

$$R_{agg}(a) = \{\, \mu(y) \mid \mu \in \Omega_{post} \,\wedge\, y \in dom(\mu) \,\wedge\, a \notin \mu(y) \,\} \tag{15}$$
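A minimal Python sketch of Definition 14 is given below. It assumes that the resources linked to the superordinate item a have been determined beforehand and are passed in as a set (the hypothetical parameter `linked_to_a`). How the engine actually detects these links is not shown here.

```python
from typing import Dict, List, Set

Mapping = Dict[str, str]  # variable name -> bound LOD resource


def aggregation_resources(omega_post: List[Mapping], linked_to_a: Set[str], var: str = "y") -> List[str]:
    """Collect bindings of `var`, skipping resources linked to the superordinate item a."""
    result: List[str] = []
    for mu in omega_post:
        y = mu.get(var)
        # Keep each resource once, and only if it is not connected to a.
        if y is not None and y not in linked_to_a and y not in result:
            result.append(y)
    return result


# Hypothetical postfiltered mappings; dbr:M2 is assumed to be linked to a.
omega_post = [
    {"x": "dbr:B", "y": "dbr:M1"},
    {"x": "dbr:C", "y": "dbr:M2"},
    {"x": "dbr:C", "y": "dbr:M1"},
]
print(aggregation_resources(omega_post, linked_to_a={"dbr:M2"}))
```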

For each potential aggregation-based suggestion (y), similarity scores of connected entities (x) have to be considered for the final ranking. In this context, different approaches to aggregation can be applied. In cases when both similar items and aggregation-based recommendations are entities on comparable levels of granularity, choosing the maximum similarity score might be the best way to represent the relatedness of the resource to the user profile. On the other hand, when postfiltered recommendations are based on similarity scores of their sublevel entities (e.g., as in the movie recommendation example), the scores of related suggestions should be aggregated by the sum or the average of individual scores.

Definition 15 (Aggregation-based ranking score). The ranking score of an LOD resource y that is connected to a set of recommended items is determined by aggregating scores of connected entities, either by the maximum (Eq. 16), the sum (Eq. 17), or the average (Eq. 18) of individual scores.

$$score_{aggMAX}(y) = \max_{x \in \mu(x,y)} score(P_r, x) \tag{16}$$

$$score_{aggSUM}(y) = \sum_{x \in \mu(x,y)} score(P_r, x) \tag{17}$$

$$score_{aggAVG}(y) = \frac{\sum_{x \in \mu(x,y)} score(P_r, x)}{|\{x \in \mu(x,y)\}|} \tag{18}$$
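The three aggregation variants of Definition 15 can be sketched in Python. The sketch assumes that the connections between similar items x and candidate resources y are given as (x, y) pairs and that similarity scores are available per x. The function name `aggregate_scores` and the example data are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def aggregate_scores(pairs: List[Tuple[str, str]], scores: Dict[str, float], mode: str = "sum") -> Dict[str, float]:
    """Aggregate the scores of connected items x for every resource y."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for x, y in pairs:
        grouped[y].append(scores[x])
    aggregators = {
        "max": max,                              # Eq. 16
        "sum": sum,                              # Eq. 17
        "avg": lambda v: sum(v) / len(v),        # Eq. 18
    }
    aggregate = aggregators[mode]
    return {y: aggregate(values) for y, values in grouped.items()}


# Hypothetical example: two similar acts are linked to movie m1, one to m2.
links = [("x1", "m1"), ("x2", "m1"), ("x3", "m2")]
sim = {"x1": 0.5, "x2": 0.25, "x3": 0.75}
for mode in ("max", "sum", "avg"):
    print(mode, aggregate_scores(links, sim, mode))
```

With summation, a resource connected to many moderately similar items can rank as high as one connected to a single very similar item, which matches the rollup use case described above.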

4. Evaluation of the SKOS Recommender

4.1. Experimental Setup

The novel recommendation strategies of the SKOSRec engine were evaluated in web-based experiments for the usage scenarios of travel destination search and multimedia RS. The online approach helped to reach more participants with diverse demographic backgrounds, which would not have been possible in a laboratory experiment. Another positive side effect of the web-based setting was that participants could test and rate recommendations without being directly observed by an experimenter, which might otherwise have biased the results. Hence, the online approach softened common limitations of laboratory studies [19].
The experiments were carried out in the defined usage scenarios with metadata descriptions from DBpedia. The database containing the local mirror of DBpedia ran on a virtual server. Besides the triple store, a web application was hosted on the same machine. The website ran on an Apache Tomcat 7 application server and was implemented with the help of Java servlet technology.⁶,⁷

We conducted the online experiments between April 2016 and January 2017. Upon setting up the web application, a pretest was carried out to control for potential pitfalls, such as misleading navigation. Two test users went step by step through the web interface and provided their feedback. Based on their insights, we slightly modified the interface (e.g., by rephrasing user instructions) to increase the chances of successful completion. If the SKOSRec engine is applied in a live setting, the design of the user interface will have to be tested with more users. Our experimental series focused on algorithmic performance, which was tested with numerous participants.
Subjects were recruited through the clickworker.com platform.⁸ Participants received a compensation fee

⁶ http://tomcat.apache.org
⁷ http://www.oracle.com/technetwork/java/javaee/servlet/index.html
⁸ https://www.clickworker.de

of 1.00 € or 1.50 € for a completed survey. A total of 103 participants took part in the study on travel destination search. In the multimedia domains, 154 subjects in total evaluated the SKOSRec engine. For each part of the evaluation, it was ensured that participants had not merely clicked through the evaluation screen. Whenever there appeared to be irregularities in the log files (e.g., the default settings of the screen were not changed), these assessments were excluded from further analysis. The survey template started with a consent form. It informed participants about the context and the procedure of the study. After having filled in the consent form, participants were navigated to the first part of the experiment. It collected data on subjects' demographics and their consumption and search habits. After the preparation phase, participants worked on the first test case (TC1) in each usage scenario. In this test case, the performance of the SKOSRec engine was assessed in the simple execution mode of on-the-fly recommendations (see Subsect. 3.2). This was done to gather data on the usefulness of suggestions resulting from a simple content-based approach (i.e., the baseline method), with which more advanced recommendation strategies were compared at a later stage. In cases where users did not state any items, the application showed an error page. While setting up a profile required some effort on the side of the user, it was a necessary step for the experiment. Since we did not have access to the log files of a live application, profile generation had to be simulated. The process was facilitated by a specifically tailored interface (see Fig. 2), where subjects could search for items through an autocomplete field that applied AJAX technology.⁹

It enabled users to type in keywords (e.g., the name of a music act or the author of a book), for which matches were instantaneously displayed without reloading the site. Hence, participants could quickly find and select their items of interest. Whenever a user entered a query through the AJAX interface, the system matched the query keywords against an Apache Solr/Lucene search index.¹⁰ The index was built with item metadata, which had been extracted from DBpedia prior to setting up the website. The index comprised the names and abstracts of items. Whenever available, information on thumbnail pictures was saved in the index to be ready for display in the AJAX interface. In case the Apache Solr/Lucene

⁹ http://api.jquery.com/jquery.ajax
¹⁰ http://lucene.apache.org/solr


Fig. 2. User profile generation, TC1 (music domain)

search engine found index keywords that matched the user query, metadata information for these items was immediately shown on the screen. This information was displayed to help participants make an informed decision on whether the AJAX result list contained suitable items for profile generation.
Aside from the AJAX feature, the appeal of the interface was increased by a progress bar at the top of the webpage (see Fig. 2). Progress bars are commonly utilized tools for web surveys. In conventional paper-based studies, subjects usually see how far along they are, while in a web survey they cannot check their progress immediately. This is especially true for studies that apply screen-by-screen navigation, which was the chosen approach for the conducted web experiments. Progress bars prevent users from tiring of the questionnaire and withdrawing from the study altogether by showing how close participants are to completing the experiment [14].
Upon profile generation, the SKOSRec engine calculated on-the-fly recommendations for a simple query (similar to Q1, Sect. 3), which were shown to users in a separate screen. Suggestions were displayed with a sufficient amount of information (e.g., thumbnails, labels, or a short description) to help participants assess the quality of the recommendation. In addition to this information, users could also access the respective Wikipedia article by following the hyperlink on the label. Relevance sliders were used to assess the utility of each recommendation. They are often applied in evaluations of IR systems [51, 54]. Scores were handled as points on a 0 to 100 scale of the relevance slider (outer left side: lowest relevance, outer right side: highest relevance; see Fig. 3).

This approach facilitated the calculation of an accurate precision score and the application of tests with higher statistical power than an ordinal 5-star scale. The score, called mean relevance (mrs), was measured with Equation 19.

$$mrs = \frac{\sum_{i=1}^{k} score_i}{k} \tag{19}$$
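As a small worked example of Eq. 19, the following Python sketch computes the mean relevance score of a recommendation list from slider values on the 0 to 100 scale. The function name `mean_relevance` and the sample ratings are illustrative assumptions.

```python
from typing import List


def mean_relevance(slider_scores: List[float]) -> float:
    """Mean relevance (mrs) of a recommendation list with k rated items (Eq. 19)."""
    if not slider_scores:
        raise ValueError("mrs is undefined for an empty recommendation list")
    return sum(slider_scores) / len(slider_scores)


# Hypothetical slider ratings of one participant for a list of k = 4 items.
ratings = [80, 60, 40, 20]
print(mean_relevance(ratings))  # 50.0
```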

In addition to the precision score, the web application also determined the size of the recommendation list produced by the engine as an indicator for recall. However, recommendation list size can only be utilized for this purpose when mrs scores reach a reasonable level.
While accurate recommendations are useful because they build trust in a system [57], they only capture a narrow aspect of performance [36]. For instance, an online retailer can profit tremendously from an RS that provides both relevant and unobvious recommendations, since it enables users to explore the product catalog more thoroughly. On the other hand, if an RS only suggests popular products, it is likely that the user already knows them [3, 23]. A purely accuracy-based evaluation disregards these aspects, which is why other performance indicators, such as novelty, were measured as well. Thus, the evaluation interface also contained a section where subjects had to tick a radio button that indicated whether an item was new. In the backend of the web application, novelty was measured as the ratio of relevant items not


Fig. 3. Recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain)

familiar to the user (|nov|) among the total number of relevant items (|tp|) (Eq. 20).

$$nv = \frac{|nov|}{|tp|} \tag{20}$$
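Eq. 20 can be illustrated with a short Python sketch. It assumes that each rated item is stored as a pair of slider score and a flag stating whether the participant marked the item as new, and it uses the relevance threshold of 50 points applied in the experiments. Returning `None` for lists without relevant items is an assumption of this sketch.

```python
from typing import List, Optional, Tuple


def novelty(ratings: List[Tuple[float, bool]], threshold: float = 50) -> Optional[float]:
    """Share of relevant items (score >= threshold) that were new to the user (Eq. 20)."""
    relevant_flags = [is_new for score, is_new in ratings if score >= threshold]
    if not relevant_flags:
        return None  # no relevant items, hence nv is undefined
    return sum(relevant_flags) / len(relevant_flags)


# Hypothetical ratings: (slider score, item marked as new).
rated_items = [(80, True), (60, False), (30, True), (50, True)]
print(novelty(rated_items))
```

In the example, three items reach the threshold and two of them are new, so nv is two thirds. The novel but irrelevant item does not count.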

An item was considered relevant when it achieved an mrs score of 50 or higher. In addition to accuracy and novelty, the web application also tracked the diversity of recommendation lists, since a topically diversified result set has been found to increase overall user satisfaction [31, 66]. In the RS literature, the most widely explored diversification metrics are the ones that measure the similarity between each item pair in the recommendation list. The more the items are alike, the less diverse the result set is [19]. The study series of this paper applied the metric by Castells et al. [8]. It determines item-to-item similarity values with the cosine similarity measure. Scores are then converted to distance values (Eq. 21) and averaged for the recommendation list (Eq. 22).

$$d(i_l, i_m) = 1 - \cos(i_l, i_m) \tag{21}$$


$$div_C = \frac{2}{k(k-1)} \sum_{l<m} d(i_l, i_m) \tag{22}$$
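Eqs. 21 and 22 can be sketched in Python as follows. The sketch assumes that every recommended item is represented by a numeric feature vector (for instance, binary indicators of SKOS annotations). This vector representation, the function names, and the handling of zero vectors are assumptions for illustration.

```python
import math
from itertools import combinations
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if one of them is a zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def div_c(vectors: List[Sequence[float]]) -> float:
    """Average pairwise cosine distance of a recommendation list (Eqs. 21 and 22)."""
    k = len(vectors)
    if k < 2:
        return 0.0  # diversity is not meaningful for fewer than two items
    total = sum(1 - cosine(u, v) for u, v in combinations(vectors, 2))
    return 2 * total / (k * (k - 1))


# Hypothetical annotation vectors: items 1 and 3 share the same concept.
items = [[1, 0], [0, 1], [1, 0]]
print(div_c(items))
```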

The content-based diversity scores (divC) were calculated in the backend of the web application and saved in a log file for each generated recommendation list of the SKOSRec engine during runtime execution. In addition to the measurement of divC scores, we asked participants directly about their opinions on recommendation list diversity. Following the RS evaluation framework by Pu et al. [44], two Likert items were posed for this purpose:

– The recommendations of list [] are diverse
– The recommendations of list [] are similar to each other

Besides the algorithmic performance metrics listed above, the authors gathered assessments from users regarding the overall usefulness of the suggestions. This approach is also in line with the empirically verified evaluation framework by Pu et al. [44], which proposes to measure the psychometric construct of Perceived Usefulness to gain a comprehensive understanding of user opinions on recommendation results. Thus, participants were instructed to state their agreement with positive statements regarding the relevance of the suggestions. The statements were adapted to the specificities of each usage scenario. Fig. 4 shows an example collection of Likert questions for the domain of music RS.

After participants had provided the required information, their answers were saved in a log file and they were navigated to test case 2 (TC2), which evaluated the SKOSRec engine's performance for constraint-based queries (similar to Q3, Sect. 3). We tried to find out whether the engine was able to improve user satisfaction through filter conditions. Even though IR systems already successfully apply facet filters on result sets, the experiments still had to prove whether LOD-enabled constraints have the same effect on the Perceived Usefulness of recommendations. For the comparisons of non-filtered and filtered suggestions, we applied a within-subjects design in which participants had to rate two recommendation lists resulting from regular (non-filtered) and filtered requests in TC2. We favored this setup over a between-subjects design, where subjects would have been assigned to only one result list. While the between-subjects design more closely resembles a real-world setting, since it does not require

participants to interact with more than one approach at a time, differences among subjects (e.g., regarding demographics or topic expertise) can potentially have a considerable impact on quality judgments [19]. Hence, when applying a between-subjects design, one does not know whether differences in performance are caused by the approaches or by the variability among participants. Therefore, researchers need to conduct between-subjects experiments with a sufficient number of study subjects to level out these differences. Within-subjects studies, on the other hand, require fewer participants while still achieving the same level of statistical power, since between-subject variability is not an issue [30, 31]. Beel et al. have shown that even small variations in an RS test setting could cause substantial differences in performance outcomes [6]. To keep the extent of variations even lower, comparisons between non-filtered and constraint-based recommendations were based on the same profile for each user in TC2. Therefore, the user profile from TC1 served as a starting point for filtered retrieval. The web application showed the recently generated profiles to users and asked them to provide an additional filter condition. Fig. 5 depicts an example web form for a constraint-based recommendation request in the music domain. It comprises the user profile and an additional text field where users stated their constraint.

Afterwards, the engine applied this filter condition on the set of potential recommendation results before similarity calculation started. For each usage scenario, different types of constraints were available. Regarding suitable filter options, the authors sought to achieve a balance between assisting users in finding filter conditions that would adequately represent their information needs and keeping the variability among users as low as possible. Therefore, the specificities of each scenario and the availability of the respective data sources were determined. For instance, for the multimedia domains (movie, music, and books), it was assumed that users would like to filter recommendations according to the specific genre of the item. Fortunately, the DBpedia dataset contains this kind of information for the music domain. In this domain, the genre can be retrieved through the property dbo:genre. However, in the movie and book domains, DBpedia does not provide sufficient information. Many LOD resources that represent books were not assigned a genre. In the movie domain, a genre property is missing entirely and genre-specific information can often only be found in the subject categories describing the movie. Hence, in contrast to the experiment on


Fig. 4. Evaluation screen for recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain)

music recommendations, participants in the domains of books and films did not filter their recommendation lists by genre. On the other hand, the property dc:subject was applied as a recommendation filter in each multimedia domain. Since the subject property often links to many informative features of multimedia items (e.g., release period, geographic information, or content information), the authors decided that this feature represents a powerful filter dimension.
In the experiment on travel destination search, the web interface offered two filters: a subject filter and a location-specific filter. The subject filter enabled users to state their preferences regarding the characteristics of a location (e.g., being a nature reserve). The location-specific filter, on the other hand, gave participants the option to specify where the desired destination should be located.¹¹

The second fundamental research issue of TC2 concerned the question of whether expressive constraints can boost recommendation quality even further. Hence,

¹¹ http://dbpedia.org

apart from executing a constraint-based workflow with a simple filter (i.e., a direct attribute of an item), the SKOSRec engine generated suggestions with the help of an expressive filter as well. This second procedure, while still applying the same user constraint as the simple filter, utilized an expressive graph pattern to include additional LOD resources in the result set. For instance, in case a subject filter was selected, result set expansion was facilitated by exploring the wider semantic space of the DBpedia category graph through a property path declaration in the graph pattern (e.g., skos:broader[2]). In the travel experiment, an additional expressive filter retrieved place-related information (dul:hasLocation) for location-specific subproperty relations that are not materialized in DBpedia. Hence, the expressive graph pattern expanded the filter such that additional properties were matched (e.g., dbo:country, dbo:city, or dbo:state) while still applying the specified user constraint on the set of LOD resources. Upon execution of constraint-based recommendation requests, users received two recommendation lists (resulting from simple and expressive filtering, respectively) in a


Fig. 5. Web form for a constraint-based recommendation request (music domain)

separate evaluation screen. Participants evaluated result sets in the same manner as in TC1. For each result set, the screen contained an additional Likert item asking subjects to state their agreement with the statement “The filter has improved the recommendation results of list”. The sequence of suggestions resulting from the different methods was randomly assigned in each user session to avoid order effects.
After study subjects had completed TC2, they were navigated to test case 3 (TC3). In the TC3 section of the travel experiment, participants could choose between three options for rollup query patterns (similar to Q6, Sect. 3). Users were able to obtain recommendations for travel destinations based on POIs they had visited during a stay in another destination. They could select either a city (option 1), a region (option 2), or a country (option 3) as destination type. Upon entity type selection, the SKOSRec engine generated appropriate suggestions. Fig. 6 shows the web form from the travel experiment that facilitated the formulation of advanced rollup requests in TC3.

When users had selected an entity type and stated their favorite travel destination and three associated POIs,

the application generated a SKOSRec query from these parameters. A simple on-the-fly request was sent to the engine as well to retrieve baseline recommendations, with which the suggestions resulting from the advanced request were compared at a later stage of the experiment. The simple query only contained the user's favorite travel destination and the selected entity type option, thus omitting information on the POIs. During the travel experiment, the engine processed both the simple and the advanced query and produced two recommendation lists. As in the previous parts of the experiment, participants assessed the quality of the result sets, which appeared in random order on the screen.
The multimedia experiments also contained a section on rollup retrieval (TC3). Like the advanced travel queries, the multimedia requests aggregated similarity scores of sublevel entities through summation, which the engine joined with a postfilter. In addition to these features, the pattern contained a preference query part. This section identified a set of preferred items through a single user statement. Participants either specified their favorite director/actor (movie domain), music act


Fig. 6. Travel - User profile generation, TC3

(music domain), or author (book domain) and received recommendations based on the features of the works created by the artist (similar to Q5, Sect. 3). The SKOSRec engine generated recommendations based on the creative works of the favored artist. Participants also received suggestions from a simple on-the-fly query, which computed results according to the artist's characteristics. As in the travel experiment, recommendations from the advanced rollup request were compared with the on-the-fly query without letting participants know which approach they were assessing and through random assignment of result set positions.
After participants of the multimedia experiments had evaluated the results of TC3, they were guided to test case 4 (TC4), which evaluated cross-domain queries. The experiment on travel RS ended with TC3 because no suitable graph patterns could be identified to facilitate these kinds of suggestions for the travel usage scenario. In the multimedia domain, however, the data from DBpedia was sufficient for this retrieval task. The entity type of the study (i.e., movie, music act, or book) defined the source domain, based on which the engine generated suggestions for items from another multimedia target domain (similar to Q7, Sect. 3). Fig. 7 depicts the cross-domain web form of the music experiment.

The web form assisted subjects in formulating a request, which then obtained movie suggestions based on the favorite music act of the user. Parameters from users were entered in the placeholders of the query pattern in the backend of the application. We formulated the query templates in such a way that the query engine matched any properties or graph patterns that could potentially connect two items from the specified

target and source domains. For instance, such matches can occur when a user states a favorite movie that was written by a particular author. In case the same author has also written some books, they might be of interest to the user as well. Another example stems from the movie domain. Suppose a consumer has declared in his/her profile a preference for a movie that links to the corresponding soundtrack. It might well be the case that the user likes the soundtrack and the music acts that were involved in creating it just as much as s/he likes the movie. The practice of distinguishing homonymous LOD resources through the property dbo:wikiPageDisambiguates is another interesting linking pattern from DBpedia that was exploited for cross-domain requests in TC4, since it connects similar entities (e.g., the book edition of a certain movie).
Cross-domain SPARQL queries (similar to Q8, Sect. 3) were posed as baseline requests in this test case. While the SPARQL query only matched cross-domain relations for a single LOD resource, the SKOSRec request expanded the search space by considering links from more than one item. As in the previous test cases, TC4 ended with an evaluation screen where users assessed the SPARQL-based as well as the SKOSRec suggestions in random order without knowing which recommendation list belonged to which method. Table 10 gives an overview of the test cases and the methods that were applied in the experiments in each usage scenario.

4.2. Results

In total, 257 people took part in the study. The gathered samples from the experiments contained answers


Fig. 7. Music - User profile generation, TC4 (page 9)

Table 10. Test cases in the web experiments

| Test Case | Evaluation Purpose | Query Type | Travel | Movie | Music | Book |
|---|---|---|---|---|---|---|
| TC1 | On-the-fly Recommendations (Baseline) | Q1 | X | X | X | X |
| TC2 | Constraint-based Recommendations | Q3 | X | X | X | X |
| TC3 | Advanced Queries (Rollup) | Q5, Q6 | X | X | X | X |
| TC4 | Advanced Queries (Cross-Domain) | Q7 | - | X | X | X |

from both genders, with a slight overrepresentation of male participants (53.7%). Most study subjects were between 20 and 40 years of age (65.8%). The evaluation of the demographics section also revealed that the majority of participants already perceived the search in their domain of interest as either “manageable” (38.5%) or even as “easy” or “very easy” (44.0%).
Prior to analyzing the performance results of the novel retrieval approaches of the SKOSRec engine, it was checked whether the respective Likert items for Perceived Usefulness and user-based diversity (divU) reached acceptable reliability levels. For this purpose, Cronbach's α values were determined. Table 11 shows the outcome of this analysis for the Perceived Usefulness construct. Given the acceptable values throughout the domains (Cronbach's α > 0.7), it could be treated as an interval-scaled variable in the statistical tests.

Table 11. Cronbach's α values for the concept of Perceived Usefulness

| Domain | Cronbach's α |
|---|---|
| Travel | 0.7002 |
| Movie | 0.8579 |
| Music | 0.8515 |
| Book | 0.8949 |
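For readers who want to retrace the reliability check, the following Python sketch computes Cronbach's α with the standard textbook formula, α = k/(k-1) · (1 - Σ item variances / variance of sum scores). The response matrix is invented for illustration and is not data from the experiments. The function name `cronbach_alpha` is an assumption.

```python
from statistics import variance
from typing import List


def cronbach_alpha(responses: List[List[float]]) -> float:
    """Cronbach's alpha for a matrix with one row per participant and one column per item."""
    k = len(responses[0])  # number of Likert items
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Invented answers of three participants to two perfectly consistent items.
answers = [[1, 1], [2, 2], [3, 3]]
print(cronbach_alpha(answers))  # 1.0
```

Perfectly consistent items yield α = 1, whereas weakly related items push the value toward 0, which is why a threshold such as 0.7 is commonly used as a minimum.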

Besides usefulness, user-based diversity scores were also measured by multiple Likert items. However,

Cronbach's α values for this concept did not reach acceptable levels. To avoid leaving valuable user assessments unused, agreements with the statement “The recommendations of list [] are diverse” were taken into account as an ordinal variable (divU).
Based on these results, subsequent tests could be carried out. The first part of the analysis concerned the performance of the on-the-fly retrieval method (baseline) of the SKOSRec engine in TC1. The outcome of this analysis confirms that the on-the-fly approach represents a viable recommendation strategy. Table 12 shows the mean (M),¹² standard deviation (SD), and number of participants (N) for each performance dimension and metric. It can be seen that users assessed the engine's performance in almost every quality dimension as better than mediocre. Aside from the positive user evaluations with regard to Perceived Usefulness, related quality aspects, such as Accuracy, Novelty, and Diversity, received good assessments as well. For instance, the mean relevance score (mrs) was higher than 50 points in each domain. This means that participants were often convinced of the relevance of the results. Given these positive user assessments with regard to item relevance, it seems justified to consider the mean of result list sizes as an indicator of recall. The engine was able to generate the required number of recommendations in almost each test case (see Table

¹² In case of the user-based diversity score (divU), M denotes the median because of the ordinal scale of the variable.


12). The on-the-fly retrieval approach was also tested with regard to the Novelty dimension. Table 12 shows the mean novelty scores (nv) for the baseline recommendation approach. Novelty scores indicate that the majority of items in a result list were deemed to be new in the travel and book RS, whereas in the experiments on movie and music RS only one third of the recommendation list was unfamiliar to users. In contrast, content-based Diversity scores (divC) were rather high in each domain. Hence, it can be concluded that items in a result set usually have a low rate of matching SKOS annotations. In addition to content-based scores, user assessments of recommendation list Diversity (divU) were also taken into account. The median level of agreement with the Diversity-related Likert item stating that recommendations are diverse lay at 3 (“neutral”) in the multimedia domains and at 4 (“agree”) in the travel domain.
In summary, the baseline approach achieved fairly good performance results in TC1. The findings demonstrate that LOD-enabled recommendations can potentially enhance a user's search experience. They also allow for subsequent tests of advanced retrieval techniques, which can be compared to this standard method.

In the analysis of performance results from TC2 (constraint-based recommendations), particular attention was paid to participants' agreement with the statement “The filter has improved the recommendation results of the list”. Fig. 8 shows the distribution of domain-wise agreement ratios. On average, they were fairly high. Throughout the domains, the vast majority of participants selected either 5 (“strongly agree”), 4 (“agree”), or 3 (“neutral”). Hence, a filter often had a positive impact on results.

The authors gathered responses to the question on improvement for recommendation lists resulting from both the simple and the expressive filtering approaches. Hence, they give clues about the general performance of constraint-based retrieval. However, since this section of the study revealed the experimental condition to participants (i.e., application of a filter), answers to the Likert statement have to be interpreted cautiously. Results may be slightly biased, as participants were able to guess the underlying agenda. However, these limitations do not exist for comparisons between simple and expressive filters. Since the web application randomly assigned the list order, participants did not know which approach they were assessing. This enables us to take a closer look at the differences between the two filtering approaches. It was

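Table 12. Performance results in TC1 (M = Mean, SD = Standard Deviation, N = No. of participants)

| Dimension (Metric) [Max Score] | Domain | M | SD | N |
|---|---|---|---|---|
| Accuracy (size) [10] | Travel | 10.00 | 0.00 | 103 |
| | Movie | 10.00 | 0.00 | 50 |
| | Music | 10.00 | 0.00 | 53 |
| | Book | 9.80 | 1.40 | 51 |
| Accuracy (mrs) [1] | Travel | 56.49 | 0.00 | 103 |
| | Movie | 62.25 | 19.15 | 50 |
| | Music | 55.70 | 0.10 | 53 |
| | Book | 57.47 | 15.55 | 50 |
| Novelty (nv) [1] | Travel | 0.60 | 0.34 | 101 |
| | Movie | 0.36 | 0.31 | 50 |
| | Music | 0.36 | 0.39 | 52 |
| | Book | 0.61 | 0.29 | 47 |
| Diversity (divU) [5] | Travel | 4 | - | 102 |
| | Movie | 3 | - | 50 |
| | Music | 3 | - | 53 |
| | Book | 3 | - | 50 |
| Diversity (divC) [1] | Travel | 0.80 | 0.16 | 103 |
| | Movie | 0.87 | 0.12 | 50 |
| | Music | 0.86 | 0.10 | 50 |
| | Book | 0.92 | 0.08 | 50 |
| Usefulness [5] | Travel | 3.34 | 0.85 | 103 |
| | Movie | 3.55 | 0.73 | 50 |
| | Music | 3.18 | 0.85 | 53 |
| | Book | 3.43 | 0.81 | 50 |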

hypothesized that advanced filters improve recommendation quality, as they may identify more relevant resources, thereby overcoming problems of data quality or data sparsity in LOD repositories. Table 13 lists response rates of the two methods. The rate measures the ratio of non-zero result sets among all result sets. Throughout the domains, recommendation lists resulting from expressive filtering achieved higher response rates.

Table 13. Response rates (TC2)

| Domain | Constr.-based (Regular) | Constr.-based (Expanded) | N |
|---|---|---|---|
| Travel | 23% | 57% | 103 |
| Movie | 86% | 88% | 50 |
| Music | 72% | 75% | 53 |
| Book | 73% | 76% | 51 |
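The response rate reported in Table 13 is the share of non-empty result sets among all requests. A minimal Python sketch, assuming that the sizes of all generated recommendation lists are available from the log files (the function name and the sample sizes are hypothetical):

```python
from typing import List


def response_rate(result_set_sizes: List[int]) -> float:
    """Ratio of non-zero result sets among all result sets."""
    if not result_set_sizes:
        raise ValueError("at least one result set is required")
    non_empty = sum(1 for size in result_set_sizes if size > 0)
    return non_empty / len(result_set_sizes)


# Hypothetical list sizes logged for four constraint-based requests.
logged_sizes = [0, 3, 10, 0]
print(response_rate(logged_sizes))  # 0.5
```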


[Figure 8 plot. Title: "The filter has improved the recommendations of the list"; x-axis: Domain (Digital Library, Travel, Movie, Music, Book); y-axis: Percentage of Agreement (0.00 to 1.00); legend: Degree of Agreement (1 to 5)]

Fig. 8. Participants' agreement with the statement that the filter has improved the result list (domain-wise) (TC2)

In addition to response rates, user assessments and scores in the defined quality dimensions were compared. Table 14 depicts the outcome of this analysis. Significantly higher scores are marked in bold figures. In case statistical tests identified no meaningful variations, the results of the better performing approach are underlined. α-levels were adjusted with the false discovery rate (FDR) in order to decrease the probability of making a type I error due to multiple testing.
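As an illustration of an FDR adjustment for multiple tests, the following sketch computes adjusted p-values with the Benjamini-Hochberg step-up procedure. This is an assumption: the text only states that an FDR correction was used and does not name the concrete procedure. The function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: the FDR adjustment is illustrated with the
    Benjamini-Hochberg step-up procedure; the paper does not specify it.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to enforce monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values from four pairwise comparisons.
raw = [0.01, 0.04, 0.03, 0.005]
for p, q in zip(raw, benjamini_hochberg(raw)):
    print(f"raw p = {p:.3f}, adjusted p = {q:.3f}")
```

A comparison would then be reported as significant if its adjusted p-value stays below the chosen significance level.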

According to mean scores, expressive requests generated more comprehensive recommendation lists (size). This finding is in line with the increased response rates shown in Table 13. However, subsequently conducted t-tests confirmed significant differences only for the travel experiment (t(102) = −7.89, p < 0.001). In contrast, precision scores were higher in most of the domains (travel, movie, music) when the SKOSRec engine processed a simple constraint-based query. However, statistical tests did not confirm any systematic differences. The authors applied Wilcoxon tests for mrs scores and the remaining performance metrics because of the small sample sizes resulting from the low response rates of the simple constraint-based approach.

Regarding Novelty, none of the approaches was superior. In the Diversity dimension, expressive filters generated suggestions that were at least as topically diversified as the results from simple constraint-based retrieval. Even though no significant differences were identified between the two approaches, increased mean scores (divU in the travel domain and divC in each domain) indicate a slight superiority of expanded filter requests in this quality dimension. The same applies to usefulness scores (mean values were higher in 3 out of 4 domains when expressive filtering was applied). However, these statements are speculative at best because statistical tests were not significant.

In summary, it can be concluded that expressive filters improve recall, potentially leading to a slight loss in precision scores (see Table 14). It may also be the case that expanded user constraints diversify as well as increase the usefulness of recommendation lists. However, these claims are unverified. In contrast, it is safe to say that expressive query patterns generate results of at least the same quality as simple patterns while increasing the number of relevant suggestions. The few differences may be explained by the high Jaccard scores (JI) (see Table 15) of recommendation lists. Throughout the domains, result sets were much alike. Given these similarities and the increases in list sizes for expressive filters, it is supposed that an expanded constraint produces an enhanced version of the recommendation list resulting from simple filtering. Therefore, expanded filters should be the default setting in a constraint-based retrieval context.
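The Jaccard index used to compare two recommendation lists is the size of their intersection divided by the size of their union. A minimal sketch follows; the function name, the handling of two empty lists, and the example items are assumptions for illustration and are not taken from the study.

```python
def jaccard_index(list_a, list_b):
    """Return |A ∩ B| / |A ∪ B| for two recommendation lists."""
    set_a, set_b = set(list_a), set(list_b)
    union = set_a | set_b
    if not union:
        # Assumption: two empty lists are treated as identical.
        return 1.0
    return len(set_a & set_b) / len(union)


# Hypothetical lists: the expanded filter returns the simple result
# plus one additional item, so the overlap is high.
simple = ["Paris", "Rome", "Vienna"]
expanded = ["Paris", "Rome", "Vienna", "Prague"]
print(jaccard_index(simple, expanded))  # 3 shared items out of 4 distinct ones
```

A value close to 1 indicates nearly identical lists, whereas a value close to 0 indicates that the two approaches retrieved largely different items.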

In TC3 (i.e., evaluation of roll-up query patterns), the SKOSRec engine almost always generated non-empty recommendation lists (see Table 16). In the


Table 14. Performance results in TC2 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max. Score]   Domain   Simple            Expressive        N
                                           M       SD        M       SD
Accuracy (size) [10]              Travel   1.17    2.70      4.52    4.62      103
                                  Movie    6.72    4.08      7.20    3.92      50
                                  Music    5.96    4.64      6.36    4.51      53
                                  Book     4.59    4.50      5.00    4.51      51
Accuracy (mrs) [1]                Travel   66.73   20.51     65.34   22.36     23
                                  Movie    60.95   22.98     60.29   22.79     43
                                  Music    63.77   16.05     62.96   16.44     34
                                  Book     56.17   18.18     56.81   16.81     37
Novelty (nv) [1]                  Travel   0.48    0.41      0.61    0.49      21
                                  Movie    0.44    0.36      0.55    0.80      42
                                  Music    0.55    0.39      0.52    0.37      38
                                  Book     0.77    0.34      0.76    0.33      33
Diversity (divU) [5]              Travel   3       -         3.5     -         24
                                  Movie    4       -         4       -         43
                                  Music    4       -         4       -         38
                                  Book     3       -         3       -         37
Diversity (divC) [1]              Travel   0.68    0.18      0.72    0.21      19
                                  Movie    0.70    0.24      0.73    0.23      41
                                  Music    0.86    0.13      0.87    0.12      34
                                  Book     0.84    0.24      0.95    0.63      29
Usefulness [5]                    Travel   3.33    0.93      3.50    0.77      24
                                  Movie    3.32    0.85      3.31    0.86      43
                                  Music    3.55    0.81      3.64    0.78      38
                                  Book     3.39    0.83      3.62    1.21      37

Table 15. Jaccard Indices (TC2)

Domain   Mean Jaccard Index (JI)   SD     N
Travel   0.57                      0.47   25
Movie    0.71                      0.38   43
Music    0.82                      0.35   37
Book     0.92                      0.26   35

travel and the book experiment, the response rate was higher for aggregation-based queries, whereas in the movie domain, the engine more often provided recommendations for regular requests. However, these differences are only marginal and do not indicate a clear superiority of one method.

Fig. 9 depicts participants’ general agreement to Likert statements concerning the Perceived Usefulness of the approaches in TC3. The diagram shows similar results for regular as well as advanced requests. On average, satisfaction scores reached levels which lay slightly above neutral agreement. A subsequent t-test

Table 16. Response Rates (TC3)

Domain   Regular   Aggr.-based   N
Travel   99        100           103
Movie    96        92            50
Music    100       100           53
Book     96        100           51

confirmed no significant differences between the two methods. In fact, the approach with the highest average score of Perceived Usefulness was different in each experiment. While mean values were higher for regular requests in the studies on travel and music RS, advanced queries produced higher scores in the movie and book experiments.

Content-based Diversity (divC) was the only metric with systematic differences. Here, scores went up for advanced retrieval patterns in the travel (t(98) = −5.50, p < 0.001), the music (t(52) = −4.51, p < 0.001) and the book domain (t(47) = −4.90, p < 0.001) (see Table 17). This finding is not surprising, given that the aggregation-based approach explores the wider semantic network of each item in the profile.

[Figure 9. Perceived Usefulness (Regular vs. Roll-up Queries). x-axis: Domain (Travel, Movie, Music, Book); y-axis: Agreement; legend: Method (Regular, Roll-up).]

Fig. 9. Participants’ overall satisfaction with regular and roll-up recommendation retrieval (TC3).

Hence, it can be assumed that both approaches produce recommendation lists of similar quality and can provide benefits to potential consumers. Thus, it cannot be clearly stated which approach is superior because neither of them outperformed the other in each domain. While the regular retrieval strategy was better in terms of usefulness in the travel and the music domains, the aggregation-based approach achieved higher scores in the domains of movie and book recommendations.

This conclusion is especially impressive given the low mean Jaccard scores for recommendation lists resulting from regular and aggregation-based retrieval (see Table 18). The mean score never exceeded 0.1 throughout the domains. It indicates a low concordance between result sets. It is remarkable that participants perceived the recommendations as equally good given the high dissimilarity of the lists. Thus, advanced queries have an added value, as they provide additional interesting results. They can help to explore LOD repositories in different ways than a regular retrieval approach. Therefore, it is worthwhile to offer aggregation-based query patterns as an alternative retrieval strategy when the regular approach has not produced any helpful recommendations.

In addition to TC3, user assessments for TC4 (cross-domain recommendations) were also evaluated. However, since the authors conducted experiments for TC4 only in the multimedia domains, the samples were considerably smaller. Nevertheless, it is assumed that the number of completed user sessions is just sufficient to conduct further statistical analyses. Table 19 shows the response rates for the two retrieval methods. Cross-domain patterns had a higher success rate of generating non-zero result sets throughout the domains. Participants almost always received a recommendation for this query type, whereas they often did not receive a single suggestion when a regular SPARQL query was executed.

We applied non-parametric tests for the performance metrics mrs, nv, divU, divC and Perceived Usefulness due to the low response rates of the SPARQL-based approach, which led to a low number of comparable data points. Overall, the FDR-adjusted pairwise Wilcoxon tests detected no significant differences except for user-based Diversity scores (divU) in the movie (Z = −2.69, p < 0.05) and the book domain (Z = −2.56, p < 0.05). The results for the content-based Diversity (divC) metric seem to point in the same direction because of the high mean values for cross-domain queries in each domain. But the significance of these results was not verified. The mrs scores indicate a better performance of regular


Table 17. Performance results in TC3 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max. Score]   Domain   Regular           Aggr.-based       N
                                           M       SD        M       SD
Accuracy (size) [10]              Travel   9.90    0.99      9.42    2.07      103
                                  Movie    2.88    0.59      2.76    0.82      50
                                  Music    10.00   0.00      10.00   0.00      53
                                  Book     2.88    0.59      3.00    0.00      51
Accuracy (mrs) [1]                Travel   48.21   24.98     44.93   25.59     102
                                  Movie    54.58   22.88     56.69   25.42     44
                                  Music    53.95   18.31     58.24   16.19     52
                                  Book     51.53   23.90     49.25   22.30     49
Novelty (nv) [1]                  Travel   0.43    0.39      0.38    0.40      98
                                  Movie    0.35    0.38      0.28    0.41      41
                                  Music    0.49    0.36      0.46    0.39      50
                                  Book     0.61    0.42      0.62    0.44      47
Diversity (divU) [5]              Travel   3.5     -         3       -         100
                                  Movie    4       -         3       -         43
                                  Music    4       -         3       -         50
                                  Book     3       -         3       -         48
Diversity (divC) [1]              Travel   0.79    0.17      0.89    0.13      99
                                  Movie    0.75    0.20      0.66    0.27      44
                                  Music    0.83    0.17      0.93    0.07      53
                                  Book     0.74    0.27      0.93    0.06      48
Usefulness [5]                    Travel   3.36    0.77      3.22    0.74      101
                                  Movie    3.20    0.92      3.26    0.94      44
                                  Music    3.45    0.84      3.44    0.85      51
                                  Book     2.95    0.87      3.03    0.96      48

Table 18. Jaccard Indices (TC3)

Domain   Mean Jaccard Index (JI)   SD     N
Travel   0.07                      0.11   101
Movie    0.03                      0.07   44
Music    0.02                      0.05   53
Book     0.02                      0.06   50

Table 19. Response rates (TC4)

Domain   Regular (SPARQL)   SKOSRec   N
Movie    32                 96        50
Music    62                 98        53
Book     53                 100       51

SPARQL queries, but the within-subjects test did not confirm this assumption statistically (see Table 20).

However, the most important finding of the analysis was that SKOSRec cross-domain queries generated significantly more suggestions throughout the domains (movie: t(49) = −17.92, p < 0.001; music: t(52) = −25.73, p < 0.001; book: t(50) = −12.75, p < 0.001) (see Table 20). Fig. 10 illustrates this graphically.
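To make the reported statistics easier to read, the following sketch shows how a paired (within-subjects) t statistic and its degrees of freedom can be computed from two list-size measurements per participant. It uses only the Python standard library. The function name, the toy data and the sign convention (first method minus second method) are assumptions for illustration; the sketch does not reproduce the study data and omits the p-value computation.

```python
import math
import statistics


def paired_t_statistic(first, second):
    """Return (t, degrees of freedom) for paired samples.

    The differences are computed as first - second (assumed convention),
    so a negative t means the second method produced larger values.
    """
    if len(first) != len(second):
        raise ValueError("both samples need one value per participant")
    if len(first) < 2:
        raise ValueError("at least two participants are required")
    differences = [a - b for a, b in zip(first, second)]
    spread = statistics.stdev(differences)  # sample standard deviation
    if spread == 0:
        raise ValueError("t is undefined when all differences are equal")
    n = len(differences)
    t_value = statistics.mean(differences) / (spread / math.sqrt(n))
    return t_value, n - 1


# Hypothetical list sizes for four participants.
regular_sizes = [0, 1, 2, 0]
cross_domain_sizes = [8, 9, 10, 9]
t_value, df = paired_t_statistic(regular_sizes, cross_domain_sizes)
print(f"t({df}) = {t_value:.2f}")
```

With 50 participants, the degrees of freedom would be 49, which matches the notation t(49) used for the movie domain.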

Whenever the SPARQL request provided recommendations, which was the case in less than half of the user sessions (see Table 19), the SKOSRec suggestions were assessed to be of almost equal quality as the SPARQL recommendations (see Table 20, Usefulness). Hence, in the remaining test cases, in which SPARQL querying did not produce any results, the SKOSRec approach may still have provided useful suggestions.

Therefore, it is reasonable to assume that the capability of the SKOSRec engine to flexibly switch between processing steps of similarity calculation and pattern matching can facilitate an improved exploration of LOD repositories. Table 21 also shows that the items in the two sets often did not match, as is indicated by low Jaccard indices. This finding further demonstrates that the execution of an additional step


Table 20. Performance results in TC4 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max. Score]   Domain   Regular (SPARQL)    Cross-Domain        N
                                           M        SD         M        SD
Accuracy (size) [10]              Movie    0.82     2.09       8.66     2.73       50
                                  Music    0.89     1.59       9.30     2.03       53
                                  Book     2.90     3.98       10.00    0.00       51
Accuracy (mrs) [1]                Movie    67.41    27.62      53.65    14.71      16
                                  Music    62.36    31.76      48.32    22.49      20
                                  Book     66.42    21.28      55.59    16.84      27
Novelty (nv) [1]                  Movie    0.59     0.49       0.41     0.34       16
                                  Music    0.72     0.44       0.59     0.40       19
                                  Book     0.54     0.41       0.63     0.30       27
Diversity (divU) [5]              Movie    3        -          4        -          16
                                  Music    3        -          3        -          20
                                  Book     3        -          4        -          27
Diversity (divC) [1]              Movie    (0.64)   (0.35)     (0.99)   (0.02)     (6)
                                  Music    (0.90)   (0.13)     (0.98)   (0.03)     (10)
                                  Book     (0.84)   (0.12)     (0.87)   (0.1)      (8)
Usefulness [5]                    Movie    3.57     0.83       3.35     1.57       15
                                  Music    3.53     1.17       3.11     0.98       20
                                  Book     3.41     1.04       3.41     0.94       27

[Figure 10. Recommendation List Size. x-axis: Domain (Movie, Music, Book); y-axis: No. of items; legend: Method (Regular, Cross-Domain).]

Fig. 10. Mean recommendation list sizes for SPARQL and SKOSRec cross-domain requests (TC4).


Table 21. Jaccard Indices (TC4)

Domain   Mean Jaccard Index (JI)   SD     N
Movie    0.03                      0.07   44
Music    0.01                      0.02   21
Book     0.16                      0.27   33

of similarity calculation can help to retrieve other relevant items.

5. Future Work & Conclusion

This paper has presented the query-based SKOSRec engine that facilitates new types of recommendation requests for LOD repositories. The evaluations have shown that the SKOSRec system can often generate relevant and useful suggestions when certain query templates are utilized. In many cases, the application of expressive or advanced query patterns (i.e., in the context of constraint-based retrieval or cross-domain requests) helped to significantly improve recall values while still providing the same level of quality in the remaining performance dimensions. Additionally, advanced recommendation requests can be used to generate diversified recommendation lists in case users are not satisfied with the results of regular recommendation requests (i.e., roll-up or cross-domain retrieval vs. regular requests). While not being superior in each domain and test case, it is at least safe to say that the novel recommendation approaches of the SKOSRec engine considerably extend common retrieval methods and provide an alternative search strategy when conventional methods fail to provide useful results. Additionally, the increase in diversity and recall is in line with findings from previous research on LOD-enabled RS [45].

It is a strength of the developed approaches that they facilitate combinations of graph-based and similarity-based retrieval at different stages of the recommendation workflow. This feature extends general search capabilities for semantic networks. Future research needs to investigate whether, in addition to SKOS, further RDF vocabularies can be used for ad-hoc item-to-item similarity computation to increase the range of potential usage scenarios, since not every LOD dataset contains SKOS annotations [53]. The application of additional properties could also boost the recommendation quality of some query patterns (e.g., the query types used in TC3). In this context, it also needs to be determined whether the representation of the user profile in the SKOSRec query language can be substituted with free-text expressions to enable search-like functionalities. LOD researchers have already developed engines that can process natural language queries over RDF data [59], [34], [67]. It will have to be investigated how the SKOSRec engine profitably fits into this landscape of existing retrieval tools.

Another open research question concerns the aspect of interfaces to other applications. In its standard configuration, the system is not an end-user retrieval engine, but a backend application with which an administrator who has domain knowledge (e.g., of RDF vocabularies and data models of a particular LOD repository and usage scenario) creates the respective query patterns. Although the websites of the online experiments represent possible implementations of suitable end-user applications for the SKOSRec engine, there are still many other possibilities and extensions to design such a user interface (UI). For instance, the selection of LOD items for the profile was made possible by previously storing the data in an index that could then be accessed and searched before issuing a recommendation query. However, this runs against the idea of ad-hoc retrieval. In this context, future research will have to determine whether it is feasible to generate an on-the-fly mapping from items to the respective LOD resources. In the case of a commercial application, the engine needs to match items with the corresponding entries in the product catalog. Even if it is not yet clear which UI components can make the best possible use of the SKOSRec engine’s features, the results from the user experiments can give at least a few suggestions.

The following retrieval options should be the default setting in an interface that contains a LOD-enabled RS component to assist users in finding interesting items. Since the regular approach of on-the-fly retrieval achieved good results throughout the domains, the display of simple recommendations generated from previously stated user preferences represents a suitable starting point for retrieval. On-the-fly queries could be refined by providing filter options. Since the evaluations have shown that expressive constraints often lead to increased recall values, the application of such a filter might be the best option for assisted retrieval. As in the preparations for the experiments, appropriate graph-based query patterns for the usage scenario in question would also have to be determined by a domain expert before setting up the application. The same applies to advanced retrieval patterns, which should be available to the end user to refine the query when both the simple as well as the constraint-based


approach have failed to provide relevant items. This suggestion is made because roll-up queries were competitive with regular recommendations in the web-based experiments. Therefore, they represent a viable alternative retrieval strategy. For some domains (e.g., for multimedia retrieval), it is also possible to use an optional selection field to enable the formulation of cross-domain requests. Evaluations for this query type suggest that users might receive interesting results (i.e., diversified and comprehensive recommendation lists) from such a request.

In summary, the contributions of the paper are as follows:

– Development and implementation of novel retrieval strategies in the SKOSRec system, which facilitate personalized requests and leverage the potential of knowledge graphs in order to address the research gap in state-of-the-art recommendation methods with regard to customized queries.

– Realization of web-based user experiments in four usage scenarios with the following outcomes:

  • Successful demonstration that knowledge graph data is applicable in the context of live recommendation scenarios.

  • Creation of suitable query patterns that can be applied to represent individual information needs in different application contexts. The set of query patterns can be extended with the help of the SKOSRec query language to meet the requirements of additional usage scenarios.

  • Proof that the novel retrieval strategies can be successfully applied across various application contexts and recommendation tasks and that they outperform conventional content-based retrieval approaches, especially in the performance dimensions of diversity and recall.

The prototypical implementation and the evaluation of the SKOSRec engine in different usage scenarios have demonstrated that the freely available knowledge sources on the LOD cloud can be successfully applied to advance state-of-the-art methods in IR and recommender systems. The proposed personalized and customized retrieval strategies can leverage the potential of knowledge graphs by better matching information needs of users with the help of semantic search options.

Appendix A. The SKOSRecommender Query Syntax

SKOSRecQuery: A SKOSRec query can be comprised of up to six elements, of which the parts SimProjection and ItemPart are obligatory. Hence, users need to state at least their preferences and how many recommendations they would like to receive. Advanced retrieval patterns can be formulated as well. For instance, similar resource retrieval can be combined with prefiltering (RecWhereClause) and/or postfiltering (SelectPart) of LOD resources. As in SPARQL, the query can start with a prologue that abbreviates IRIs with namespace declarations [21].

SelectPart: The SelectPart part enables subquerying with recommendation results. The surrounding query is a SELECT query with a slightly simplified WHERE clause (i.e., RecWhereClause), with which postfilter conditions can be formulated. In this section, users have the option to formulate aggregation-based retrieval requests (Aggregation).

Aggregation: This query part handles aggregation-based postfiltering. Users specify the IRI (IRIref) resource for which similarity scores of sublevel entities are summarized. Additionally, it is stated how the data should be aggregated (i.e., SUM, MAX or AVG) as well as which variable (Var) represents the aggregation-based recommendations in the postfilter section. The compiler makes sure that this variable is actually contained in the RecWhereClause of the SelectPart.

SimProjection: This part of a SKOSRec query refers to the list of generated suggestions. It defines the recommendation variable to which all similar LOD resources are mapped. In case pre- and postfilter conditions are set in the other sections, the compiler checks whether the variable (Var) appears in these sections as well. Otherwise, the required join operations cannot be executed. It is specified how many recommendations should be displayed (Integer) and whether the request will be issued against a default repository or as a cross-repository request (ServiceIntegration).

ServiceIntegration: This section indicates that a user intends to receive recommendations from a different SPARQL endpoint than the default endpoint that is stated in the standard configuration of the SKOSRec engine. It specifies the target SPARQL endpoint (IRIref) from which a user wants to receive recommendations. Upon extraction of SKOS annotations from the default source endpoint, a subsequent request is sent to the specified target endpoint.

ItemPart: This section of a SKOSRec query represents a single preference in the user profile. However, recommendation queries can contain several preference statements. A preference is either expressed as a LOD resource (IRIref) or as a variable which is bound to a preference query (VarPart). In the latter case, the resources are obtained by matching the graph pattern in the VarPart with the triple statements in the repository. The ItemPart statement can be additionally supplemented with a concept-to-concept similarity threshold (Sim) that triggers concept expansion with the approach of flexible similarity detection, which is extensively described in one of our previous works [64].

VarPart: The construct initiates preference querying. Individual likings are expressed as a graph pattern in a RecWhereClause. The variable (Var) is a placeholder for the LOD resources that will be used to set up the user profile. The compiler has to make sure that the specified variable occurs in the RecWhereClause as well. Otherwise, the extraction of LOD resources cannot be carried out correctly.

Sim: The Sim part of a SKOSRec query denotes the threshold of concept-to-concept scores for flexible similarity detection. Even though knowledge-based similarity values usually range between 0 and 1, the specification allows a broader range of digits in the Decimal datatype to enable application of other similarity metrics in case they are needed.

Relation: This section specifies how concept-to-concept similarity scores should be determined. Users can state whether scores should be “larger”, “larger than” or “equal to” the threshold value.

RecWhereClause: This clause can occur in three different sections of a SKOSRec query, i.e., either in the prefilter (ItemPart), the postfilter (SelectPart) or the preference filter section (VarPart). The RecWhereClause closely resembles the “WhereClause” of the SPARQL 1.1 specification with all its subsequent parts [21]. Hence, it enables different combinations of graph pattern matching. However, the RecWhereClause of the SKOSRec language allows fewer pattern matching expressions than the “WhereClause” of the regular SPARQL syntax specification [21]. For instance, it is neither permitted to formulate subqueries nor to direct filter requests to more than one repository (see RecGraphPatternNotTriples for further explanations). This modification has been made to simplify similarity calculation. After parsing this section, the compiler makes sure that recommendation, preference or postfilter variables occur at least once in the RecWhereClause to guarantee that subsequent steps of the recommendation workflow can be applied on existing LOD resources. In case the RecWhereClause comprises a RecMinusGraphPattern, it also has to be checked that the variable is not only contained in the MINUS part of the group.

RecGroupGraphPattern: A RecGroupGraphPattern can contain one to many basic graph patterns, which can be differently combined according to RecGroupGraphPatternSub.

RecGroupGraphPatternSub: This section defines the graph patterns that are applied to identify suitable LOD resources. User filters can be either expressed as basic graph patterns (TriplesBlock) or as combinations of them (RecGraphPatternNotTriples). The sequence of different kinds of patterns can be flexibly specified.

RecGraphPatternNotTriples: This query part closely resembles the similarly named “GraphPatternNotTriples” of the SPARQL 1.1 specification [21]. The only difference to SPARQL is that the RecGraphPatternNotTriples section is a little more restricted in terms of admissible graph patterns than the “GraphPatternNotTriples” section of the regular SPARQL specification. For instance, it does not allow the retrieval of LOD resources from graphs other than the default graph that is specified in the configuration of the SKOSRec engine, since filter patterns have to be applied to datasets that contain SKOS annotations. Hence, they need to be specified before runtime execution to ensure that similarity calculation can be carried out correctly. If a user were able to formulate filter conditions that referred to different RDF graphs and endpoints, this would not be possible.

RecGroupOrUnionGraphPattern: This query section marks an alternative graph pattern in the style of the “GroupOrUnionGraphPattern” of the SPARQL syntax specification [21]. However, it neither allows SPARQL-like subqueries nor federated queries for filtered resources, because this would complicate similarity calculation.

RecOptionalGraphPattern: In this part of a SKOSRecQuery, users are enabled to specify additional graph patterns which might extend the query solution, but do not necessarily have to match the data.

RecMinusGraphPattern: This section handles exclusion of LOD resources for cases when users want to omit certain triple statements from the solution.
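Two of the mechanisms described in this appendix can be sketched in Python: the aggregation of sublevel similarity scores (SUM, MAX or AVG) and the compiler check that a variable does not occur only in the MINUS part of a RecWhereClause. This is a simplified illustration under stated assumptions and not the SKOSRec implementation. The function names, the dictionary-based input format and the example resources are hypothetical.

```python
from collections import defaultdict

# Aggregation modes named in the Aggregation section.
AGGREGATORS = {
    "SUM": sum,
    "MAX": max,
    "AVG": lambda scores: sum(scores) / len(scores),
}


def roll_up(sublevel_scores, parent_of, mode="SUM", limit=10):
    """Aggregate similarity scores of sublevel entities per parent resource.

    sublevel_scores: entity -> similarity score (hypothetical input format)
    parent_of: entity -> parent resource, e.g. movie -> director
    """
    grouped = defaultdict(list)
    for entity, score in sublevel_scores.items():
        parent = parent_of.get(entity)
        if parent is not None:
            grouped[parent].append(score)
    aggregate = AGGREGATORS[mode]
    ranked = sorted(
        ((parent, aggregate(scores)) for parent, scores in grouped.items()),
        key=lambda pair: (-pair[1], pair[0]),  # best score first
    )
    return ranked[:limit]


def check_variable(var, positive_vars, minus_vars):
    """Mimic the compiler check: the variable must occur outside MINUS."""
    if var in positive_vars:
        return True
    if var in minus_vars:
        raise ValueError(f"{var} occurs only in the MINUS part")
    raise ValueError(f"{var} does not occur in the clause")


# Hypothetical example: similar movies are rolled up to their directors.
scores = {"movie_1": 0.8, "movie_2": 0.4, "movie_3": 0.5}
directors = {"movie_1": "director_a", "movie_2": "director_a", "movie_3": "director_b"}
print(roll_up(scores, directors, mode="MAX"))
print(check_variable("?rec", {"?rec", "?genre"}, {"?other"}))
```

The choice of aggregation mode changes the ranking logic: SUM favors parent resources with many similar sublevel entities, whereas MAX and AVG are less sensitive to the number of sublevel entities.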

References

[1] G. Adomavicius, A. Tuzhilin, Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. In: IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.

[2] G. Adomavicius, A. Tuzhilin, R. Zheng, REQUEST: A query language for customizing recommendations. In: Information Systems Research, vol. 22, no. 1, pp. 99–117, 2011.

[3] C. Aggarwal, Recommender systems. Springer, 2016.

[4] M. Arenas, B. Cuenca Grau, E. Kharlamov, S. Marciuska, D. Zheleznyakov, Faceted search over ontology-enhanced RDF data. In: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 939–948, 2014.

[5] V. Ayala, M. Przyjaciel-Zablocki, T. Hornung, A. Schätzle, G. Lausen, Extending SPARQL for recommendations. In: Semantic Web Information Management, 2004.

[6] J. Beel, C. Breitinger, S. Langer, A. Lommatzsch, B. Gipp, Towards reproducibility in recommender systems research. In: User Modeling and User-adapted Interaction, vol. 26, no. 1, pp. 69–101, 2016.

[7] C. Boutilier, R. Brafman, C. Domshlak, H. Hoos, D. Poole, CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements. In: Journal of Artificial Intelligence Research (JAIR), vol. 21, pp. 135–191, 2004.

[8] P. Castells, S. Vargas, J. Wang, Novelty and diversity metrics for recommender systems: choice, discovery and relevance, 2011.

[9] G. Cheng, W. Ge, Y. Qu, Falcons: searching and browsing entities on the Semantic Web. In: Proceedings of the 17th ACM International Conference on World Wide Web, pp. 1101–1102, 2008.

[10] J. Davies, R. Weeks, QuizRDF: Search technology for the Semantic Web. In: Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004.

[11] DBpedia, URL: https://wiki.dbpedia.org, 2017.

[12] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, Exploiting the web of data in model-based recommender systems. In: Proceedings of the 6th ACM Conference on Recommender Systems (RecSys), pp. 253–256, 2012.

[13] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, M. Zanker, Linked Open Data to support content-based recommender systems. In: Proceedings of the 8th International Conference on Semantic Systems, 2012.

[14] D. Dillman, D. Robert, D. Bowker, Principles for constructing web surveys. In: Joint Meetings of the American Statistical Association, 1998.

[15] DCMI metadata terms, URL: http://dublincore.org/documents/dcmi-terms, 2004.

[16] I. Fernandez-Tobias, I. Cantador, M. Kaminskas, F. Ricci, A generic semantic-based framework for cross-domain recommendation. In: Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pp. 25–32, 2011.

[17] A. Freitas, J. Oliveira, S. O’Riain, E. Curry, J. Da Silva, C. Pereira, Querying Linked Data using semantic relatedness: a vocabulary independent approach. In: Natural Language Processing and Information Systems, pp. 40–51, 2011.

[18] T. Grainger, T. Potter, Y. Seeley, Solr in Action. Cherry Hill: Manning, 2014.

[19] A. Gunawardana, G. Shani, Evaluating recommender systems. In: Recommender Systems Handbook, Springer, pp. 265–308, 2015.

[20] R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Bürgle, H. Düwiger, U. Scheel, Faceted Wikipedia search. In: International Conference on Business Information Systems, pp. 1–11, 2010.

[21] S. Harris, A. Seaborne, E. Prud’hommeaux, SPARQL 1.1 query language. In: W3C recommendation, 2013.

[22] B. Heitmann, C. Hayes, Using Linked Data to build open, collaborative recommender systems. In: AAAI Spring Symposium: Linked Data Meets Artificial Intelligence, pp. 76–81, 2010.

[23] J. Herlocker, J. Konstan, L. Terveen, J. Riedl, Evaluating collaborative filtering recommender systems. In: ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.

[24] Y. Hu, Z. Wang, W. Wu, J. Guo, M. Zhang, Recommendation for movies and stars using YAGO and IMDB. In: 12th International Asia-Pacific Web Conference (APWEB), pp. 123–129, 2010.

[25] D. Huynh, D. Karger, Parallax and companion: Set-based browsing for the data web. In: Proceedings of the 18th ACM International Conference on World Wide Web, 2009.

[26] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich, Recommender systems: an introduction. Cambridge University Press, 2010.

[27] Y. Kabutoya, R. Sumi, T. Iwata, T. Uchiyama, T. Uchiyama, In: IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 625–630, 2012.

[28] M. Kaminskas, I. Fernandez-Tobias, F. Ricci, I. Cantador, Knowledge-based music retrieval for places of interest. In: Proceedings of the 2nd International ACM Workshop on Music Information Retrieval with User-centered and Multimodal Strategies, pp. 19–24, 2012.

[29] H. Khrouf, R. Troncy, Hybrid event recommendation using Linked Data and user diversity. In: Proceedings of the 7th ACM Conference on Recommender Systems, pp. 185–192, 2010.

[30] P. Kirk, Experimental design. John Wiley & Sons, 1982.

[31] B. Knijnenburg, M. Willemsen, Z. Gantner, H. Soncu, C. Newell, Explaining the user experience of recommender systems. In: User Modeling and User-Adapted Interaction, vol. 22, no. 4-5, pp. 441–504, 2012.

[32] G. Koutrika, B. Bercovitz, H. Garcia-Molina, FlexRecs: expressing and combining flexible recommendations. In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 2009.


[33] M. Lee, W. Kim. Semantic association search and rank method based on spreading activation for the Semantic Web. In IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1523–1527, 2009.

[34] J. Lehmann, L. Bühmann. AutoSPARQL: Let users query your knowledge base. In 8th Extended Semantic Web Conference (ESWC), 2011.

[35] N. Marie, F. Gandon, M. Ribiere, F. Rodio. Discovery hub: on-the-fly Linked Data exploratory search. In Proceedings of the 9th International Conference on Semantic Systems, pp. 17–24, 2013.

[36] S. McNee, J. Riedl, J. Konstan. Being accurate is not enough: how accuracy metrics have hurt recommender systems. In CHI'06 Extended Abstracts on Human Factors in Computing Systems, 2006.

[37] R. Meymandpour, J. Davis. Recommendations using Linked Data. In Proceedings of the 5th PhD Workshop on Information and Knowledge, 2012.

[38] A. Miles, S. Bechhofer. SKOS Simple Knowledge Organization System Reference. In W3C Recommendation, 2009.

[39] B. Mobasher, X. Jin, Y. Zhou. Semantically enhanced collaborative filtering on the web. In Web Mining: From Web to Semantic Web, pp. 57–76, 2004.

[40] A. Musetti, A. Nuzzolese, F. Draicchio, V. Presutti, E. Blomqvist, A. Gangemi, P. Ciancarini. Aemoo: Exploratory search based on knowledge patterns over the Semantic Web. In Semantic Web Challenge, 2012.

[41] T. Neumann, G. Weikum. The RDF-3X engine for scalable management of RDF data. In The VLDB Journal, vol. 19, no. 1, pp. 91–113, 2010.

[42] J. Pérez, M. Arenas, C. Gutierrez. Semantics and Complexity of SPARQL. In 5th International Semantic Web Conference, pp. 30–43, 2006.

[43] S. Policarpio, S. Brunk, G. Tummarello. Implementation of a SPARQL integrated recommendation engine for Linked Data with hybrid capabilities. In AImWD, 2012.

[44] P. Pu, L. Chen, R. Hu. A user-centric evaluation framework for recommender systems. In Proceedings of the 5th ACM Conference on Recommender Systems, 2011.

[45] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, E. Di Sciascio. Sound and music recommendation with knowledge graphs. In ACM Transactions on Intelligent Systems and Technology (TIST), 8(2), 21, 2017.

[46] E. Oren, R. Delbru, S. Decker. Extending faceted navigation for RDF data. In 5th International Semantic Web Conference, pp. 559–572, 2006.

[47] G. Pirro. Explaining and suggesting relatedness in knowledge graphs. In International Semantic Web Conference, pp. 622–639, 2015.

[48] G. Rizzo, R. Troncy. NERD: A framework for evaluating named entity recognition tools in the Web of data. In 10th International Semantic Web Conference (ISWC'11), Demo Session, Bonn, Germany, 2011.

[49] J. Rosati, T. Di Noia, T. Lukasiewicz, R. De Leone, A. Maurino. Preference queries with Ceteris Paribus semantics for Linked Data. In OTM Confederated International Conferences, pp. 423–442, 2015.

[50] T. Ruotsalo, K. Haav, A. Stoyanov, S. Roche, E. Fani, R. Deliai, E. Mäkelä, T. Kauppinen, E. Hyvönen. SMARTMUSEUM: A mobile recommender system for the Web of Data. In Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 50–67, 2013.

[51] I. Ruthven, M. Lalmas, K. van Rijsbergen. Incorporating user search behavior into relevance feedback. In Journal of the Association for Information Science and Technology, vol. 54, no. 6, pp. 529–549, 2003.

[52] B. Sarwar, G. Karypis, J. Konstan, J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, 2001.

[53] M. Schmachtenberg, C. Bizer, H. Paulheim. Adoption of the Linked Data best practices in different topical domains. In International Semantic Web Conference (ISWC), pp. 245–260, 2014.

[54] H. Schütze, C. Manning, P. Raghavan. Introduction to information retrieval. Cambridge University Press, 2008.

[55] SparqlImplementations. URL: https://www.w3.org/wiki/SparqlImplementations, 2016.

[56] M. Stankovic, W. Breitfuss, P. Laublet. Discovering relevant topics using DBPedia. In Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 219–222, 2011.

[57] K. Swearingen, R. Sinha. Beyond algorithms: An HCI perspective on recommender systems. In ACM SIGIR 2001 Workshop on Recommender Systems, vol. 13, no. 5-6, 2001.

[58] D. Tunkelang. Faceted search. In Synthesis Lectures on Information Concepts, Retrieval, and Services, pp. 1–80, 2009.

[59] C. Unger, L. Bühmann, J. Lehmann, A. Ngonga Ngomo, D. Gerber, P. Cimiano. Template-based question answering over RDF data. In Proceedings of the 21st ACM International Conference on World Wide Web, 2012.

[60] C. Varnhagen, M. Gushta, J. Daniels, T. Peters, N. Parmar, D. Law, T. Johnson. How informed is online informed consent? In Ethics & Behavior, vol. 15, no. 1, pp. 37–48, 2005.

[61] J. Waitelonis, H. Sack. Towards exploratory video search using Linked Data. In 11th IEEE International Symposium on Multimedia, pp. 540–545, 2009.

[62] L. Wenige, J. Ruhland. Flexible on-the-fly recommendations from Linked Open Data repositories. In International Conference on Business Information Systems, pp. 43–54, 2016.

[63] L. Wenige. Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations.

[64] L. Wenige, J. Ruhland. Retrieval by recommendation: using LOD technologies to improve digital library search. In International Journal on Digital Libraries, 2017.

[65] F. Zarrinkalam, M. Kahani. A multi-criteria hybrid citation recommendation system based on Linked Data. In 2nd International eConference on Computer and Knowledge Engineering (ICCKE), pp. 283–288, 2012.

[66] C. Ziegler, S. McNee, J. Konstan, G. Lausen. Improving recommendation lists through topic diversification. In Proceedings of the 14th International Conference on World Wide Web, pp. 22–32, 2005.

[67] L. Zou, R. Huang, H. Wang, J. Yu, W. He, D. Zhao. Natural language question answering over RDF: a graph data driven approach. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 313–324, 2014.

  • Introduction
  • Related Work
  • The SKOS Recommender
    • System Overview
    • On-the-Fly Recommendations
    • Preference Querying
    • Prefiltering
    • Postfiltering & Combinations
  • Evaluation of the SKOS Recommender
    • Experimental Setup
    • Results
  • Future Work & Conclusion
  • Appendix A: The SKOSRecommender Query Syntax
  • References


Table 1. Summary of existing retrieval and recommendation approaches

System Type                        | LOD-enabled | A: SPARQL-based queries | B: On-the-fly similarity | C: Flexible combination of A and B
IR systems                         | ×           | ×                       | ×                        | ×
Query-based RS                     | ×           | ×                       | ✓                        | (✓)
Index-based Linked Data IR systems | ✓           | ×                       | ×                        | ×
On-the-fly Linked Data IR systems  | ✓           | ×                       | ✓                        | ×
LDRS                               | ✓           | ×                       | ×                        | ×
Query-based LDRS                   | ✓           | ✓                       | ×                        | ×

Listing 1. Grammar of the SKOSRec query language

    SKOSRecQuery                ::= Prologue SelectPart? SimProjection PREF ItemPart+ RecWhereClause?
    SelectPart                  ::= SELECT ( DISTINCT | REDUCED )? Var+ RecWhereClause LimitClause? Aggregation?
    Aggregation                 ::= AGG IRIref Var ( SUM | MAX | AVG )
    SimProjection               ::= RECOMMEND Var TOP INTEGER ServiceIntegration?
    ServiceIntegration          ::= FROM SERVICE IRIREF
    ItemPart                    ::= ( VarPart | IRI ) Sim?
    VarPart                     ::= '[' Var RecWhereClause ']'
    Sim                         ::= SIM Relation DECIMAL
    Relation                    ::= ( > | >= | = )
    RecWhereClause              ::= WHERE RecGroupGraphPattern
    RecGroupGraphPattern        ::= '{' RecGroupGraphPatternSub '}'
    RecGroupGraphPatternSub     ::= TriplesBlock? ( RecGraphPatternNotTriples '.'? TriplesBlock? )*
    RecGraphPatternNotTriples   ::= RecGroupOrUnionGraphPattern | RecOptionalGraphPattern |
                                    RecMinusGraphPattern | Filter | Bind | InlineData
    RecGroupOrUnionGraphPattern ::= RecGroupGraphPattern ( 'UNION' RecGroupGraphPattern )*
    RecOptionalGraphPattern     ::= 'OPTIONAL' RecGroupGraphPattern
    RecMinusGraphPattern        ::= 'MINUS' RecGroupGraphPattern

tains at least a mapping to a single variable. A detailed explanation of each part of the SKOSRec grammar can be found in Appendix A of this paper.
Fig. 1 depicts the overall architecture of the system with all its components. The most important parts are the SKOSRec query processing units (i.e., the parser and the compiler). They check the syntax of recommendation requests and regulate the correct execution of critical operations of the engine (e.g., graph-based filtering, result set joins, similarity calculation). The architecture does not dictate a particular SPARQL server implementation for LOD repositories, since the system is decoupled from the endpoint that is accessed over HTTP during retrieval. However, the prototypical implementation of the SKOSRec engine utilizes the OpenLink Virtuoso server.²

Fig. 1 also shows the sequential order of a typical recommendation workflow for a complex request that consists of the following retrieval steps:

² https://virtuoso.openlinksw.com

1. Issuing a SKOSRec query
2. Parsing the SKOSRec query
3. Compiling the SKOSRec query
4. Loading information on the LOD repository/SPARQL endpoint to be queried from the configuration
5. Retrieval of preferred items and setting up of the user profile (see Subsect. 3.3, Preference Querying)
6. Retrieval and processing of relevant LOD resources upon application of prefilter conditions (see Subsect. 3.4, Prefiltering)
7. Optimizing the retrieval process through identifying the resources that can be omitted for the final ranking³
8. Determination of similar LOD resources based on the user profile (see Subsect. 3.2, On-the-Fly Recommendations)
9. Ranking of LOD resources
10. Sending of recommendation results to the compiler and joining them with postfilter constraints
11. Retrieval of the final result set (see Subsect. 3.5, Postfiltering & Combinations)
12. Output to the user

Fig. 1. Architecture and workflow of the SKOSRec engine

³ For further information on optimized retrieval, please refer to [62].

3.2. On-the-Fly Recommendations

A simple on-the-fly request is the most basic form of a SKOSRec query. It is based on the engine's ability to generate ad-hoc recommendations from a user profile containing preference statements for LOD resources. Listing 2 depicts an example SKOSRec on-the-fly request that obtains suggestions based on a user's preference for the western movie "They Call Me Trinity". In the given query, the preferred movie is represented by the DBpedia resource dbr:They_Call_Me_Trinity. The query also contains a prefix with a namespace declaration and a specification of the number of suggestions the engine should generate.

Listing 2. A simple recommendation query for the movie domain (Q1)

    PREFIX dbr: <http://dbpedia.org/resource/>

    RECOMMEND ?movie TOP 5
    PREF dbr:They_Call_Me_Trinity

In this basic form, the query contains neither pre- or postfilter conditions nor a preference query statement. Thus, the engine simply identifies all movies that are similar to the preference in the profile. When executed over DBpedia, the query returns the solution set that is depicted in Table 2.

Table 2. Result set for Q1

movie
dbr:God_Forgives'_I_Don't
dbr:Ace_High_(1968_film)
dbr:Boot_Hill_(film)
dbr:Trinity_Is_Still_My_Name
dbr:Troublemakers_(1994_film)

To determine recommendations in an ad-hoc fashion, the system identifies SKOS annotations by matching a suitable property (i.e., an annotation property) in an RDF dataset (Definition 1).


Definition 1 (Annotation property). An annotation property is an IRI (Listing 3) that is defined in the Dublin Core Metadata Initiative (DCMI) specification for subject annotations. It can occur in conjunction with one of the two namespaces that are stated by the standard [15].

Listing 3. Annotation property

    <ANNOTPROP> ::= dc:subject | dct:subject

These properties help to retrieve SKOS annotation triples for items preferred by a user (Definition 2).

Definition 2 (SKOS annotation triple). A SKOS annotation triple ($st$) is a triple that is comprised of an IRI in the subject position, an annotation property (ANNOTPROP) as predicate and a concept $c$ in the object position. The concept is part of a knowledge organization system $S$ in SKOS format ($C = \{c \mid c \in S\}$) (Eq. 1).

$$ st \in I \times \texttt{<ANNOTPROP>} \times C \tag{1} $$

The identification of SKOS annotation triples facilitates a fast retrieval of potentially similar items. To this end, the engine evaluates shared features between resources from the SKOS annotation dataset (Definition 3).

Definition 3 (SKOS annotation dataset). A SKOS annotation dataset ($AD$) is a subset of an RDF dataset that contains SKOS annotation triples (Eq. 2).

$$ AD \subset I \times \texttt{<ANNOTPROP>} \times C \tag{2} $$

Before the similarity calculation process starts, the engine determines SKOS annotations for each LOD resource ($r$) from the user profile by issuing a subsequent SPARQL query to the LOD repository from which the user intends to receive suggestions. The notion of SKOS annotations is specified in Definition 4.

Definition 4 (SKOS annotations). In the annotation dataset, LOD resources directly link to concepts of a SKOS system via a predefined annotation property. SKOS annotations for an input resource $r$ are defined as in Equation 3.

$$ Annot(r) = \{ c \in S \mid \exists\, (r, \texttt{<ANNOTPROP>}, c) \in AD \} \tag{3} $$

Afterwards, the engine identifies which of the other resources in the dataset share annotations with it. Similarities only have to be computed for items with matching concepts [13, 52]. Triples can be efficiently joined along item features, since native RDF storage systems usually index triples rather than columns [41]. Thus, the quick identification of mutual annotations through SPARQL requests facilitates ad-hoc similarity calculation. Definition 5 specifies the notion of relevant resources and their annotations that are needed to identify similar items. The applied syntax follows Pérez et al. [42].

Definition 5 (Relevant resources and their annotations). The conjunctive graph pattern $P_r$ matches all items and subjects that are potentially relevant for recommendation retrieval (Eq. 4).

$$ P_r = (r, \texttt{<ANNOTPROP>}, c) \ \mathrm{AND} \ (q, \texttt{<ANNOTPROP>}, c) \tag{4} $$

The mapping $\Omega_r$ of relevant resources and their SKOS annotations is obtained by retrieving all resources $q$ that share at least one SKOS concept $c$ with resource $r$ from $AD$ (Eq. 5). The corresponding result set contains mappings for all variables (i.e., $c$ and $q$) in the triple statements of $P_r$.

$$ \Omega_r = [\![ P_r ]\!]_{AD} \tag{5} $$
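Definitions 4 and 5 can be illustrated with a minimal Python sketch over an in-memory triple set. This is not the SKOSRec implementation, which issues SPARQL queries against an endpoint. The `ex:` resources, the toy dataset and the function names are hypothetical, and excluding $r$ itself from the result is an assumption made for readability.

```python
from collections import defaultdict

# Hypothetical toy annotation dataset AD: (resource, annotation property, concept).
AD = {
    ("ex:r", "dct:subject", "ex:Western"),
    ("ex:r", "dct:subject", "ex:Comedy"),
    ("ex:q1", "dct:subject", "ex:Western"),
    ("ex:q2", "dct:subject", "ex:Comedy"),
    ("ex:q2", "dct:subject", "ex:Western"),
    ("ex:q3", "dct:subject", "ex:Drama"),
}
ANNOT_PROPS = {"dc:subject", "dct:subject"}


def annot(resource, dataset):
    """Annot(r): concepts linked to `resource` via an annotation property."""
    return {c for (s, p, c) in dataset if s == resource and p in ANNOT_PROPS}


def relevant_resources(r, dataset):
    """Omega_r as a dict: each resource q (other than r) mapped to the
    concepts it shares with r. Skipping r itself is an assumption."""
    own = annot(r, dataset)
    shared = defaultdict(set)
    for (s, p, c) in dataset:
        if s != r and p in ANNOT_PROPS and c in own:
            shared[s].add(c)
    return dict(shared)
```

In this toy dataset, `ex:q3` is never considered for similarity calculation because it shares no concept with `ex:r`, which mirrors the idea that similarities only have to be computed for items with matching concepts.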

Upon extraction of relevant resources and annotations, the process of similarity calculation can start. The engine determines the information content (IC) of the shared SKOS annotations of two resources. This approach is based on the hybrid similarity metric proposed by Meymandpour and Davis [37] for LOD-enabled RS. However, while their measure considers all types of IRI resources for the retrieval, our similarity metric relies purely on SKOS annotations (Definition 6).

Definition 6 (SKOS similarity). Let $Annot(r)$ be the set of SKOS features of resource $r$ and $Annot(q)$ the set of SKOS features of resource $q$, with $q \in \{\mu(q) \mid \mu \in \Omega_r\}$. Then their similarity can be derived from the IC of their shared concepts $C_{shared} = Annot(r) \cap Annot(q)$.

$$ sim(r, q) = IC(C_{shared}) \tag{6} $$


The IC of a set of SKOS concepts is measured by the sum of the negative logarithms of each concept's frequency in relation to the maximum frequency of occurrences among relevant resources (Definition 7).

Definition 7 (SKOS Information Content). The SKOS IC is defined as an aggregation of the individual IC values of each concept that is contained in the set of shared features ($c \in C_{shared}$), where $freq(c)$ is the frequency of occurrences of $c$ among all relevant resources and $n$ is the maximum frequency of occurrences among these resources. The final similarity score of two resources $r$ and $q$ is determined by summing the IC values of each concept of the shared feature set.

$$ IC(C_{shared}) = -\sum_{c \in C_{shared}} \log\left(\frac{freq(c)}{n}\right) \tag{7} $$
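As a worked illustration of Eqs. 6 and 7, the following Python sketch computes the IC-based similarity for a few hypothetical resources. The natural logarithm and the reading of $n$ as the highest concept frequency among the relevant resources are assumptions, since the text fixes neither the log base nor further details. All names and values are illustrative.

```python
import math
from collections import Counter


def information_content(shared, freq, n):
    """IC(C_shared) = -sum(log(freq(c) / n)) over the shared concepts (Eq. 7)."""
    return -sum(math.log(freq[c] / n) for c in shared)


def skos_similarity(annot_r, annot_q, freq, n):
    """sim(r, q) = IC of the concepts shared by r and q (Eq. 6)."""
    return information_content(annot_r & annot_q, freq, n)


# Hypothetical annotations of the relevant resources (illustrative values only).
annotations = {
    "ex:q1": {"ex:Western"},
    "ex:q2": {"ex:Western", "ex:Spaghetti_Western"},
    "ex:q3": {"ex:Western", "ex:Comedy"},
    "ex:q4": {"ex:Western"},
}
freq = Counter(c for concepts in annotations.values() for c in concepts)
n = max(freq.values())  # assumption: n is the highest concept frequency

annot_r = {"ex:Western", "ex:Spaghetti_Western", "ex:Comedy"}
sim_q1 = skos_similarity(annot_r, annotations["ex:q1"], freq, n)
sim_q2 = skos_similarity(annot_r, annotations["ex:q2"], freq, n)
```

Under these assumptions, a concept that occurs in every relevant resource contributes nothing to the score, whereas a rare shared concept raises it. In the toy data, `ex:q2` therefore scores higher than `ex:q1`.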

After having obtained resource annotations as well as IC values, similarity scores between the input resource $r$ and each potentially relevant resource $q$ can be calculated. When a user profile contains more than a single item, similarity values are aggregated through summation to determine the final recommendation score (Definition 8).

Definition 8 (Recommendation score). The recommendation score of a potentially relevant resource $q$ is quantified by the sum of similarities with each resource $r$ that can be found in the profile ($\mathit{Pr}$).

$$ score(\mathit{Pr}, q) = \sum_{r \in \mathit{Pr}} sim(r, q) \tag{8} $$

Upon calculation of similarity values for each potentially relevant item $q$, the engine ranks the retrieved resources and cuts the list off at a given limit ($k$) that is specified by the user in the SKOSRec query.
With the presented methods of on-the-fly retrieval, recommendations can be generated from LOD repositories through an API without requiring local metadata or excessive preprocessing operations other than the mapping of profile items to LOD resources.⁴

⁴ Di Noia et al. describe how a mapping procedure can be conducted for a real-world implementation. For their LDRS, they matched movie titles with the corresponding LOD resources in DBpedia. The authors extracted IRIs, labels and publication dates for all movie resources. They applied the Levenshtein distance on these resources to identify matching movie items [13].
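The score aggregation of Definition 8 and the subsequent top-k ranking can be sketched as follows. The similarity values, the resource names and the tie-breaking rule (alphabetical order) are hypothetical choices for this illustration and are not taken from the SKOSRec engine.

```python
from collections import defaultdict


def recommendation_scores(similarities):
    """score(Pr, q): sum of sim(r, q) over all profile resources r (Eq. 8).

    `similarities` maps each profile resource r to a dict {q: sim(r, q)}.
    """
    scores = defaultdict(float)
    for sims_for_r in similarities.values():
        for q, sim in sims_for_r.items():
            scores[q] += sim
    return dict(scores)


def top_k(scores, k):
    """Return the k highest-scoring resources, best first.
    Ties are broken alphabetically (an assumption of this sketch)."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [q for q, _ in ranked[:k]]


# Hypothetical similarity values for a profile with two liked items.
similarities = {
    "ex:r1": {"ex:q1": 1.5, "ex:q2": 0.5},
    "ex:r2": {"ex:q2": 2.0, "ex:q3": 1.0},
}
scores = recommendation_scores(similarities)
best_two = top_k(scores, 2)
```

Here `ex:q2` is only moderately similar to either profile item, but it is ranked first because its similarities to both items add up.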

3.3. Preference Querying

In the previous subsection, it was assumed that users express their tastes as preference statements for a certain item, given as an IRI that refers to a LOD resource. However, likings may only be vaguely known, for instance when users prefer products with specific features but are not able or willing to specify the actual items.
The engine obtains such a profile by issuing a subsequent SPARQL query that contains the preference graph pattern ($P_{pref}$) in the WHERE condition. Suppose a user is interested in Quentin Tarantino movies (dbr:Quentin_Tarantino) but does not specify the precise items she likes. She could send a preference query to the SKOSRec engine. The graph pattern in the request would match all movies that are connected to dbr:Quentin_Tarantino via the properties dbo:director or dbo:producer (see Listing 4).

Listing 4. SKOSRec query to generate recommendations based on a preference query in the movie domain (Q2)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>

    RECOMMEND ?movie TOP 5
    PREF [ ?prefMovie
    WHERE {
      { ?prefMovie dbo:director dbr:Quentin_Tarantino }
      UNION
      { ?prefMovie dbo:producer dbr:Quentin_Tarantino }
    } ]

Table 3 shows the corresponding solution set of the preference query, which contains films from DBpedia that are similar to Quentin Tarantino movies.

Table 3. Result set for Q2

movie
dbr:The_Godfather_Part_II
dbr:The_Prestige_(film)
dbr:The_Dark_Knight
dbr:Planet_Terror
dbr:Clerks

The notion of a queried profile is specified in Definition 9.

Definition 9 (Queried profile). A queried profile is a user profile that has been compiled by retrieving all LOD resources $r$ that are part of an annotation dataset $AD$ and are contained in the solution mappings resulting from the evaluation of a specified graph pattern ($P_{pref}$) over a dataset $D$ (Eq. 9).

$$ \mathit{Pr} = \{ \mu(r) \mid \mu \in [\![ P_{pref} ]\!]_D,\ r \in dom(\mu),\ r \in AD \} \tag{9} $$
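A minimal Python sketch of Definition 9 is given below. It evaluates a union of single triple patterns over an in-memory dataset and keeps only those bindings that also occur in the annotation dataset. The tiny pattern matcher, the `ex:` resources and the set `annotated` are hypothetical stand-ins for the SPARQL evaluation that the engine delegates to the endpoint.

```python
def match(pattern, dataset):
    """Evaluate one triple pattern; terms starting with '?' are variables."""
    solutions = []
    for triple in dataset:
        mu = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value.
                if mu.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            solutions.append(mu)
    return solutions


def queried_profile(var, patterns, dataset, annotated):
    """Pr (Eq. 9): bindings of `var` over the union of `patterns`,
    restricted to resources that occur in the annotation dataset."""
    profile = set()
    for pattern in patterns:
        for mu in match(pattern, dataset):
            if var in mu and mu[var] in annotated:
                profile.add(mu[var])
    return profile


# Hypothetical dataset D and set of annotated resources.
D = {
    ("ex:Film_A", "dbo:director", "dbr:Quentin_Tarantino"),
    ("ex:Film_B", "dbo:producer", "dbr:Quentin_Tarantino"),
    ("ex:Film_C", "dbo:director", "ex:Someone_Else"),
    ("ex:Film_D", "dbo:director", "dbr:Quentin_Tarantino"),
}
annotated = {"ex:Film_A", "ex:Film_B", "ex:Film_C"}
patterns = [
    ("?prefMovie", "dbo:director", "dbr:Quentin_Tarantino"),
    ("?prefMovie", "dbo:producer", "dbr:Quentin_Tarantino"),
]
profile = queried_profile("?prefMovie", patterns, D, annotated)
```

`ex:Film_D` matches the preference pattern but is dropped from the profile because it carries no SKOS annotations in this toy setup, which corresponds to the condition $r \in AD$.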

3.4. Prefiltering

To facilitate exploratory search in LOD repositories, users should also have the option to formulate conditions in their queries. One way to enable filters during retrieval is to apply them before similarity calculation. Thus, recommendation lists can be adjusted to individual needs.
Due to the expressiveness of RDF data, filter conditions can not only be applied to attributes that are directly connected to potentially relevant items, but can also be declared as a graph pattern. By this means, it is possible to retrieve LOD resources that are similar to the user profile and also fulfill an advanced condition.
The strengths of the approach will be illustrated with a travel recommendation request that was executed over DBpedia. Suppose a customer expressed a preference for the Lake Baikal region (e.g., as indicated by a positive rating on a travel community site). With simple on-the-fly retrieval, she would receive a recommendation list that primarily consists of travel destinations located in the proximity of this geographical area. A graph-based filter, on the other hand, facilitates advanced retrieval options. In the given example, it enables the user to obtain suggestions for Southeast Asian travel destinations that have similar features as the Lake Baikal region. This is possible even though the resource dbr:Lake_Baikal is not directly connected to any destination with the annotation dbc:Southeast_Asia; such destinations can only be identified by exploring the graph structure that surrounds the filter attribute.
The advanced constraint ensures that the generated result set is not empty. A simple attribute-level filter, which is commonly utilized in faceted search systems, would not generate any search hits. By these means, graph-based filter patterns can help users to better explore knowledge graphs, thereby overcoming data quality issues in LOD repositories. The corresponding SKOSRec query for this example is shown in Listing 5.

Listing 5. SKOSRec query with an expressive prefilter condition to retrieve Southeast Asian destinations based on a preference for the Lake Baikal region (Q3)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbc: <http://dbpedia.org/resource/Category:>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX dct: <http://purl.org/dc/terms/>
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

    RECOMMEND ?destination TOP 5
    PREF dbr:Lake_Baikal
    WHERE {
      ?destination dbo:country ?country .
      ?country dct:subject ?countrySubject .
      ?countrySubject skos:broader dbc:Southeast_Asia .
      ?destination rdf:type dbo:Place .
    }

Table 4 shows the solution to this query. It contains travel destinations and regions of interest that are both similar to the previously liked destination dbr:Lake_Baikal and located in Southeast Asia. Ecoregions around lakes, rivers or mountains from Southeast Asian countries, such as Cambodia, Vietnam or Malaysia, are presented to the user.

Table 4. Result set of Q3

destination
dbr:Tonle_Sap
dbr:Mekong
dbr:Mount_Kinabalu
dbr:Laguna_de_Bay
dbr:Lake_Toba

The technical details for this type of query are given in the following definitions. Before the extraction of relevant resources and annotations, the recommendation engine identifies the resources that satisfy the filter (Definition 10).

Definition 10 (Prefiltered resources). The mapping of filtered resources $\Omega_{prefilter}$ is obtained by matching a specified graph pattern ($P$) to the RDF graph of a dataset $D$ (Eq. 10).

$$ \Omega_{prefilter} = \{ \mu_{|q} \mid \mu \in [\![ P ]\!]_D \} \tag{10} $$

After preselection, resources are brought together with user profile data to retrieve the final set of recommendations (Definition 11).

Definition 11 (Annotations and relevant resources (prefiltered retrieval)). The solution mappings of annotations and relevant resources in prefiltered retrieval mode are obtained by joining the set of prefiltered resources with the mappings $\Omega_r$ of all relevant resources and annotations, as determined from the user profile.

$$ \Omega_{pre} = \Omega_{prefilter} \bowtie \Omega_r \tag{11} $$

After retrieving annotations and relevant resources in prefiltering mode, the system generates constraint-based recommendations. In this context, the ranking procedure is adapted to the restricted set of relevant resources and annotations. Hence, IC values of matching annotations and the respective SKOS similarities are quantified according to the user filter. By this means, recommendation scores reflect the similarity of items for the actual set of filtered LOD resources. Except for this restriction, the system carries out the calculations (i.e., determination of similarity values and ranking of LOD resources) in the same way as in non-filtering mode.
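The prefiltered retrieval of Definitions 10 and 11 can be sketched in Python as a join followed by a ranking on the restricted set. The example assumes a single profile resource, natural logarithms and hypothetical destinations (`ex:DestA` etc.). It only illustrates that concept frequencies are counted after the filter has been applied and does not reproduce the engine's actual query plan.

```python
import math
from collections import Counter


def prefiltered_join(prefiltered, omega_r):
    """Omega_pre: keep only the relevant resources q that satisfy the filter."""
    return {q: concepts for q, concepts in omega_r.items() if q in prefiltered}


def rank_restricted(omega_pre):
    """Rank each q by the IC of its shared concepts, with concept
    frequencies counted over the filtered set only."""
    freq = Counter(c for concepts in omega_pre.values() for c in concepts)
    n = max(freq.values(), default=1)
    scores = {
        q: -sum(math.log(freq[c] / n) for c in concepts)
        for q, concepts in omega_pre.items()
    }
    return sorted(scores, key=lambda q: (-scores[q], q))


# Hypothetical shared concepts between the profile item and candidate destinations.
omega_r = {
    "ex:DestA": {"ex:Lakes", "ex:Ancient_lakes"},
    "ex:DestB": {"ex:Lakes", "ex:Ecoregions"},
    "ex:DestC": {"ex:Lakes"},
}
# Hypothetical resources that match the prefilter graph pattern.
prefiltered = {"ex:DestB", "ex:DestC", "ex:DestD"}

omega_pre = prefiltered_join(prefiltered, omega_r)
ranking = rank_restricted(omega_pre)
```

`ex:DestA` shares the most specific concepts with the profile item, yet it is absent from the ranking because it does not satisfy the filter. `ex:DestD` satisfies the filter but shares no concept with the profile and is dropped by the join.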

3.5. Postfiltering & Combinations

The SKOSRec engine can not only combine similarity calculation and graph pattern matching to prefilter relevant LOD resources; it can also apply user constraints after similar items have already been determined. By this means, recommendations are filtered ex post to the similarity detection process. This feature enables novel retrieval patterns in which suggestions are part of a subquery. As an example scenario for a postfilter recommendation task, consider the following SKOSRec query, which obtains suggestions for movies and directors based on a user's preference for the director dbr:Quentin_Tarantino (see Listing 6). For the given request, the engine would identify similar directors based on Quentin Tarantino's characteristic features (e.g., geographic origin or awards won) and would determine the movies these similar directors have shot. Thus, users may receive interesting movie recommendations.

Table 5 shows the solution the SKOSRec system returns when Q4 is issued against DBpedia. The ellipses in the displayed tables indicate records that have been excluded for brevity.

However, this approach can still be improved upon. For instance, the engine only considers the metadata descriptions of the director, whereas related movies have to be retrieved anew upon execution of the postfilter section. Thus, they are not ranked according to similarity in the final output table.
Another weakness is the biased semantics of the query.

Listing 6. SKOSRec query to generate simple post-filtered recommendations in the movie domain (Q4)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX yago: <http://dbpedia.org/class/yago/>

    SELECT ?director ?movie
    WHERE {
      ?movie dbo:director ?director .
      ?director rdf:type yago:Director110014939 .
    } LIMIT 50
    RECOMMEND ?director TOP 5
    PREF dbr:Quentin_Tarantino

Table 5. Result set of Q4

director             | movie
dbr:Kirk_Douglas     | dbr:Scalawag_(film)
dbr:Kirk_Douglas     | dbr:Posse_(1975_film)
dbr:Kevin_Costner    | dbr:The_Postman_(film)
dbr:Kevin_Costner    | dbr:Dances_with_Wolves
...                  | ...
dbr:Gene_Kelly       | dbr:Hello,_Dolly!_(film)
dbr:Gene_Kelly       | dbr:Invitation_to_the_Dance_(film)
...                  | ...
dbr:Alan_Alda        | dbr:Goodbye,_Farewell_and_Amen
dbr:Alan_Alda        | dbr:Margaret's_Engagement
...                  | ...
dbr:Steve_Buscemi    | dbr:Interview_(2007_film)
dbr:Steve_Buscemi    | dbr:Animal_Factory

Usually, when a person favors a movie director, this preference refers to the movies this director has shot and not to the personal features of the person. Two directors might be similar according to certain features, such as birth year, nationality or received awards, but the style and genre of the movies they have created can still be different. A third factor is that recommendation retrieval for movie directors should also take into account that a film artist is likely more relevant the more movies she has shot that are in line with the preferred movies of the user. Hence, the SKOSRec engine enables another postfiltering strategy, which is called aggregation-based retrieval. This strategy can be helpful whenever an entity type is linked to several LOD resources in a dataset, such that similarity scores of related entities (e.g., movies) can be aggregated to an overall recommendation score for a certain item (e.g., for a movie director).
Aggregation-based requests can be combined well with other retrieval patterns, such as preference queries.


In the case of the movie recommendation scenario, a user could state that she enjoyed Quentin Tarantino movies in the past and would like to receive suggestions for directors who shot similar films. Usually, this kind of question can only be answered by another human being, e.g., in a personal interaction among friends or in an online forum. The fact that the language facilitates these kinds of queries is one of its key advantages.

Listing 7. SKOSRec query to generate post-filtered recommendations based on a preference query in the movie domain (Q5)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX yago: <http://dbpedia.org/class/yago/>

    SELECT ?director ?movie
    WHERE {
      ?movie dbo:director ?director .
      ?director rdf:type yago:Director110014939 .
    } LIMIT 50
    AGG dbr:Quentin_Tarantino ?director SUM
    RECOMMEND ?movie TOP 100
    PREF [ ?prefMovie
    WHERE { ?prefMovie dbo:director dbr:Quentin_Tarantino } ]

Table 6 depicts the results of Q5 as retrieved from DBpedia. A comparison of the solution sets for Q4 and Q5 shows that they differ considerably. Q5 generated a list of directors who have mostly created action and thriller movies with a twist (i.e., Tarantino's unique filmmaking style⁵). Q4, in contrast, returned US American actor-directors whose movies range from musical films (dbr:Hello,_Dolly!_(film)) to western movies (dbr:Dances_with_Wolves) or drama films (dbr:Interview_(2007_film)). Thus, the directors in the solution table of Q4 (Table 5) seem rather arbitrary and not exceptionally helpful concerning the initial user request.

⁵ https://en.wikipedia.org/wiki/Quentin_Tarantino

Table 6. Result set of Q5

director                  | movie
dbr:Steven_Spielberg      | dbr:Indiana_Jones
dbr:Steven_Spielberg      | dbr:Jurassic_Park_(film)
...                       | ...
dbr:Francis_Ford_Coppola  | dbr:The_Godfather_Part_II
dbr:Francis_Ford_Coppola  | dbr:The_Godfather_Part_III
...                       | ...
dbr:Christopher_Nolan     | dbr:The_Prestige_(film)
dbr:Christopher_Nolan     | dbr:The_Dark_Knight
...                       | ...
dbr:Robert_Rodriguez      | dbr:Planet_Terror
dbr:Robert_Rodriguez      | dbr:Machete_(film)
...                       | ...
dbr:Kevin_Smith           | dbr:Clerks
dbr:Kevin_Smith           | dbr:Clerks_II

We define this kind of aggregation-based request as a so-called roll-up query. This name was chosen because the final suggestions are based on summarized (i.e., rolled up) preferences for sublevel entities. Another example of a roll-up request stems from the application scenario of travel search. This recommendation query derives suggestions for city trip destinations from the points of interest (POI) a user has visited and liked in another city (e.g., London). Listing 8 shows the corresponding SKOSRec query for this scenario.

Listing 8. SKOSRec query to generate post-filtered recommendations (aggregation-based) for a preference profile containing the city of London (Q6)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX dul: <http://www.ontologydesignpatterns.org/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    SELECT ?city
    WHERE {
      ?poi ?property ?city .
      ?city rdf:type dbo:City .
      ?property rdfs:subPropertyOf dul:hasLocation .
    } LIMIT 3
    AGG dbr:London ?city SUM
    RECOMMEND ?poi TOP 100
    PREF dbr:Oxford_Street
         dbr:Notting_Hill
         dbr:South_Bank

The SKOSRec request contains subquery commands that refer to the user's preferred POIs. It specifies how similarity scores of the generated suggestions should be aggregated (i.e., sum-based). The retrieval is based on the 100 POIs that are most similar to the ones in the preference section. Additionally, the subquery part refers to the DBpedia resource that represents London (AGG dbr:London ?city), thus indicating that the list of recommendations should not contain any POIs in this city. Upon processing the depicted query over the DBpedia repository, the engine displays the three cities for which the highest aggregated scores have been detected (see Table 7).

Table 7. Result set of Q6

city
dbr:Oxford
dbr:Abingdon-on-Thames
dbr:Wallingford_Oxford
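The roll-up step described above can be sketched in Python as a grouping of item-level recommendation scores by the linked parent entity. The function `roll_up`, its parameters and the toy scores are hypothetical. The sketch only illustrates the SUM, MAX and AVG modes named in the grammar and the exclusion of the entity given in the AGG clause.

```python
from collections import defaultdict

AGGREGATORS = {
    "SUM": sum,
    "MAX": max,
    "AVG": lambda values: sum(values) / len(values),
}


def roll_up(item_scores, parent_of, mode="SUM", exclude=frozenset(), limit=None):
    """Aggregate the scores of recommended items per linked parent entity
    and return (parent, score) pairs, best first."""
    grouped = defaultdict(list)
    for item, score in item_scores.items():
        for parent in parent_of.get(item, ()):
            if parent not in exclude:
                grouped[parent].append(score)
    aggregate = AGGREGATORS[mode]
    totals = {parent: aggregate(values) for parent, values in grouped.items()}
    ranked = sorted(totals.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:limit]


# Hypothetical scores of recommended POIs and the cities they are located in.
poi_scores = {"ex:poi1": 3.0, "ex:poi2": 1.0, "ex:poi3": 2.5, "ex:poi4": 4.0}
located_in = {
    "ex:poi1": ["ex:CityA"],
    "ex:poi2": ["ex:CityA"],
    "ex:poi3": ["ex:CityB"],
    "ex:poi4": ["ex:London"],
}
by_sum = roll_up(poi_scores, located_in, "SUM", exclude={"ex:London"}, limit=3)
by_max = roll_up(poi_scores, located_in, "MAX", exclude={"ex:London"})
```

With sum-based aggregation, a city benefits from having many similar POIs, whereas maximum-based aggregation ranks a city by its single best match. Which mode is preferable depends on the retrieval goal.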

The motivation for formulating such a query is that, when searching for cities, preferences might be better represented through the sights a user has visited elsewhere than through a simple on-the-fly request that solely contains a preference statement for a city in the user profile section.
Besides the roll-up retrieval patterns presented above, the SKOSRec grammar may facilitate the formulation of additional types of advanced recommendation queries. For instance, cross-domain queries are an interesting additional pattern to be explored for personalized retrieval. Listing 9 shows an example request that generates movie suggestions based on the preference for a music band (i.e., dbr:The_Jackson_5).

Listing 9. SKOSRec query to generate cross-domain recommendations (Q7)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

    SELECT ?movie ?participated ?musicalAct
    WHERE {
      ?movie rdf:type dbo:Film .
      VALUES ?participated { dbo:basedOn dbo:musicComposer dbo:genre dbo:starring }
      { ?musicalAct ?participated ?movie }
      UNION
      { ?movie ?participated ?musicalAct }
    } LIMIT 10
    AGG dbr:The_Jackson_5 ?movie MAX
    RECOMMEND ?musicalAct TOP 100
    PREF dbr:The_Jackson_5
    WHERE {
      VALUES ?actType { dbo:MusicalArtist dbo:Band }
      ?musicalAct rdf:type ?actType .
    }

The query contains a prefilter condition that triggers retrieval of music artists and bands. Afterwards, the scores of the top 100 suggestions are summarized with maximum-based aggregation. Thus, the highest similarity value among the set of music acts determines the ranking score of the film. Also note that, because of the aggregation-based retrieval procedure, connections between movies and the LOD resource dbr:The_Jackson_5 are excluded from similarity calculation. The scenario exemplifies how semantic relationships can help to generate suggestions for a target domain (i.e., movies) that are based on a profile containing items from a different source domain (i.e., music acts). While regular SPARQL queries perform exact matching of triple statements, the SKOSRec syntax facilitates the integration of similarity- and graph-based retrieval to enrich result lists with other potentially relevant items.

Table 8. Result set of Q7

?movie                        ?musicalAct
dbr:Moonwalker                dbr:Michael_Jackson
dbr:The_Wiz_(film)            dbr:Ashford_&_Simpson
dbr:Garfield_Gets_a_Life      dbr:The_Temptations
dbr:Pipe_Dreams_(1976_film)   dbr:Gladys_Knight_&_the_Pips
dbr:The_Jacksons              dbr:Jermaine_Jackson

As a point of reference, consider the SPARQL query for the given cross-domain scenario (Listing 10).

Listing 10. SPARQL query to generate cross-domain recommendations (Q8)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?movie ?musicalAct
WHERE {
    ?movie rdf:type dbo:Film .
    VALUES ?participated { dbo:basedOn dbo:musicComposer dbo:genre dbo:starring }
    { dbr:The_Jackson_5 ?participated ?movie . }
    UNION
    { ?movie ?participated dbr:The_Jackson_5 . }
}
LIMIT 5

The query omits the similarity calculation part. Thus, it only retrieves movies linked to the resource dbr:The_Jackson_5. By this means, the SPARQL results have a better chance of being related to the initial user profile. On the other hand, empty result lists can occur when the preferred LOD resource does not have any direct connections to recommendable items. This assumption is backed up by the result lists of the two queries (Tables 8 and 9). The SKOSRec query produces a more comprehensive solution than the regular SPARQL query.


Table 9. Result set of Q8

?movie             ?musicalAct
dbr:The_Jacksons   dbr:The_Jackson_5

The following definitions formalize the postfiltered and aggregation-based retrieval forms. Definition 12 specifies the notion of SKOSRec recommendations that may potentially undergo postprocessing.

Definition 12 (SKOSRec recommendations) SKOS-Rec recommendations (R(Pr)) are the set of sugges-tions generated by processing the solution mappings(Ω) obtained from recommendation retrieval Theycan be a result of simple on-the-fly retrieval prefilter-ing or a result of a combination of these techniques Prrepresents the input profile that contains at least onepreference for an item The cardinality of R(Pr) is theintended number of recommendations (k) that is spec-ified by the user based on which the engine selects theresources with the highest recommendation scores thatare not contained in the profile (Eq 12)

R(Pr) = x | x isin q | score(Pr q) gt kmaxqisinΩ

score(Pr q) q isin Pr

(12)
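The selection step behind Definition 12 can be illustrated with a minimal Python sketch. It is not the SKOSRec implementation: the function name `select_recommendations`, the dictionary-based score table, and the example resources are assumptions made for illustration only. The sketch simply removes profile items and keeps the k best-scored remaining resources.

```python
def select_recommendations(scores, profile, k):
    """Return the k best-scored resources that are not part of the profile."""
    # Drop every resource that the user has already stated as a preference.
    candidates = {res: s for res, s in scores.items() if res not in profile}
    # Sort by descending score; ties are broken by resource name for stable output.
    ranked = sorted(candidates, key=lambda res: (-candidates[res], res))
    return ranked[:k]


# Hypothetical similarity scores for four resources (illustrative values).
scores = {"dbr:A": 0.9, "dbr:B": 0.4, "dbr:C": 0.7, "dbr:D": 0.2}
profile = {"dbr:A"}
top_two = select_recommendations(scores, profile, k=2)
print(top_two)  # ['dbr:C', 'dbr:B']
```

Although dbr:A has the highest score, it is excluded because it belongs to the profile.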

The approach of subquerying with recommendation results builds on the idea that SKOSRec suggestions R(Pr) can be joined with any graph pattern Ppost subsequent to the process of similarity calculation (Definition 13). It does not make a difference how the engine obtained these recommendations.

Definition 13 (Postfiltered recommendation mapping). The postfiltered recommendation mapping is obtained by joining solution mappings generated from SKOSRec recommendations with the results of matching a postfilter graph pattern (Ωpost) (Eqs. 13 and 14).

$$\Omega_{postfilter} = \{\, \mu \mid \mu \in [\![P_{post}]\!]_D,\; x \in dom(\mu) \,\} \tag{13}$$

$$\Omega_{post} = \{\, \mu_{|x} \mid x \in R(Pr) \,\} \bowtie \Omega_{postfilter} \tag{14}$$
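As a rough illustration of Eqs. 13 and 14, the following Python sketch joins a list of recommended resources with postfilter solution mappings. Solution mappings are modeled as plain dictionaries from variable names to resources. The function name `postfilter_join`, this data model, and the example resources are assumptions for illustration and do not reflect the engine's internals.

```python
def postfilter_join(recommendations, postfilter_mappings, var="x"):
    """Keep the postfilter mappings whose binding for `var` is a recommended resource."""
    recommended = set(recommendations)
    # A mapping is compatible with the recommendation side of the join
    # exactly when it binds `var` to one of the recommended resources.
    return [mu for mu in postfilter_mappings if mu.get(var) in recommended]


# Hypothetical recommendations and postfilter matches (POI located in a city).
recommendations = ["dbr:Poi_1", "dbr:Poi_2"]
postfilter_mappings = [
    {"x": "dbr:Poi_1", "city": "dbr:City_A"},
    {"x": "dbr:Poi_2", "city": "dbr:City_B"},
    {"x": "dbr:Poi_3", "city": "dbr:City_A"},  # not recommended, so it is dropped
]
omega_post = postfilter_join(recommendations, postfilter_mappings)
```

Because the recommendation side of the join only binds the variable x, the join reduces to a filter over the postfilter mappings in this simplified setting.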

Like the prefilter, the postfilter pattern Ppost can be comprised of triple statements that are eligible in the RecGroupGraphPattern part of the SKOSRec grammar. Aggregation-based queries have to be treated differently than a regular postfilter request. Apart from excluding sublevel items of the profile from the set of similar resources, the engine should also omit any connections between the preferred superordinate item (a) and similar sublevel entities. Additionally, the user has to specify the variable in his or her profile (e.g., y) that represents the items within the postfilter graph pattern Ppost. By this means, aggregation-based recommendations can be generated (Definition 14).

Definition 14 (Resources for aggregation-based retrieval). The set of resources for aggregation-based retrieval is obtained by retrieving LOD resources that have a connection to a previously generated suggestion, thereby excluding all resources that are linked to the specified superordinate item a in the profile (Eq. 15).

$$R_{agg}(a) = \{\, \mu(y) \mid \mu \in \Omega_{post},\; y \in dom(\mu),\; a \notin \mu(y) \,\} \tag{15}$$

For each potential aggregation-based suggestion (y), similarity scores of connected entities (x) have to be considered for the final ranking. In this context, different approaches to aggregation can be applied. In cases when both similar items and aggregation-based recommendations are entities on comparable levels of granularity, choosing the maximum similarity score might be the best way to represent the relatedness of the resource to the user profile. On the other hand, when postfiltered recommendations are based on similarity scores of their sublevel entities (e.g., as in the movie recommendation example), the scores of related suggestions should be aggregated by the sum or the average of individual scores.

Definition 15 (Aggregation-based ranking score). The ranking score of an LOD resource y that is connected to a set of recommended items is determined by aggregating scores of connected entities, either by the maximum (Eq. 16), the sum (Eq. 17), or the average (Eq. 18) of individual scores.

$$score_{aggMAX}(y) = \max_{x \in \mu(x,y)} score(Pr, x) \tag{16}$$

$$score_{aggSUM}(y) = \sum_{x \in \mu(x,y)} score(Pr, x) \tag{17}$$

$$score_{aggAVG}(y) = \frac{\sum_{x \in \mu(x,y)} score(Pr, x)}{|\{x \in \mu(x,y)\}|} \tag{18}$$
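The three aggregation variants of Eqs. 16 to 18 can be sketched in a few lines of Python. The sketch assumes that postfiltered mappings are available as dictionaries with the keys "x" (recommended sublevel entity) and "y" (candidate for aggregation-based retrieval). The function name `aggregate_scores` and the example data are hypothetical.

```python
from collections import defaultdict


def aggregate_scores(mappings, scores, mode="SUM"):
    """Aggregate the scores of connected entities x for each resource y."""
    aggregators = {
        "MAX": max,                           # Eq. 16
        "SUM": sum,                           # Eq. 17
        "AVG": lambda v: sum(v) / len(v),     # Eq. 18
    }
    if mode not in aggregators:
        raise ValueError(f"unknown aggregation mode: {mode}")
    grouped = defaultdict(list)
    # Deduplicate (x, y) pairs so that each connection is counted once.
    for x, y in sorted({(mu["x"], mu["y"]) for mu in mappings}):
        grouped[y].append(scores[x])
    return {y: aggregators[mode](values) for y, values in grouped.items()}


# Hypothetical similarity scores of recommended entities and their connections.
scores = {"x1": 0.5, "x2": 0.25, "x3": 0.75}
mappings = [
    {"x": "x1", "y": "y1"},
    {"x": "x2", "y": "y1"},
    {"x": "x3", "y": "y2"},
]
by_sum = aggregate_scores(mappings, scores, "SUM")  # {'y1': 0.75, 'y2': 0.75}
by_max = aggregate_scores(mappings, scores, "MAX")  # {'y1': 0.5, 'y2': 0.75}
by_avg = aggregate_scores(mappings, scores, "AVG")  # {'y1': 0.375, 'y2': 0.75}
```

The example shows why the choice of aggregation matters: y1 and y2 tie under summation, whereas y2 ranks first under the maximum and the average.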

4. Evaluation of the SKOS Recommender

4.1. Experimental Setup

The novel recommendation strategies of the SKOSRec engine were evaluated in the context of web-based experiments for the usage scenarios of travel destination search and multimedia RS. The online approach helped to reach out to more participants with diverse demographic backgrounds, which would not have been possible in a laboratory experiment. Another positive side effect of the web-based setting was that participants could test and rate recommendations without being directly observed by an experimenter, which might otherwise have biased the results. Hence, the online approach softened common limitations of laboratory studies [19].

The experiments were carried out in the defined usage scenarios with metadata descriptions from DBpedia. The database containing the local mirror of DBpedia ran on a virtual server. Besides the triple store, a web application was hosted on the same machine. The website ran on an Apache Tomcat 7 application server and was implemented with the help of Java servlet technology.⁶,⁷

We conducted the online experiments between April 2016 and January 2017. Upon setting up the web application, a pretest was carried out to control for potential pitfalls, such as misleading navigation. Two test users ran step by step through the web interface and provided their feedback. Because of their insights, we slightly modified the interface (e.g., through rephrasing user instructions) to increase the chances of successful completion. In case the SKOSRec engine is applied in a live setting, the design of the user interface will have to be tested with more users. Our experimental series focused on algorithmic performance, which was tested with numerous participants.

Subjects were recruited through the clickworker.com platform.⁸ Participants received a compensation fee of 1.00 € or 1.50 € for a completed survey. 103 participants took part in the study on travel destination search. In the multimedia domains, 154 subjects in total evaluated the SKOSRec engine. For each part of the evaluation, it was ensured that participants had not just run through the evaluation screen. Whenever there appeared to be irregularities in the log files (e.g., the default settings of the screen were not changed), these assessments were excluded from further analysis. The survey template started with a consent form. It informed participants about the context and the procedure of the study. After having filled in the consent form, participants were navigated to the first part of the experiment. It collected data on subjects' demographics and their consumption and search habits. After the preparation phase, participants worked on the first test case (TC1) in each usage scenario. In this test case, the performance of the SKOSRec engine was assessed in the simple execution mode of on-the-fly recommendations (see Subsect. 3.2). This was done to gather data on the usefulness of suggestions resulting from a simple content-based approach (i.e., the baseline method), with which more advanced recommendation strategies were compared at a later stage. In cases where users did not state any items, the application showed an error page. While setting up a profile required some effort on the side of the user, it was a necessary step for the experiment. Since we did not have access to the log files of a live application, profile generation had to be simulated. The process was facilitated by a specifically tailored interface (see Fig. 2), where subjects could search for items through an autocomplete field that applied AJAX technology.⁹

⁶ http://tomcat.apache.org
⁷ http://www.oracle.com/technetwork/java/javaee/servlet/index.html
⁸ https://www.clickworker.de

It enabled users to type in keywords (e.g., the name of a music act or the author of a book), for which matches were displayed instantaneously without reloading the site. Hence, participants could quickly find and select their items of interest. Whenever a user entered a query through the AJAX interface, the system matched the query keywords against an Apache Solr/Lucene search index.¹⁰ The index was built with item metadata, which had been extracted from DBpedia prior to setting up the website. The index comprised the names and abstracts of items. Whenever available, information on thumbnail pictures was saved in the index to be ready for display in the AJAX interface. In case the Apache Solr/Lucene search engine found index keywords that matched the user query, metadata information for these items was immediately shown on the screen. This information was displayed to help participants make an informed decision on whether the AJAX result list contained suitable items for profile generation.

⁹ http://api.jquery.com/jquery.ajax
¹⁰ http://lucene.apache.org/solr

Fig. 2. User profile generation, TC1 (music domain)

Aside from the AJAX feature, the appeal of the interface was increased by a progress bar at the top of the webpage (see Fig. 2). Progress bars are commonly utilized tools for web surveys. In conventional paper-based studies, subjects usually see how far ahead they are, while in a web survey they cannot check their progress immediately. This is especially true for studies that apply screen-by-screen navigation, which was the chosen approach for the conducted web experiments. By showing how close participants are to completing the experiment, progress bars prevent users from tiring of the questionnaire and withdrawing from the study altogether [14].

Upon profile generation, the SKOSRec engine calculated on-the-fly recommendations for a simple query (similar to Q1, Sect. 3), which were shown to users in a separate screen. Suggestions were displayed with a sufficient amount of information (e.g., thumbnails, labels, or a short description) to help participants assess the quality of the recommendation. In addition to this information, users could also access the respective Wikipedia article by following the hyperlink on the label. Relevance sliders were used to assess the utility of each recommendation. They are often applied in evaluations of IR systems [51, 54]. Scores were handled as points on a 0 to 100 scale of the relevance slider (outer left side: lowest relevance; outer right side: highest relevance; see Fig. 3).

This approach facilitated the calculation of an accurate precision score and the application of tests with higher statistical power than an ordinal 5-star scale would allow. The score, called mean relevance score (mrs), was measured with Equation 19.

$$mrs = \frac{\sum_{i=1}^{k} score_i}{k} \tag{19}$$
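Eq. 19 is a plain arithmetic mean over the slider ratings of one recommendation list. A minimal Python sketch with hypothetical ratings (the function name `mean_relevance` and the values are illustrative assumptions):

```python
def mean_relevance(slider_scores):
    """Mean relevance score (mrs) of one recommendation list, cf. Eq. 19."""
    if not slider_scores:
        raise ValueError("mrs is undefined for an empty recommendation list")
    return sum(slider_scores) / len(slider_scores)


# Hypothetical slider ratings (0 to 100) for a list of k = 3 recommendations.
mrs = mean_relevance([80, 40, 60])
print(mrs)  # 60.0
```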

In addition to the precision score, the web application also determined the size of the recommendation list produced by the engine as an indicator of recall. However, recommendation list size can only be utilized for this purpose when mrs scores reach a reasonable level.

Fig. 3. Recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain)

While accurate recommendations are useful because they build trust in a system [57], they only capture a narrow aspect of performance [36]. For instance, an online retailer can profit tremendously from an RS that provides both relevant and unobvious recommendations, since it enables users to explore the product catalog more thoroughly. On the other hand, if an RS only suggests popular products, it is likely that the user already knows them [3, 23]. A purely accuracy-based evaluation disregards these aspects, which is why other performance indicators, such as novelty, were measured as well. Thus, the evaluation interface also contained a section where subjects had to tick a radio button that indicated whether an item was new. In the backend of the web application, novelty was measured as the ratio of relevant items not familiar to the user (|nov|) among the total number of relevant items (|tp|) (Eq. 20).

$$nv = \frac{|nov|}{|tp|} \tag{20}$$
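The novelty ratio of Eq. 20 can be sketched as follows. The sketch assumes the relevance threshold of 50 slider points that the study applies to decide whether an item counts as relevant. Returning `None` when a list contains no relevant item is an assumption of this sketch, not a documented behavior of the web application. The function name and the sample ratings are hypothetical.

```python
def novelty(ratings, threshold=50):
    """Share of relevant items that were new to the user, cf. Eq. 20.

    `ratings` is a list of (slider_score, is_new) pairs for one recommendation list.
    """
    relevant = [(score, is_new) for score, is_new in ratings if score >= threshold]
    if not relevant:
        return None  # assumption: nv is left undefined without relevant items
    novel = [item for item in relevant if item[1]]
    return len(novel) / len(relevant)


# Hypothetical ratings: three items reach the threshold, two of them are new.
nv = novelty([(80, True), (60, False), (30, True), (50, True)])
```

In this example nv equals 2/3, because the item rated 30 is not relevant and therefore does not enter either count.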

An item was considered relevant when it achieved a relevance score of 50 or higher. In addition to accuracy and novelty, the web application also tracked the diversity of recommendation lists, since a topically diversified result set has been found to increase overall user satisfaction [31, 66]. In the RS literature, the most widely explored diversification metrics are the ones that measure the similarity between each item pair in the recommendation list. The more the items are alike, the less diverse the result set is [19]. The study series of this paper applied the metric by Castells et al. [8]. It determines item-to-item similarity values with the cosine similarity measure. Scores are then converted to distance values (Eq. 21) and averaged for the recommendation list (Eq. 22).

$$d(i_l, i_m) = 1 - \cos(i_l, i_m) \tag{21}$$

$$div_C = \frac{2}{k(k-1)} \sum_{l<m} d(i_l, i_m) \tag{22}$$
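Eqs. 21 and 22 can be illustrated with a short Python sketch. It assumes that each recommended item is represented by a numeric feature vector, for instance binary indicators over SKOS annotations. This representation, the function names, and the example vectors are assumptions for illustration and are not taken from the study's backend.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equally long vectors (0.0 if one is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def div_c(vectors):
    """Average pairwise cosine distance of a recommendation list, cf. Eqs. 21 and 22."""
    k = len(vectors)
    if k < 2:
        raise ValueError("diversity needs at least two items")
    total = sum(1 - cosine(u, v) for u, v in combinations(vectors, 2))
    return 2 * total / (k * (k - 1))


# Hypothetical annotation vectors: items 1 and 3 share all concepts, item 2 shares none.
diversity = div_c([[1, 0], [0, 1], [1, 0]])
```

For the three example items, two of the three pairs have a distance of 1 and one pair has a distance of 0, which yields a diversity score of 2/3.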

The content-based diversity scores (divC) were calculated in the backend of the web application and saved in a log file for each recommendation list that the SKOSRec engine generated at runtime. In addition to the measurement of divC scores, we asked participants directly about their opinions on recommendation list diversity. Following the RS evaluation framework by Pu et al. [44], two Likert items were posed for this purpose:

– The recommendations of list [...] are diverse.
– The recommendations of list [...] are similar to each other.

Besides the algorithmic performance metrics listed above, the authors gathered assessments from users regarding the overall usefulness of the suggestions. This approach is also in line with the empirically verified evaluation framework by Pu et al. [44], which proposes to measure the psychometric construct of Perceived Usefulness to get a comprehensive understanding of user opinions on recommendation results. Thus, participants were instructed to state their agreement with positive statements regarding the relevance of the suggestions. The statements were adapted to the specificities of each usage scenario. Fig. 4 shows an example collection of Likert questions for the domain of music RS.

After participants had provided the required information, their answers were saved in a log file and they were navigated to test case 2 (TC2), which evaluated the SKOSRec engine's performance for constraint-based queries (similar to Q3, Sect. 3). We tried to find out whether the engine was able to improve user satisfaction through filter conditions. Even though IR systems already successfully apply facet filters on result sets, the experiments still had to prove whether LOD-enabled constraints have the same effect on the Perceived Usefulness of recommendations. For the comparisons of non-filtered and filtered suggestions, we applied a within-subjects design in which participants had to rate two recommendation lists resulting from regular (non-filtered) and filtered requests in TC2. We favored this setup over a between-subjects design, where subjects would have been assigned to only one result list. While the between-subjects design more closely resembles a real-world setting, since it does not require participants to interact with more than one approach at a time, differences among subjects (e.g., regarding demographics or topic expertise) can potentially have a considerable impact on quality judgments [19]. Hence, when applying a between-subjects design, one does not know whether differences in performance are caused by the approaches or by the variability among participants. Therefore, researchers need to conduct between-subjects experiments with a sufficient number of study subjects to level out these differences. Within-subjects studies, on the other hand, require fewer participants while still achieving the same level of statistical power, since between-subject variability is not an issue [30, 31]. Beel et al. have shown that even small variations in an RS test setting could cause substantial differences in performance outcomes [6]. To keep the extent of variations even lower, comparisons between non-filtered and constraint-based recommendations were based on the same profile for each user in TC2. Therefore, the user profile from TC1 served as a starting point for filtered retrieval. The web application showed the recently generated profiles to users and asked them to provide an additional filter condition. Fig. 5 depicts an example web form for a constraint-based recommendation request in the music domain. It comprises the user profile and an additional text field where users stated their constraint.

Afterwards, the engine applied this filter condition to the set of potential recommendation results before similarity calculation started. For each usage scenario, different types of constraints were available. Regarding suitable filter options, the authors sought to achieve a balance between assisting users in finding filter conditions that would adequately represent their information needs and keeping the variability among users as low as possible. Therefore, the specificities of each scenario and the availability of the respective data sources were determined. For instance, for the multimedia domains (movie, music, and books), it was assumed that users would like to filter recommendations according to the specific genre of the item. Fortunately, the DBpedia dataset contains this kind of information for the music domain. In this domain, the genre can be retrieved through the property dbo:genre. However, in the movie and book domains, DBpedia does not provide sufficient information. Many LOD resources that represent books were not assigned a genre. In the movie domain, a genre property is missing entirely, and genre-specific information can often only be found in the subject categories describing the movie. Hence, in contrast to the experiment on music recommendations, participants in the domains of books and films did not filter their recommendation lists by genre. On the other hand, the property dc:subject was applied as a recommendation filter in each multimedia domain. Since the subject property often links to many informative features of multimedia items (e.g., release period, geographic information, or content information), the authors decided that this feature represents a powerful filter dimension.

In the experiment on travel destination search, the web interface offered two filters: a subject filter and a location-specific filter. The subject filter enabled users to state their preferences regarding the characteristics of a location (e.g., being a nature reserve). The location-specific filter, on the other hand, gave participants the option to specify where the desired destination should be located.¹¹

Fig. 4. Evaluation screen for recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain)

¹¹ http://dbpedia.org

The second fundamental research issue of TC2 concerned the question of whether expressive constraints can boost recommendation quality even further. Hence, apart from executing a constraint-based workflow with a simple filter (i.e., a direct attribute of an item), the SKOSRec engine generated suggestions with the help of an expressive filter as well. This second procedure, while still applying the same user constraint as the simple filter, utilized an expressive graph pattern to include additional LOD resources in the result set. For instance, in case a subject filter was selected, result set expansion was facilitated by exploring the wider semantic space of the DBpedia category graph through a property path declaration in the graph pattern (e.g., skos:broader[2]). In the travel experiment, an additional expressive filter retrieved place-related information (dul:hasLocation) for location-specific subproperty relations that are not materialized in DBpedia. Hence, the expressive graph pattern expanded the filter such that additional properties were matched (e.g., dbo:country, dbo:city, or dbo:state), while still applying the specified user constraint to the set of LOD resources. Upon execution of constraint-based recommendation requests, users received two recommendation lists (resulting from simple and expressive filtering, respectively) in a separate evaluation screen. Participants evaluated result sets in the same manner as in TC1. For each result set, the screen contained an additional Likert item asking subjects to state their agreement with the statement “The filter has improved the recommendation results of list [...]”. The sequence of suggestions resulting from the different methods was randomly assigned in each user session to avoid order effects.

Fig. 5. Web form for a constraint-based recommendation request (music domain)

After study subjects had completed TC2, they were navigated to test case 3 (TC3). In the TC3 section of the travel experiment, participants could choose between three options for rollup query patterns (similar to Q6, Sect. 3). Users were able to obtain recommendations for travel destinations based on POIs they had visited during a stay in another destination. They could select either a city (option 1), a region (option 2), or a country (option 3) as destination type. Upon entity type selection, the SKOSRec engine generated appropriate suggestions. Fig. 6 shows the web form from the travel experiment that facilitated the formulation of advanced rollup requests in TC3.

When users had selected an entity type and stated their favorite travel destination and three associated POIs, the application generated a SKOSRec query from these parameters. A simple on-the-fly request was sent to the engine as well to retrieve baseline recommendations, with which the suggestions resulting from the advanced request were compared at a later stage of the experiment. The simple query only contained the user's favorite travel destination and the selected entity type option, thus omitting information on the POIs. During the travel experiment, the engine processed both the simple and the advanced query and produced two recommendation lists. As in the previous parts of the experiment, participants assessed the quality of the result sets, which appeared in random order on the screen.

Fig. 6. Travel - User profile generation, TC3

The multimedia experiments also contained a section on rollup retrieval (TC3). Like the advanced travel queries, the multimedia requests aggregated similarity scores of sublevel entities through summation, which the engine joined with a postfilter. In addition to these features, the pattern contained a preference query part. This section identified a set of preferred items through a single user statement. Participants specified either their favorite director/actor (movie domain), music act (music domain), or author (book domain) and received recommendations that the SKOSRec engine generated from the features of the works created by the artist (similar to Q5, Sect. 3). Participants also received suggestions from a simple on-the-fly query, which computed results according to the artist's characteristics. As in the travel experiment, recommendations from the advanced rollup request were compared with the on-the-fly query without letting participants know which approach they were assessing and through random assignment of result set positions.

After participants of the multimedia experiments had evaluated the results of TC3, they were guided to test case 4 (TC4), which evaluated cross-domain queries. The experiment on travel RS ended with TC3 because no suitable graph patterns could be identified to facilitate these kinds of suggestions for the travel usage scenario. In the multimedia domain, however, the data from DBpedia was sufficient for this retrieval task. The entity type of the study (i.e., movie, music act, or book) defined the source domain, based on which the engine generated suggestions for items from another multimedia target domain (similar to Q7, Sect. 3). Fig. 7 depicts the cross-domain web form of the music experiment.

The web form assisted subjects in formulating a request, which then obtained movie suggestions based on the favorite music act of the user. Parameters from users were entered in the placeholders of the query pattern in the backend of the application. We formulated the query templates in such a way that the query engine matched any properties or graph patterns that could potentially connect two items from the specified target and source domains. For instance, such matches can occur when a user states a favorite movie that was written by a particular author. In case the same author has also written some books, they might be of interest to the user as well. Another example stems from the movie domain. Suppose a consumer has declared in his/her profile a preference for a movie that links to the corresponding soundtrack. It might well be the case that the user likes the soundtrack and the music acts that were involved in creating it just as much as s/he likes the movie. The practice of distinguishing homonymous LOD resources through the property dbo:wikiPageDisambiguates is another interesting linking pattern from DBpedia that was exploited for cross-domain requests in TC4, since it connects similar entities (i.e., the book edition of a certain movie).

Cross-domain SPARQL queries (similar to Q8, Sect. 3) were posed as baseline requests in this test case. While the SPARQL query only matched cross-domain relations for a single LOD resource, the SKOSRec request expanded the search space by considering links from more than one item. As in the previous test cases, TC4 ended with an evaluation screen where users assessed the SPARQL-based as well as the SKOSRec suggestions in random order without knowing which recommendation list belonged to which method. Table 10 gives an overview of the test cases and the methods that were applied in the experiments in each usage scenario.

4.2. Results

In total, 257 people took part in the study. The gathered samples from the experiments contained answers from both genders with a slight overrepresentation of male participants (53.7%). The majority of study subjects were between 20 and 40 years of age (65.8%). The evaluation of the demographics section also revealed that the majority of participants already perceived the search in their domain of interest as either “manageable” (38.5%) or even as “easy” or “very easy” (44.0%).

Fig. 7. Music - User profile generation, TC4 (page 9)

Table 10. Test cases in the web experiments (X = conducted, - = not conducted)

Test Case   Evaluation Purpose                          Query Type   Travel   Movie   Music   Book
TC1         On-the-fly Recommendations (Baseline)       Q1           X        X       X       X
TC2         Constraint-based Recommendations            Q3           X        X       X       X
TC3         Advanced Queries (Rollup)                   Q5, Q6       X        X       X       X
TC4         Advanced Queries (Cross-Domain)             Q7           -        X       X       X

Prior to analyzing the performance results of the novel retrieval approaches of the SKOSRec engine, it was checked whether the respective Likert items for Perceived Usefulness and user-based diversity (divU) reached acceptable reliability levels. For this purpose, Cronbach's α values were determined. Table 11 shows the outcome of this analysis for the Perceived Usefulness construct. Given the acceptable values throughout the domains (Cronbach's α > 0.7), it could be treated as an interval-scaled variable in the statistical tests.

Table 11. Cronbach's α values for the concept of Perceived Usefulness

Domain   Cronbach's α
Travel   0.7002
Movie    0.8579
Music    0.8515
Book     0.8949
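For readers who want to reproduce a reliability check of this kind, the following Python sketch computes Cronbach's α from raw Likert responses with the standard formula α = k/(k-1) · (1 - Σ item variances / variance of the summed scale). The response data is invented for illustration and does not stem from the study. The function name `cronbach_alpha` is hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of Likert items.

    `items` holds one response list per item; position j in every list
    belongs to the same respondent j.
    """
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    # Summed scale score per respondent.
    totals = [sum(responses) for responses in zip(*items)]
    item_variance_sum = sum(variance(responses) for responses in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Invented answers of four respondents to three 5-point Likert items.
items = [
    [4, 5, 3, 2],
    [4, 4, 3, 1],
    [5, 5, 2, 2],
]
alpha = cronbach_alpha(items)
print(round(alpha, 4))  # 0.9444
```

Sample variances (n - 1 in the denominator) are used for both the items and the summed scale, so the choice of variance estimator cancels out consistently.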

Besides usefulness, user-based diversity scores were also measured by multiple Likert items. However, Cronbach's α values for this concept did not reach acceptable levels. To avoid leaving valuable user assessments unused, agreements with the statement “The recommendations of list [...] are diverse” were taken into account as an ordinal variable (divU).

Based on these results, subsequent tests could be carried out. The first part of the analysis concerned the performance of the on-the-fly retrieval method (baseline) of the SKOSRec engine in TC1. The outcome of this analysis confirms that the on-the-fly approach represents a viable recommendation strategy. Table 12 shows the mean (M),¹² standard deviation (SD), and number of participants (N) for each performance dimension and metric. It can be seen that users assessed the engine's performance in almost every quality dimension to be above average. Aside from the positive user evaluations with regard to Perceived Usefulness, related quality aspects, such as Accuracy, Novelty, and Diversity, received good assessments as well. For instance, in each domain the mean relevance score (mrs) was higher than 50 score points. This means that participants were often convinced of the relevance of the results. Given these positive user assessments with regard to item relevance, it seems justified to consider the mean of result list sizes as an indicator of recall. The engine was able to generate the required number of recommendations in almost every test case (see Table 12). The on-the-fly retrieval approach was also tested with regard to the Novelty dimension. Table 12 shows the mean novelty scores (nv) for the baseline recommendation approach. Novelty scores indicate that the majority of relevant items in a result list were deemed to be new in the travel and book RS, whereas in the experiments on movie and music RS only about one third of the relevant items were unfamiliar to users. In contrast, content-based Diversity scores (divC) were rather high in each domain. Hence, it can be concluded that items in a result set usually have a low rate of matching SKOS annotations. In addition to content-based scores, user assessments of recommendation list Diversity (divU) were also taken into account. The median level of agreement with the Diversity-related Likert item stating that recommendations are diverse was 3 (“neutral”) in the multimedia domains and 4 (“agree”) in the travel domain.

In summary, the baseline approach achieved fairly good performance results in TC1. The findings demonstrate that LOD-enabled recommendations can potentially enhance a user's search experience. They also allow for subsequent tests of advanced retrieval techniques, which can be compared to this standard method.

¹² In case of the user-based diversity score (divU), M denotes the median because of the ordinal scale of the variable.

In the analysis of performance results from TC2 (constraint-based recommendations), particular attention was paid to participants' agreement with the statement "The filter has improved the recommendation results of the list". Fig. 8 shows the distribution of domain-wise agreement ratios. On average, they were fairly high. Throughout the domains, the vast majority of participants selected either 5 ("strongly agree"), 4 ("agree"), or 3 ("neutral"). Hence, a filter often had a positive impact on results.

The authors gathered responses to the question on improvement for recommendation lists resulting from both the simple and the expressive filtering approaches. Hence, they give clues about the general performance of constraint-based retrieval. However, since this section of the study revealed the experimental condition to participants (i.e., application of a filter), answers to the Likert statement have to be interpreted cautiously. Results may be slightly biased, as participants were able to guess the underlying agenda. These limitations do not exist for comparisons between simple and expressive filters. Since the web application randomly assigned the list order, participants did not know which approach they were assessing. This enables us to take a closer look at the differences between the two filtering approaches. It was hypothesized that advanced filters improve recommendation quality, as they may identify more relevant resources, thereby overcoming problems of data quality or data sparsity in LOD repositories. Table 13 lists the response rates of the two methods. The rate measures the ratio of non-zero result sets among all result sets. Throughout the domains, recommendation lists resulting from expressive filtering achieved higher response rates.

Table 12. Performance results in TC1 (M = Mean, SD = Standard Deviation, N = No. of participants)

| Dimension (Metric) [Max Score] | Domain | M | SD | N |
|---|---|---|---|---|
| Accuracy (size) [10] | Travel | 10.00 | 0.00 | 103 |
| | Movie | 10.00 | 0.00 | 50 |
| | Music | 10.00 | 0.00 | 53 |
| | Book | 9.80 | 1.40 | 51 |
| Accuracy (mrs) [1] | Travel | 56.49 | 0.00 | 103 |
| | Movie | 62.25 | 19.15 | 50 |
| | Music | 55.70 | 0.10 | 53 |
| | Book | 57.47 | 15.55 | 50 |
| Novelty (nv) [1] | Travel | 0.60 | 0.34 | 101 |
| | Movie | 0.36 | 0.31 | 50 |
| | Music | 0.36 | 0.39 | 52 |
| | Book | 0.61 | 0.29 | 47 |
| Diversity (divU) [5] | Travel | 4 | - | 102 |
| | Movie | 3 | - | 50 |
| | Music | 3 | - | 53 |
| | Book | 3 | - | 50 |
| Diversity (divC) [1] | Travel | 0.80 | 0.16 | 103 |
| | Movie | 0.87 | 0.12 | 50 |
| | Music | 0.86 | 0.10 | 50 |
| | Book | 0.92 | 0.08 | 50 |
| Usefulness [5] | Travel | 3.34 | 0.85 | 103 |
| | Movie | 3.55 | 0.73 | 50 |
| | Music | 3.18 | 0.85 | 53 |
| | Book | 3.43 | 0.81 | 50 |

Table 13. Response rates in % (TC2)

| Domain | Constr.-based (Regular) | Constr.-based (Expanded) | N |
|---|---|---|---|
| Travel | 23 | 57 | 103 |
| Movie | 86 | 88 | 50 |
| Music | 72 | 75 | 53 |
| Book | 73 | 76 | 51 |
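The response rate reported in Table 13 is described as the ratio of non-zero result sets among all result sets. A minimal sketch of that ratio is given below; the function name and the sample sessions are hypothetical and only serve to make the definition concrete.

```python
def response_rate(result_lists):
    """Share of requests that returned at least one recommendation."""
    if not result_lists:
        raise ValueError("at least one result list is required")
    non_empty = sum(1 for items in result_lists if len(items) > 0)
    return non_empty / len(result_lists)


# Made-up result lists of four user sessions for two filter variants.
simple = [["hotel_a"], [], [], []]
expanded = [["hotel_a", "hotel_b"], ["hotel_c"], [], []]
print(response_rate(simple), response_rate(expanded))  # 0.25 0.5
```

Multiplying the returned ratio by 100 gives a percentage of the kind listed in the table.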


[Figure: "The filter has improved the recommendations of the list". x-axis: Domain (Digital Library, Travel, Movie, Music, Book); y-axis: Percentage of Agreement (0.00 to 1.00); legend: Degree of Agreement (1 to 5).]

Fig. 8. Participants' agreement with the statement that the filter has improved the result list (domain-wise) (TC2).

In addition to response rates, user assessments and scores in the defined quality dimensions were compared. Table 14 depicts the outcome of this analysis. Significantly higher scores are marked in bold. Where statistical tests identified no meaningful variations, the results of the better-performing approach are underlined. α-levels were adjusted with the false discovery rate (FDR) in order to decrease the probability of making a type I error due to multiple testing.
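The text does not state which FDR procedure was used. As an illustration only, the sketch below implements the widely used Benjamini-Hochberg step-up adjustment, which is an assumption on our part; the function name and the p-values are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down and enforce monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up raw p-values from four pairwise comparisons.
raw = [0.01, 0.04, 0.03, 0.005]
print([round(p, 4) for p in benjamini_hochberg(raw)])  # [0.02, 0.04, 0.04, 0.02]
```

A comparison would then be called significant if its adjusted p-value stays below the chosen α-level, which is equivalent to adjusting the α-levels themselves.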

According to mean scores, expressive requests generated more comprehensive recommendation lists (size). This finding is in line with the increased response rates shown in Table 13. However, subsequently conducted t-tests confirmed significant differences only for the travel experiment (t(102) = −7.89, p < 0.001). In contrast, precision scores were higher in most of the domains (travel, movie, music) when the SKOSRec engine processed a simple constraint-based query, although statistical tests did not confirm any systematic differences. The authors applied Wilcoxon tests for mrs scores and the remaining performance metrics because of the small sample sizes resulting from the low response rates of the simple constraint-based approach.

Regarding Novelty, none of the approaches was superior. In the Diversity dimension, expressive filters generated suggestions that were at least as topically diversified as the results from simple constraint-based retrieval. Even though no significant differences were identified between the two approaches, increased mean scores (divU in the travel domain and divC in each domain) indicate a slight superiority of expanded filter requests in this quality dimension. The same applies to usefulness scores (mean values were higher in 3 out of 4 domains when expressive filtering was applied). However, these statements are speculative at best because statistical tests were not significant.

In summary, it can be concluded that expressive filters improve recall, potentially leading to a slight loss in precision scores (see Table 14). It may also be the case that expanded user constraints diversify recommendation lists and increase their usefulness. However, these claims are unverified. In contrast, it is safe to say that expressive query patterns generate results of at least the same quality as simple patterns while increasing the number of relevant suggestions. The few differences may be explained by the high Jaccard indices (JI) of the recommendation lists (see Table 15). Throughout the domains, result sets were much alike. Given these similarities and the increases in list sizes for expressive filters, it is supposed that an expanded constraint produces an enhanced version of the recommendation list resulting from simple filtering. Therefore, expanded filters should be the default setting in a constraint-based retrieval context.
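The Jaccard index used to compare the two result lists is the size of their intersection divided by the size of their union. The sketch below shows this for two made-up lists in which the expanded list is a superset of the simple one; the function name, the item identifiers, and the choice to reject two empty lists are assumptions for illustration.

```python
def jaccard_index(list_a, list_b):
    """Jaccard index of two recommendation lists, treated as sets."""
    a, b = set(list_a), set(list_b)
    if not a and not b:
        # Assumption of this sketch: the index is left undefined
        # when neither approach returned any item.
        raise ValueError("Jaccard index is undefined for two empty lists")
    return len(a & b) / len(a | b)


# Made-up lists: the expanded filter keeps all simple results and adds one.
simple = ["item_1", "item_2", "item_3"]
expanded = ["item_1", "item_2", "item_3", "item_4"]
print(jaccard_index(simple, expanded))  # 0.75
```

A value close to 1 together with a longer expanded list matches the reading above that the expanded constraint yields an enhanced version of the simple list.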

Table 14. Performance results in TC2 (M = Mean, SD = Standard Deviation, N = No. of participants)

| Dimension (Metric) [Max Score] | Domain | Simple M | Simple SD | Expressive M | Expressive SD | N |
|---|---|---|---|---|---|---|
| Accuracy (size) [10] | Travel | 1.17 | 2.70 | 4.52 | 4.62 | 103 |
| | Movie | 6.72 | 4.08 | 7.20 | 3.92 | 50 |
| | Music | 5.96 | 4.64 | 6.36 | 4.51 | 53 |
| | Book | 4.59 | 4.50 | 5.00 | 4.51 | 51 |
| Accuracy (mrs) [1] | Travel | 66.73 | 20.51 | 65.34 | 22.36 | 23 |
| | Movie | 60.95 | 22.98 | 60.29 | 22.79 | 43 |
| | Music | 63.77 | 16.05 | 62.96 | 16.44 | 34 |
| | Book | 56.17 | 18.18 | 56.81 | 16.81 | 37 |
| Novelty (nv) [1] | Travel | 0.48 | 0.41 | 0.61 | 0.49 | 21 |
| | Movie | 0.44 | 0.36 | 0.55 | 0.80 | 42 |
| | Music | 0.55 | 0.39 | 0.52 | 0.37 | 38 |
| | Book | 0.77 | 0.34 | 0.76 | 0.33 | 33 |
| Diversity (divU) [5] | Travel | 3 | - | 3.5 | - | 24 |
| | Movie | 4 | - | 4 | - | 43 |
| | Music | 4 | - | 4 | - | 38 |
| | Book | 3 | - | 3 | - | 37 |
| Diversity (divC) [1] | Travel | 0.68 | 0.18 | 0.72 | 0.21 | 19 |
| | Movie | 0.70 | 0.24 | 0.73 | 0.23 | 41 |
| | Music | 0.86 | 0.13 | 0.87 | 0.12 | 34 |
| | Book | 0.84 | 0.24 | 0.95 | 0.63 | 29 |
| Usefulness [5] | Travel | 3.33 | 0.93 | 3.50 | 0.77 | 24 |
| | Movie | 3.32 | 0.85 | 3.31 | 0.86 | 43 |
| | Music | 3.55 | 0.81 | 3.64 | 0.78 | 38 |
| | Book | 3.39 | 0.83 | 3.62 | 1.21 | 37 |

Table 15. Jaccard indices (TC2)

| Domain | Mean Jaccard Index (JI) | SD | N |
|---|---|---|---|
| Travel | 0.57 | 0.47 | 25 |
| Movie | 0.71 | 0.38 | 43 |
| Music | 0.82 | 0.35 | 37 |
| Book | 0.92 | 0.26 | 35 |

In TC3 (i.e., evaluation of roll-up query patterns), the SKOSRec engine almost always generated non-empty recommendation lists (see Table 16). In the travel and the book experiment, the response rate was higher for aggregation-based queries, whereas in the movie domain the engine more often provided recommendations for regular requests. However, these differences are only marginal and do not indicate a clear superiority of one method.

Table 16. Response rates in % (TC3)

| Domain | Regular | Aggr.-based | N |
|---|---|---|---|
| Travel | 99 | 100 | 103 |
| Movie | 96 | 92 | 50 |
| Music | 100 | 100 | 53 |
| Book | 96 | 100 | 51 |

Fig. 9 depicts participants' general agreement with Likert statements concerning the Perceived Usefulness of the approaches in TC3. The diagram shows similar results for regular as well as advanced requests. On average, satisfaction scores reached levels slightly above neutral agreement. A subsequent t-test confirmed no significant differences between the two methods. In fact, the approach with the highest average score of Perceived Usefulness differed between the experiments. While mean values were higher for regular requests in the studies on travel and music RS, advanced queries produced higher scores in the movie and book experiments.

[Figure: "Perceived Usefulness (Regular vs. Roll-up Queries)". x-axis: Domain (Travel, Movie, Music, Book); y-axis: Agreement (0 to 3); legend: Method (Regular, Roll-up).]

Fig. 9. Participants' overall satisfaction with regular and roll-up recommendation retrieval (TC3).

Content-based Diversity (divC) was the only metric with systematic differences. Here, scores went up for advanced retrieval patterns in the travel (t(98) = −5.50, p < 0.001), the music (t(52) = −4.51, p < 0.001), and the book domain (t(47) = −4.90, p < 0.001) (see Table 17). This finding is not surprising, given that the aggregation-based approach explores the wider semantic network of each item in the profile.
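The reported degrees of freedom (e.g., t(98) for 99 participants) are consistent with a paired, within-subjects t-test, although the text does not spell out the exact procedure. Under that assumption, the test statistic can be sketched as follows; the function name and the scores are made up, and the p-value lookup in the t-distribution is left out.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees of freedom) for paired samples a and b."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have the same length")
    # Per-participant difference between the two conditions (a minus b).
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two pairs are required")
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("differences do not vary; t is undefined")
    t = mean(diffs) / (sd / sqrt(n))
    return t, n - 1


# Made-up scores of four participants under two retrieval conditions.
regular = [3, 4, 5, 4]
roll_up = [5, 5, 7, 6]
t, df = paired_t_statistic(regular, roll_up)
print(t, df)  # -7.0 3
```

With the difference taken as regular minus advanced, a negative t indicates higher scores for the advanced pattern, which would match the sign of the values reported above if the authors used the same ordering.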

Hence, it can be assumed that both approaches produce recommendation lists of similar quality and can provide benefits to potential consumers. It cannot be clearly stated which approach is superior, because neither of them outperformed the other in each domain. While the regular retrieval strategy was better in terms of usefulness in the travel and the music domains, the aggregation-based approach achieved higher scores in the domains of movie and book recommendations.

This conclusion is especially notable given the low mean Jaccard scores for recommendation lists resulting from regular and aggregation-based retrieval (see Table 18). The mean score never exceeded 0.1 throughout the domains, which indicates a low concordance between result sets. It is remarkable that participants perceived the recommendations as equally good given the high dissimilarity of the lists. Thus, advanced queries have an added value, as they provide additional interesting results. They can help to explore LOD repositories in different ways than a regular retrieval approach. Therefore, it is worthwhile to offer aggregation-based query patterns as an alternative retrieval strategy when the regular approach has not produced any helpful recommendations.

In addition to TC3, user assessments for TC4 (cross-domain recommendations) were also evaluated. However, since the authors conducted experiments for TC4 only in the multimedia domains, the samples were considerably smaller. Nevertheless, it is assumed that the number of completed user sessions is just sufficient to conduct further statistical analyses. Table 19 shows the response rates for the two retrieval methods. Cross-domain patterns had a higher success rate of generating non-zero result sets throughout the domains. Participants almost always received a recommendation for this query type, whereas they often did not receive a single suggestion when a regular SPARQL query was executed.

Table 17. Performance results in TC3 (M = Mean, SD = Standard Deviation, N = No. of participants)

| Dimension (Metric) [Max Score] | Domain | Regular M | Regular SD | Aggr.-based M | Aggr.-based SD | N |
|---|---|---|---|---|---|---|
| Accuracy (size) [10] | Travel | 9.90 | 0.99 | 9.42 | 2.07 | 103 |
| | Movie | 2.88 | 0.59 | 2.76 | 0.82 | 50 |
| | Music | 10.00 | 0.00 | 10.00 | 0.00 | 53 |
| | Book | 2.88 | 0.59 | 3.00 | 0.00 | 51 |
| Accuracy (mrs) [1] | Travel | 48.21 | 24.98 | 44.93 | 25.59 | 102 |
| | Movie | 54.58 | 22.88 | 56.69 | 25.42 | 44 |
| | Music | 53.95 | 18.31 | 58.24 | 16.19 | 52 |
| | Book | 51.53 | 23.90 | 49.25 | 22.30 | 49 |
| Novelty (nv) [1] | Travel | 0.43 | 0.39 | 0.38 | 0.40 | 98 |
| | Movie | 0.35 | 0.38 | 0.28 | 0.41 | 41 |
| | Music | 0.49 | 0.36 | 0.46 | 0.39 | 50 |
| | Book | 0.61 | 0.42 | 0.62 | 0.44 | 47 |
| Diversity (divU) [5] | Travel | 3.5 | - | 3 | - | 100 |
| | Movie | 4 | - | 3 | - | 43 |
| | Music | 4 | - | 3 | - | 50 |
| | Book | 3 | - | 3 | - | 48 |
| Diversity (divC) [1] | Travel | 0.79 | 0.17 | 0.89 | 0.13 | 99 |
| | Movie | 0.75 | 0.20 | 0.66 | 0.27 | 44 |
| | Music | 0.83 | 0.17 | 0.93 | 0.07 | 53 |
| | Book | 0.74 | 0.27 | 0.93 | 0.06 | 48 |
| Usefulness [5] | Travel | 3.36 | 0.77 | 3.22 | 0.74 | 101 |
| | Movie | 3.20 | 0.92 | 3.26 | 0.94 | 44 |
| | Music | 3.45 | 0.84 | 3.44 | 0.85 | 51 |
| | Book | 2.95 | 0.87 | 3.03 | 0.96 | 48 |

Table 18. Jaccard indices (TC3)

| Domain | Mean Jaccard Index (JI) | SD | N |
|---|---|---|---|
| Travel | 0.07 | 0.11 | 101 |
| Movie | 0.03 | 0.07 | 44 |
| Music | 0.02 | 0.05 | 53 |
| Book | 0.02 | 0.06 | 50 |

Table 19. Response rates in % (TC4)

| Domain | Regular (SPARQL) | SKOSRec | N |
|---|---|---|---|
| Movie | 32 | 96 | 50 |
| Music | 62 | 98 | 53 |
| Book | 53 | 100 | 51 |

We applied non-parametric tests for the performance metrics mrs, nv, divU, divC, and Perceived Usefulness due to the low response rates of the SPARQL-based approach, which led to a correspondingly low number of comparable data points. Overall, the FDR-adjusted pairwise Wilcoxon tests detected no significant differences, except for user-based Diversity scores (divU) in the movie (Z = −2.69, p < 0.05) and the book domain (Z = −2.56, p < 0.05). The results for the content-based Diversity (divC) metric seem to point in the same direction because of the high mean values for cross-domain queries in each domain, but the significance of these results was not verified. The mrs scores indicate a better performance of regular SPARQL queries, but the within-subjects test did not confirm this assumption statistically (see Table 20).
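For readers unfamiliar with the pairwise Wilcoxon test mentioned above, the following sketch computes a Z value for the Wilcoxon signed-rank test with the plain normal approximation. It is a simplified illustration and not the authors' procedure: it drops zero differences, assigns average ranks to ties, and applies neither a tie correction of the variance nor a continuity correction. The function name and the sample answers are made up.

```python
from math import sqrt


def wilcoxon_signed_rank_z(scores_a, scores_b):
    """Normal-approximation Z of the Wilcoxon signed-rank test (a vs. b)."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have the same length")
    # Zero differences carry no sign information and are discarded.
    diffs = [a - b for a, b in zip(scores_a, scores_b) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all differences are zero")

    # Rank the absolute differences; tied values share their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    # Sum of the ranks that belong to positive differences.
    w_plus = sum(rank for rank, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4
    sd_w = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (w_plus - mean_w) / sd_w


# Made-up Likert answers of five participants for two query types.
sparql = [3, 3, 3, 4, 2]
cross_domain = [4, 5, 3, 5, 5]
print(round(wilcoxon_signed_rank_z(sparql, cross_domain), 3))
```

With so few pairs the normal approximation is rough; statistics packages typically switch to exact tables for small samples.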

However, the most important finding of the analysis was that SKOSRec cross-domain queries generated significantly more suggestions throughout the domains (movie: t(49) = −17.92, p < 0.001; music: t(52) = −25.73, p < 0.001; book: t(50) = −12.75, p < 0.001) (see Table 20). Fig. 10 illustrates this graphically.

Whenever the SPARQL request provided recommendations, which was the case in only 32% to 62% of the user sessions depending on the domain (see Table 19), the SKOSRec suggestions were assessed to be of almost equal quality as the SPARQL recommendations (see Table 20, Usefulness). Hence, in the remaining test cases, in which SPARQL querying did not produce any results, the SKOSRec approach may still have provided useful suggestions.

Table 20. Performance results in TC4 (M = Mean, SD = Standard Deviation, N = No. of participants)

| Dimension (Metric) [Max Score] | Domain | Regular (SPARQL) M | Regular (SPARQL) SD | Cross-Domain M | Cross-Domain SD | N |
|---|---|---|---|---|---|---|
| Accuracy (size) [10] | Movie | 0.82 | 2.09 | 8.66 | 2.73 | 50 |
| | Music | 0.89 | 1.59 | 9.30 | 2.03 | 53 |
| | Book | 2.90 | 3.98 | 10.00 | 0.00 | 51 |
| Accuracy (mrs) [1] | Movie | 67.41 | 27.62 | 53.65 | 14.71 | 16 |
| | Music | 62.36 | 31.76 | 48.32 | 22.49 | 20 |
| | Book | 66.42 | 21.28 | 55.59 | 16.84 | 27 |
| Novelty (nv) [1] | Movie | 0.59 | 0.49 | 0.41 | 0.34 | 16 |
| | Music | 0.72 | 0.44 | 0.59 | 0.40 | 19 |
| | Book | 0.54 | 0.41 | 0.63 | 0.30 | 27 |
| Diversity (divU) [5] | Movie | 3 | - | 4 | - | 16 |
| | Music | 3 | - | 3 | - | 20 |
| | Book | 3 | - | 4 | - | 27 |
| Diversity (divC) [1] | Movie | (0.64) | (0.35) | (0.99) | (0.02) | (6) |
| | Music | (0.90) | (0.13) | (0.98) | (0.03) | (10) |
| | Book | (0.84) | (0.12) | (0.87) | (0.1) | (8) |
| Usefulness [5] | Movie | 3.57 | 0.83 | 3.35 | 1.57 | 15 |
| | Music | 3.53 | 1.17 | 3.11 | 0.98 | 20 |
| | Book | 3.41 | 1.04 | 3.41 | 0.94 | 27 |

[Figure: "Recommendation List Size". x-axis: Domain (Movie, Music, Book); y-axis: No. of items (0.0 to 10.0); legend: Method (Regular, Cross-Domain).]

Fig. 10. Mean recommendation list sizes for SPARQL and SKOSRec cross-domain requests (TC4).

Therefore, it is reasonable to assume that the capability of the SKOSRec engine to flexibly switch between processing steps of similarity calculation and pattern matching can facilitate an improved exploration of LOD repositories. Table 21 also shows that the items in the two sets often did not match, as is indicated by low Jaccard indices. This finding further demonstrates that the execution of an additional step of similarity calculation can help to retrieve other relevant items.

Table 21. Jaccard indices (TC4)

| Domain | Mean Jaccard Index (JI) | SD | N |
|---|---|---|---|
| Movie | 0.03 | 0.07 | 44 |
| Music | 0.01 | 0.02 | 21 |
| Book | 0.16 | 0.27 | 33 |

5. Future Work & Conclusion

This paper has presented the query-based SKOSRec engine that facilitates new types of recommendation requests for LOD repositories. The evaluations have shown that the SKOSRec system can often generate relevant and useful suggestions when certain query templates are utilized. In many cases, the application of expressive or advanced query patterns (i.e., in the context of constraint-based retrieval or cross-domain requests) helped to significantly improve recall values while still providing the same level of quality in the remaining performance dimensions. Additionally, advanced recommendation requests can be used to generate diversified recommendation lists in case users are not satisfied with the results of regular recommendation requests (i.e., roll-up or cross-domain retrieval vs. regular requests). While not being superior in each domain and test case, it is at least safe to say that the novel recommendation approaches of the SKOSRec engine considerably extend common retrieval methods and provide an alternative search strategy when conventional methods fail to provide useful results. Additionally, the increase in diversity and recall is in line with findings from previous research on LOD-enabled RS [45].

It is a strength of the developed approaches that they facilitate combinations of graph-based and similarity-based retrieval at different stages of the recommendation workflow. This feature extends general search capabilities for semantic networks. Future research needs to investigate whether, in addition to SKOS, further RDF vocabularies can be used for ad-hoc item-to-item similarity computation to increase the range of potential usage scenarios, since not every LOD dataset contains SKOS annotations [53]. The application of additional properties could also boost the recommendation quality of some query patterns (e.g., the query types used in TC3). In this context, it also needs to be determined whether the representation of the user profile in the SKOSRec query language can be substituted with free-text expressions to enable search-like functionalities. LOD researchers have already developed engines that can process natural language queries over RDF data [59], [34], [67]. It will have to be investigated how the SKOSRec engine fits profitably into this landscape of existing retrieval tools.

Another open research question concerns the aspect of interfaces to other applications. In its standard configuration, the system is not an end-user retrieval engine but a backend application with which an administrator who has domain knowledge (e.g., of RDF vocabularies and data models of a particular LOD repository and usage scenario) creates the respective query patterns. Although the websites of the online experiments represent possible implementations of suitable end-user applications for the SKOSRec engine, there are still many other possibilities and extensions to design such a user interface (UI). For instance, the selection of LOD items for the profile was made possible by previously storing the data in an index that could then be accessed and searched before issuing a recommendation query. However, this runs against the idea of ad-hoc retrieval. In this context, future research will have to determine whether it is feasible to generate an on-the-fly mapping from items to the respective LOD resources. In the case of a commercial application, the engine needs to match items with the corresponding entries in the product catalog. Even if it is not yet clear which UI components can make the best possible use of the SKOSRec engine's features, the results from the user experiments can give at least a few suggestions.

The following retrieval options should be the default setting in an interface that contains a LOD-enabled RS component to assist users in finding interesting items. Since the regular approach of on-the-fly retrieval achieved good results throughout the domains, the display of simple recommendations generated from previously stated user preferences represents a suitable starting point for retrieval. On-the-fly queries could be refined by providing filter options. Since the evaluations have shown that expressive constraints often lead to increased recall values, the application of such a filter might be the best option for assisted retrieval. As in the preparations for the experiments, appropriate graph-based query patterns for the usage scenario in question would also have to be determined by a domain expert before setting up the application. The same applies to advanced retrieval patterns, which should be available to the end user to refine the query when both the simple and the constraint-based approach have failed to provide relevant items. This suggestion is made because roll-up queries were competitive with regular recommendations in the web-based experiments. Therefore, they represent a viable alternative retrieval strategy. For some domains (e.g., for multimedia retrieval), it is also possible to use an optional selection field to enable the formulation of cross-domain requests. Evaluations for this query type suggest that users might receive interesting results (i.e., diversified and comprehensive recommendation lists) from such a request.

In summary, the contributions of the paper are as follows:

– Development and implementation of novel retrieval strategies in the SKOSRec system, which facilitate personalized requests and leverage the potential of knowledge graphs in order to address the research gap in state-of-the-art recommendation methods with regard to customized queries.

– Realization of web-based user experiments in four usage scenarios with the following outcomes:

  • Successful demonstration that knowledge graph data is applicable in the context of live recommendation scenarios.

  • Creation of suitable query patterns that can be applied to represent individual information needs in different application contexts. The set of query patterns can be extended with the help of the SKOSRec query language to meet the requirements of additional usage scenarios.

  • Proof that the novel retrieval strategies can be successfully applied across various application contexts and recommendation tasks and that they outperform conventional content-based retrieval approaches, especially in the performance dimensions of diversity and recall.

The prototypical implementation and the evaluation of the SKOSRec engine in different usage scenarios have demonstrated that the freely available knowledge sources on the LOD cloud can be successfully applied to advance state-of-the-art methods in IR and recommender systems. The proposed personalized and customized retrieval strategies can leverage the potential of knowledge graphs by better matching the information needs of users with the help of semantic search options.
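The order of retrieval options suggested in this section (simple on-the-fly recommendations first, then constraint-based filtering, then advanced patterns such as roll-up) could be wired into an application roughly as sketched below. This is a simplified illustration under our own assumptions: in the paper the refinement is offered to the user, whereas the sketch falls back automatically, and all names as well as the stub strategies are hypothetical stand-ins for real SKOSRec requests.

```python
def recommend_with_fallback(profile, strategies, min_items=1):
    """Try retrieval strategies in order until one returns enough items.

    strategies is a list of (name, callable) pairs; each callable takes
    the user profile and returns a list of recommended items.
    """
    for name, retrieve in strategies:
        items = retrieve(profile)
        if len(items) >= min_items:
            return name, items
    # No strategy produced a sufficiently long list.
    return None, []


# Stub strategies standing in for real SKOSRec requests (hypothetical).
strategies = [
    ("on-the-fly", lambda profile: []),
    ("expressive filter", lambda profile: []),
    ("roll-up", lambda profile: ["item_x", "item_y"]),
]
print(recommend_with_fallback(["liked_item"], strategies))
# ('roll-up', ['item_x', 'item_y'])
```

Passing the strategies as plain callables keeps the cascade independent of how each request is actually formulated and sent to an endpoint.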

Appendix A. The SKOSRecommender Query Syntax

SKOSRecQuery: A SKOSRec query can comprise up to six elements, of which the parts SimProjection and ItemPart are obligatory. Hence, users need to state at least their preferences and how many recommendations they would like to receive. Advanced retrieval patterns can be formulated as well. For instance, similar resource retrieval can be combined with prefiltering (RecWhereClause) and/or postfiltering (SelectPart) of LOD resources. As in SPARQL, the query can start with a prologue that abbreviates IRIs with namespace declarations [21].

SelectPart: The SelectPart enables subquerying with recommendation results. The surrounding query is a SELECT query with a slightly simplified WHERE clause (i.e., RecWhereClause) with which postfilter conditions can be formulated. In this section, users have the option to formulate aggregation-based retrieval requests (Aggregation).

Aggregation: This query part handles aggregation-based postfiltering. Users specify the resource (IRIref) for which similarity scores of sublevel entities are summarized. Additionally, it is stated how the data should be aggregated (i.e., SUM, MAX, or AVG) as well as which variable (Var) represents the aggregation-based recommendations in the postfilter section. The compiler makes sure that this variable is actually contained in the RecWhereClause of the SelectPart.

SimProjection: This part of a SKOSRec query refers to the list of generated suggestions. It defines the recommendation variable to which all similar LOD resources are mapped. In case pre- and postfilter conditions are set in the other sections, the compiler checks whether the variable (Var) appears in these sections as well. Otherwise, the required join operations cannot be executed. It is also specified how many recommendations should be displayed (Integer) and whether the request will be issued against a default repository or as a cross-repository request (ServiceIntegration).

ServiceIntegration: This section indicates that a user intends to receive recommendations from a different SPARQL endpoint than the default endpoint that is stated in the standard configuration of the SKOSRec engine. It specifies the target SPARQL endpoint (IRIref) from which a user wants to receive recommendations. Upon extraction of SKOS annotations from the default source endpoint, a subsequent request is sent to the specified target endpoint.

ItemPart: This section of a SKOSRec query represents a single preference in the user profile. However, recommendation queries can contain several preference statements. A preference is either expressed as a LOD resource (IRIref) or as a variable that is bound to a preference query (VarPart). In the latter case, the resources are obtained by matching the graph pattern in the VarPart with the triple statements in the repository. The ItemPart statement can additionally be supplemented with a concept-to-concept similarity threshold (Sim) that triggers concept expansion with the approach of flexible similarity detection, which is extensively described in one of our previous works [64].

VarPart: The construct initiates preference querying. Individual likings are expressed as a graph pattern in a RecWhereClause. The variable (Var) is a placeholder for the LOD resources that will be used to set up the user profile. The compiler has to make sure that the specified variable occurs in the RecWhereClause as well. Otherwise, the extraction of LOD resources cannot be carried out correctly.

Sim: The Sim part of a SKOSRec query denotes the threshold of concept-to-concept scores for flexible similarity detection. Even though knowledge-based similarity values usually range between 0 and 1, the specification allows a broader range of digits in the Decimal datatype to enable the application of other similarity metrics in case they are needed.

Relation: This section specifies how concept-to-concept similarity scores should be determined. Users can state whether scores should be "larger", "larger than", or "equal to" the threshold value.

RecWhereClause: This clause can occur in three different sections of a SKOSRec query, i.e., either in the prefilter (ItemPart), the postfilter (SelectPart), or the preference filter section (VarPart). The RecWhereClause closely resembles the "WhereClause" of the SPARQL 1.1 specification with all its subsequent parts [21]. Hence, it enables different combinations of graph pattern matching. However, the RecWhereClause of the SKOSRec language allows fewer pattern matching expressions than the "WhereClause" of the regular SPARQL syntax specification [21]. For instance, it is neither permitted to formulate subqueries nor to direct filter requests to more than one repository (see RecGraphPatternNotTriples for further explanations). This modification has been made to simplify similarity calculation. After parsing this section, the compiler makes sure that recommendation, preference, or postfilter variables occur at least once in the RecWhereClause to guarantee that subsequent steps of the recommendation workflow can be applied to existing LOD resources. In case the RecWhereClause comprises a RecMinusGraphPattern, it also has to be checked that the variable is not only contained in the MINUS part of the group.

RecGroupGraphPattern: A RecGroupGraphPattern can contain one to many basic graph patterns, which can be combined in different ways according to RecGroupGraphPatternSub.

RecGroupGraphPatternSub: This section defines the graph patterns that are applied to identify suitable LOD resources. User filters can be expressed either as basic graph patterns (TriplesBlock) or as combinations of them (RecGraphPatternNotTriples). The sequence of different kinds of patterns can be flexibly specified.

RecGraphPatternNotTriples: This query part closely resembles the similarly named "GraphPatternNotTriples" of the SPARQL 1.1 specification [21]. The only difference to SPARQL is that the RecGraphPatternNotTriples section is slightly more restricted in terms of admissible graph patterns than the "GraphPatternNotTriples" section of the regular SPARQL specification. For instance, it does not allow retrieving LOD resources from anywhere other than the default graph that is specified in the configuration of the SKOSRec engine, since filter patterns have to be applied to datasets that contain SKOS annotations. Hence, these datasets need to be specified before runtime execution to ensure that similarity calculation can be carried out correctly. If a user were able to formulate filter conditions that referred to different RDF graphs and endpoints, this would not be possible.

RecGroupOrUnionGraphPattern: This query section marks an alternative graph pattern in the style of the "GroupOrUnionGraphPattern" of the SPARQL syntax specification [21]. However, it allows neither SPARQL-like subqueries nor federated queries for filtered resources, because this would complicate similarity calculation.

RecOptionalGraphPattern: In this part of a SKOSRecQuery, users can specify additional graph patterns that might extend the query solution but do not necessarily have to match the data.

RecMinusGraphPattern: This section handles the exclusion of LOD resources for cases in which users want to omit certain triple statements from the solution.
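To make the Aggregation element described above more tangible, the sketch below rolls up similarity scores of sublevel entities to their parent resources with SUM, MAX, or AVG and ranks the parents. It only illustrates the idea and is not the SKOSRec implementation; the function name, the dictionaries, and the movie/director example are assumptions.

```python
from collections import defaultdict

# The three aggregation modes named in the syntax description.
AGGREGATORS = {
    "SUM": sum,
    "MAX": max,
    "AVG": lambda values: sum(values) / len(values),
}


def roll_up(similar_items, parent_of, how="SUM", top_n=10):
    """Aggregate item-level similarity scores per parent resource.

    similar_items maps an item to its similarity score; parent_of maps
    an item to the resource it should be rolled up to.
    """
    if how not in AGGREGATORS:
        raise ValueError("how must be one of SUM, MAX, AVG")
    grouped = defaultdict(list)
    for item, score in similar_items.items():
        parent = parent_of.get(item)
        if parent is not None:  # items without a parent are skipped
            grouped[parent].append(score)
    aggregate = AGGREGATORS[how]
    # Rank by aggregated score (descending), then by name for stable ties.
    ranked = sorted(
        ((parent, aggregate(scores)) for parent, scores in grouped.items()),
        key=lambda pair: (-pair[1], pair[0]),
    )
    return ranked[:top_n]


# Made-up example: movies similar to the profile, rolled up to directors.
scores = {"movie_1": 0.9, "movie_2": 0.5, "movie_3": 0.75}
directors = {"movie_1": "director_a", "movie_2": "director_b", "movie_3": "director_b"}
print(roll_up(scores, directors, how="SUM"))  # [('director_b', 1.25), ('director_a', 0.9)]
```

The choice of aggregation changes the ranking: SUM favours parents with many similar sublevel items, whereas MAX and AVG favour parents with individually strong matches.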

References

[1] G Adomavicius A Tuzhilin Toward the next generation ofrecommender systems A survey of the state-of-the-art and pos-sible extensions In IEEE Transactions on Knowledge and DataEngineering vol 17 no 6 pp 734ndash749 2005

[2] G Adomavicius A Tuzhilin R Zheng REQUEST A querylanguage for customizing recommendations In InformationSystems Research vol 22 no 1 pp 99ndash117 2011

[3] C Aggarwal Recommender systems Springer 2016[4] M Arenas B Cuenca Grau E Kharlamov S Marciuska D

Zheleznyakov Faceted search over ontology-enhanced RDFdata In Proceedings of the 23rd ACM International Confer-ence on Information and Knowledge Management pp 939ndash9482014

[5] V Ayala M Przyjaciel-Zablocki T Hornung A Schaumltzle GLausen Extending SPARQL for recommendations In Seman-tic Web Information Management 2004

[6] J Beel C Breitinger S Langer A Lommatzsch B GippTowards reproducibility in recommender systems research InUser Modeling and User-adapted Interaction vol 26 no 1 pp69ndash101 2016

[7] C Boutilier R Brafman C Domshlak H Hoos D Poole CP-nets A tool for representing and reasoning with conditional ce-teris paribus preference statements In Journal of Artificial In-telligence Research (JAIR) vol 21 pp 135ndash191 2004

[8] P Castells S Vargas J Wang Novelty and diversity metrics forrecommender systems choice discovery and relevance 2011

[9] G Cheng W Ge Y Qu Falcons searching and browsing en-tities on the Semantic Web In Proceedings of the 17th ACMInternational Conference on World Wide Web pp 1101ndash11022008

[10] J Davies R Weeks QuizRDF Search technology for the Se-mantic Web In Proceedings of the 37th Annual Hawaii Inter-national Conference on System Sciences 2004

[11] DBpedia URL httpswikidbpediaorg 2017[12] T Di Noia R Mirizz V Ostuni D Romito Exploiting the

web of data in model-based recommender systems In Proceed-ings of the 6th ACM Conference on Recommender Systems (Rec-Sys) pp 253ndash256 2012

[13] T Di Noia R Mirizzi V Ostuni D Romito M ZankerLinked Open Data to support content-based recommender sys-tems In Proceedings of the 8th International Conference onSemantic Systems 2012

[14] D Dillman D Robert D Bowker Principles for construct-ing web surveys In Joint Meetings of the American StatisticalAssociation 1998

[15] DCMI metadata terms URL httpdublincoreorgdocumentsdcmi-terms 2004

[16] I. Fernandez-Tobias, I. Cantador, M. Kaminskas, F. Ricci. A generic semantic-based framework for cross-domain recommendation. In: Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pp. 25–32, 2011.

[17] A. Freitas, J. Oliveira, S. O'Riain, E. Curry, J. Da Silva, C. Pereira. Querying Linked Data using semantic relatedness: a vocabulary independent approach. In: Natural Language Processing and Information Systems, pp. 40–51, 2011.

[18] T. Grainger, T. Potter, Y. Seeley. Solr in Action. Cherry Hill: Manning, 2014.

[19] A. Gunawardana, G. Shani. Evaluating recommender systems. In: Recommender Systems Handbook, Springer, pp. 265–308, 2015.

[20] R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Bürgle, H. Düwiger, U. Scheel. Faceted Wikipedia search. In: International Conference on Business Information Systems, pp. 1–11, 2010.

[21] S. Harris, A. Seaborne, E. Prud'hommeaux. SPARQL 1.1 query language. In: W3C Recommendation, 2013.

[22] B. Heitmann, C. Hayes. Using Linked Data to build open, collaborative recommender systems. In: AAAI Spring Symposium: Linked Data Meets Artificial Intelligence, pp. 76–81, 2010.

[23] J. Herlocker, J. Konstan, L. Terveen, J. Riedl. Evaluating collaborative filtering recommender systems. In: ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.

[24] Y. Hu, Z. Wang, W. Wu, J. Guo, M. Zhang. Recommendation for movies and stars using YAGO and IMDB. In: 12th International Asia-Pacific Web Conference (APWEB), pp. 123–129, 2010.

[25] D. Huynh, D. Karger. Parallax and companion: Set-based browsing for the data web. In: Proceedings of the 18th ACM International Conference on World Wide Web, 2009.

[26] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich. Recommender systems: an introduction. Cambridge University Press, 2010.

[27] Y. Kabutoya, R. Sumi, T. Iwata, T. Uchiyama, T. Uchiyama. In: IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 625–630, 2012.

[28] M. Kaminskas, I. Fernandez-Tobias, F. Ricci, I. Cantador. Knowledge-based music retrieval for places of interest. In: Proceedings of the 2nd International ACM Workshop on Music Information Retrieval with User-centered and Multimodal Strategies, pp. 19–24, 2012.

[29] H. Khrouf, R. Troncy. Hybrid event recommendation using Linked Data and user diversity. In: Proceedings of the 7th ACM Conference on Recommender Systems, pp. 185–192, 2010.

[30] P. Kirk. Experimental design. John Wiley & Sons, 1982.

[31] B. Knijnenburg, M. Willemsen, Z. Gantner, H. Soncu, C. Newell. Explaining the user experience of recommender systems. In: User Modeling and User-Adapted Interaction, vol. 22, no. 4-5, pp. 441–504, 2012.

[32] G. Koutrika, B. Bercovitz, H. Garcia-Molina. FlexRecs: expressing and combining flexible recommendations. In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 2009.

[33] M. Lee, W. Kim. Semantic association search and rank method based on spreading activation for the Semantic Web. In: IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1523–1527, 2009.

[34] J. Lehmann, L. Bühmann. AutoSPARQL: Let users query your knowledge base. In: 8th Extended Semantic Web Conference (ESWC), 2011.

[35] N. Marie, F. Gandon, M. Ribiere, F. Rodio. Discovery hub: on-the-fly Linked Data exploratory search. In: Proceedings of the 9th International Conference on Semantic Systems, pp. 17–24, 2013.

[36] S. McNee, J. Riedl, J. Konstan. Being accurate is not enough: how accuracy metrics have hurt recommender systems. In: CHI'06 Extended Abstracts on Human Factors in Computing Systems, 2006.

[37] R. Meymandpour, J. Davis. Recommendations using Linked Data. In: Proceedings of the 5th PhD Workshop on Information and Knowledge, 2012.

[38] A. Miles, S. Bechhofer. SKOS Simple Knowledge Organization System Reference. In: W3C Recommendation, 2009.

[39] B. Mobasher, X. Jin, Y. Zhou. Semantically enhanced collaborative filtering on the web. In: Web Mining: From Web to Semantic Web, pp. 57–76, 2004.

[40] A. Musetti, A. Nuzzolese, F. Draicchio, V. Presutti, E. Blomqvist, A. Gangemi, P. Ciancarini. Aemoo: Exploratory search based on knowledge patterns over the Semantic Web. In: Semantic Web Challenge, 2012.

[41] T. Neumann, G. Weikum. The RDF-3X engine for scalable management of RDF data. In: The VLDB Journal, vol. 19, no. 1, pp. 91–113, 2010.

[42] J. Pérez, M. Arenas, C. Gutierrez. Semantics and Complexity of SPARQL. In: 5th International Semantic Web Conference, pp. 30–43, 2006.

[43] S. Policarpio, S. Brunk, G. Tummarello. Implementation of a SPARQL integrated recommendation engine for Linked Data with hybrid capabilities. In: AImWD, 2012.

[44] P. Pu, L. Chen, R. Hu. A user-centric evaluation framework for recommender systems. In: Proceedings of the 5th ACM Conference on Recommender Systems, 2011.

[45] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, E. Di Sciascio. Sound and music recommendation with knowledge graphs. In: ACM Transactions on Intelligent Systems and Technology (TIST), vol. 8, no. 2, art. 21, 2017.

[46] E. Oren, R. Delbru, S. Decker. Extending faceted navigation for RDF data. In: 5th International Semantic Web Conference, pp. 559–572, 2006.

[47] G. Pirro. Explaining and suggesting relatedness in knowledge graphs. In: International Semantic Web Conference, pp. 622–639, 2015.

[48] G. Rizzo, R. Troncy. NERD: A framework for evaluating named entity recognition tools in the Web of data. In: 10th International Semantic Web Conference (ISWC'11), Demo Session, Bonn, Germany, 2011.

[49] J. Rosati, T. Di Noia, T. Lukasiewicz, R. De Leone, A. Maurino. Preference queries with Ceteris Paribus semantics for Linked Data. In: OTM Confederated International Conferences, pp. 423–442, 2015.

[50] T. Ruotsalo, K. Haav, A. Stoyanov, S. Roche, E. Fani, R. Deliai, E. Mäkelä, T. Kauppinen, E. Hyvönen. SMARTMUSEUM: A mobile recommender system for the Web of Data. In: Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 50–67, 2013.

[51] I. Ruthven, M. Lalmas, K. van Rijsbergen. Incorporating user search behavior into relevance feedback. In: Journal of the Association for Information Science and Technology, vol. 54, no. 6, pp. 529–549, 2003.

[52] B. Sarwar, G. Karypis, J. Konstan, J. Riedl. Item-based collaborative filtering recommendation algorithms. In: Proceedings of the 10th International Conference on World Wide Web, 2001.

[53] M. Schmachtenberg, C. Bizer, H. Paulheim. Adoption of the Linked Data best practices in different topical domains. In: International Semantic Web Conference (ISWC), pp. 245–260, 2014.

[54] H. Schütze, C. Manning, P. Raghavan. Introduction to information retrieval. Cambridge University Press, 2008.

[55] SparqlImplementations. URL: https://www.w3.org/wiki/SparqlImplementations, 2016.

[56] M. Stankovic, W. Breitfuss, P. Laublet. Discovering relevant topics using DBPedia. In: Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 219–222, 2011.

[57] K. Swearingen, R. Sinha. Beyond algorithms: An HCI perspective on recommender systems. In: ACM SIGIR 2001 Workshop on Recommender Systems, vol. 13, no. 5-6, 2001.

[58] D. Tunkelang. Faceted search. In: Synthesis Lectures on Information Concepts, Retrieval, and Services, pp. 1–80, 2009.

[59] C. Unger, L. Bühmann, J. Lehmann, A. Ngonga Ngomo, D. Gerber, P. Cimiano. Template-based question answering over RDF data. In: Proceedings of the 21st ACM International Conference on World Wide Web, 2012.

[60] C. Varnhagen, M. Gushta, J. Daniels, T. Peters, N. Parmar, D. Law, T. Johnson. How informed is online informed consent? In: Ethics & Behavior, vol. 15, no. 1, pp. 37–48, 2005.

[61] J. Waitelonis, H. Sack. Towards exploratory video search using Linked Data. In: 11th IEEE International Symposium on Multimedia, pp. 540–545, 2009.

[62] L. Wenige, J. Ruhland. Flexible on-the-fly recommendations from Linked Open Data repositories. In: International Conference on Business Information Systems, pp. 43–54, 2016.

[63] L. Wenige. Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations.

[64] L. Wenige, J. Ruhland. Retrieval by recommendation: using LOD technologies to improve digital library search. In: International Journal on Digital Libraries, 2017.

[65] F. Zarrinkalam, M. Kahani. A multi-criteria hybrid citation recommendation system based on Linked Data. In: 2nd International eConference on Computer and Knowledge Engineering (ICCKE), pp. 283–288, 2012.

[66] C. Ziegler, S. McNee, J. Konstan, G. Lausen. Improving recommendation lists through topic diversification. In: Proceedings of the 14th International Conference on World Wide Web, pp. 22–32, 2005.

[67] L. Zou, R. Huang, H. Wang, J. Yu, W. He, D. Zhao. Natural language question answering over RDF: a graph data driven approach. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 313–324, 2014.

  • Introduction
  • Related Work
  • The SKOS Recommender
    • System Overview
    • On-the-Fly Recommendations
    • Preference Querying
    • Prefiltering
    • Postfiltering & Combinations
  • Evaluation of the SKOS Recommender
    • Experimental Setup
    • Results
  • Future Work & Conclusion
  • Appendix A: The SKOSRecommender Query Syntax
  • References

Fig. 1. Architecture and workflow of the SKOSRec engine

9. Ranking of LOD resources
10. Sending of recommendation results to the compiler and joining them with postfilter constraints
11. Retrieval of the final result set (see Subsect. 3.5, Postfiltering & Combinations)
12. Output to the user

3.2. On-the-Fly Recommendations

A simple on-the-fly request is the most basic form of a SKOSRec query. It is based on the engine's ability to generate ad-hoc recommendations from a user profile containing preference statements for LOD resources. Listing 2 depicts an example SKOSRec on-the-fly request that obtains suggestions based on a user's preference for the western movie "They Call Me Trinity". In the given query, the preferred movie is represented by the DBpedia resource dbr:They_Call_Me_Trinity. The query also contains a prefix with a namespace declaration and a specification of the number of suggestions the engine should generate.

Listing 2. A simple recommendation query for the movie domain (Q1)

    PREFIX dbr: <http://dbpedia.org/resource/>

    RECOMMEND ?movie TOP 5
    PREF dbr:They_Call_Me_Trinity

In this basic form, the query contains neither pre- or postfilter conditions nor a preference query statement. Thus, the engine simply identifies all movies that are similar to the preference in the profile. When executed over DBpedia, the query returns the solution set that is depicted in Table 2.

Table 2. Result set for Q1

movie
dbr:God_Forgives'_I_Don't
dbr:Ace_High_(1968_film)
dbr:Boot_Hill_(film)
dbr:Trinity_Is_Still_My_Name
dbr:Troublemakers_(1994_film)

For determining recommendations in an ad-hoc fashion, the system identifies SKOS annotations by matching a suitable property (i.e., an annotation property) in an RDF dataset (Definition 1).


Definition 1 (Annotation property). An annotation property is an IRI (Listing 3) that is defined in the Dublin Core Metadata Initiative (DCMI) specification for subject annotations. It can occur in conjunction with one of the two namespaces that are stated by the standard [15].

Listing 3. Annotation property

    <ANNOTPROP> = dc:subject | dct:subject

These properties help to retrieve SKOS annotation triples for items preferred by a user (Definition 2).

Definition 2 (SKOS annotation triple). A SKOS annotation triple ($st$) is a triple that is comprised of an IRI in the subject position, an annotation property (ANNOTPROP) as predicate and a concept $c$ in the object position. The concept is part of a knowledge organization system $S$ in SKOS format ($C = \{c \mid c \in S\}$) (Eq. 1).

$$ st \in I \times \texttt{<ANNOTPROP>} \times C \tag{1} $$

The identification of SKOS annotation triples facilitates a fast retrieval of potentially similar items. To this end, the engine evaluates shared features between resources from the SKOS annotation dataset (Definition 3).

Definition 3 (SKOS annotation dataset). A SKOS annotation dataset ($AD$) is a subset of an RDF dataset that contains SKOS annotation triples (Eq. 2).

$$ AD \subset I \times \texttt{<ANNOTPROP>} \times C \tag{2} $$

Before the similarity calculation process starts, the engine determines SKOS annotations for each LOD resource ($r$) from the user profile by issuing a subsequent SPARQL query to the LOD repository from which the user intends to receive suggestions. The notion of SKOS annotations is specified in Definition 4.

Definition 4 (SKOS annotations). In the annotation dataset, LOD resources directly link to concepts of a SKOS system via a predefined annotation property. SKOS annotations for an input resource $r$ are defined as in Equation 3.

$$ Annot(r) = \{ c \in S \mid \exists\, (r, \texttt{<ANNOTPROP>}, c) \in AD \} \tag{3} $$

Afterwards, the engine identifies which of the other resources in the dataset share annotations with $r$. Similarities only have to be computed for items with matching concepts [13, 52]. Triples can be efficiently joined along item features, since native RDF storage systems usually index triples rather than columns [41]. Thus, the quick identification of mutual annotations through SPARQL requests facilitates ad-hoc similarity calculation. Definition 5 specifies the notion of relevant resources and their annotations that are needed to identify similar items. The applied syntax follows Pérez et al. [42].

Definition 5 (Relevant resources and their annotations). The conjunctive graph pattern $P_r$ matches all items and subjects that are potentially relevant for recommendation retrieval (Eq. 4).

$$ P_r = (r, \texttt{<ANNOTPROP>}, ?c) \ \mathrm{AND} \ (?q, \texttt{<ANNOTPROP>}, ?c) \tag{4} $$

The mapping $\Omega_r$ of relevant resources and their SKOS annotations is obtained by retrieving all resources $q$ that share at least one SKOS concept $c$ with resource $r$ from $AD$ (Eq. 5). The corresponding result set contains mappings for all variables (i.e., $?c$ and $?q$) in the triple statements of $P_r$.

$$ \Omega_r = [[P_r]]_{AD} \tag{5} $$
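As a rough illustration, the pattern from Definition 5 could be sent to a SPARQL endpoint as an ordinary SELECT query. The sketch below only builds the query string. The function name `build_relevant_resources_query` and the default choice of `dct:subject` as the annotation property are assumptions made for this example and are not taken from the SKOSRec implementation.

```python
def build_relevant_resources_query(
    resource_iri: str,
    annot_prop: str = "http://purl.org/dc/terms/subject",
) -> str:
    """Build a SELECT query for (r, ANNOTPROP, ?c) AND (?q, ANNOTPROP, ?c).

    Hypothetical helper: each result row is one mapping of ?q and ?c,
    i.e. one element of the mapping set described in Definition 5.
    """
    return (
        "SELECT ?q ?c WHERE {\n"
        f"  <{resource_iri}> <{annot_prop}> ?c .\n"
        f"  ?q <{annot_prop}> ?c .\n"
        "}"
    )


query = build_relevant_resources_query(
    "http://dbpedia.org/resource/They_Call_Me_Trinity"
)
print(query)
```

Note that, as written, the pattern also matches the input resource itself as a value of `?q`; whether the engine filters it out at this stage is not stated in the text.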

Upon extraction of relevant resources and annotations, the process of similarity calculation can start. The engine determines the information content (IC) of the shared SKOS annotations of two resources. This approach is based on the hybrid similarity metric proposed by Meymandpour and Davis [37] for LOD-enabled RS. However, while their measure considers all types of IRI resources for the retrieval, our similarity metric purely relies on SKOS annotations (Definition 6).

Definition 6 (SKOS similarity). Let $Annot(r)$ be the set of SKOS features of resource $r$, $Annot(q)$ the set of SKOS features of resource $q$, and $q \in \{\mu(?q) \mid \mu \in \Omega_r\}$. Then their similarity can be derived from the IC of their shared concepts $C_{shared} = Annot(r) \cap Annot(q)$.

$$ sim(r, q) = IC(C_{shared}) \tag{6} $$


The IC of a set of SKOS concepts is measured by the negated sum of the logarithms of each concept's frequency in relation to the maximum frequency of occurrences among relevant resources (Definition 7).

Definition 7 (SKOS Information Content). The SKOS IC is defined as an aggregation of the individual IC values of each concept that is contained in the set of shared features ($c \in C_{shared}$), where $freq(c)$ is the frequency of occurrences of $c$ among all relevant resources and $n$ is the maximum frequency of occurrences among these resources. The final similarity score of two resources $r$ and $q$ is determined by summing the IC values of each concept of the shared feature set.

$$ IC(C_{shared}) = -\sum_{c \in C_{shared}} \log\left(\frac{freq(c)}{n}\right) \tag{7} $$
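To make Definitions 6 and 7 concrete, the following minimal sketch computes the IC-based similarity on a toy annotation set. The resource and concept names are hypothetical, the natural logarithm is an assumption (the text does not fix the base), and counting the input resource itself towards the frequencies is likewise an assumption of this example.

```python
import math
from collections import Counter


def concept_frequencies(annotations):
    """Count for every concept how many resources are annotated with it."""
    return Counter(c for concepts in annotations.values() for c in concepts)


def information_content(shared, freq, n):
    """Eq. 7: negated sum of log(freq(c) / n) over the shared concepts."""
    return -sum(math.log(freq[c] / n) for c in shared)


def skos_similarity(r, q, annotations, freq, n):
    """Eq. 6: the IC of the concepts that r and q have in common."""
    shared = annotations[r] & annotations[q]
    return information_content(shared, freq, n)


# Toy annotation data (hypothetical resources and concepts).
annotations = {
    "r": {"Western", "Comedy", "1970s"},
    "q1": {"Western", "Comedy"},
    "q2": {"1970s"},
    "q3": {"1970s", "Drama"},
}
freq = concept_frequencies(annotations)
n = max(freq.values())  # maximum concept frequency, here 3 for "1970s"

sim_q1 = skos_similarity("r", "q1", annotations, freq, n)
sim_q2 = skos_similarity("r", "q2", annotations, freq, n)
```

In this toy setting, `q1` shares two comparatively rare concepts with `r` and receives a positive score, whereas `q2` only shares the most frequent concept, whose IC is zero.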

After having obtained resource annotations as well as IC values, similarity scores between the input resource $r$ and each potentially relevant resource $q$ can be calculated. When a user profile contains more than a single item, similarity values are aggregated through summation to determine the final recommendation score (Definition 8).

Definition 8 (Recommendation score). The recommendation score of a potentially relevant resource $q$ is quantified by the sum of similarities with each resource $r$ that can be found in the profile ($\mathit{Pr}$).

$$ score(\mathit{Pr}, q) = \sum_{r \in \mathit{Pr}} sim(r, q) \tag{8} $$
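The summation in Eq. 8, followed by a top-k selection, can be sketched as below. This is an illustrative sketch only: the similarity values are made up, the function names are hypothetical, and breaking ties alphabetically is an assumption of this example.

```python
def recommendation_scores(profile, similarities):
    """Eq. 8: for every candidate q, sum sim(r, q) over all profile items r."""
    scores = {}
    for r in profile:
        for q, sim in similarities.get(r, {}).items():
            if q in profile:
                # Items already contained in the profile are not suggested.
                continue
            scores[q] = scores.get(q, 0.0) + sim
    return scores


def top_k(scores, k):
    """Return the k best-scored candidates (ties broken alphabetically)."""
    return sorted(scores, key=lambda q: (-scores[q], q))[:k]


# Made-up similarity values between profile items and candidates.
similarities = {
    "movieA": {"x": 0.8, "y": 0.3, "movieB": 0.9},
    "movieB": {"x": 0.1, "y": 0.9, "z": 0.4},
}
profile = {"movieA", "movieB"}

scores = recommendation_scores(profile, similarities)
ranking = top_k(scores, 2)
```

Here candidate `y` outranks `x` because its summed similarity over both profile items is higher, even though `x` is the closest match for `movieA` alone.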

Upon calculation of similarity values for each potentially relevant item $q$, the engine ranks the retrieved resources and returns the top $k$ of them, where the limit $k$ is specified by the user in the SKOSRec query.

With the presented methods of on-the-fly retrieval, recommendations can be generated from LOD repositories through an API without requiring local metadata or excessive preprocessing operations other than the mapping of profile items to LOD resources (see footnote 4).

4 Di Noia et al. describe how a mapping procedure can be conducted for a real-world implementation. For their LDRS, they matched movie titles with the corresponding LOD resources in DBpedia. The authors extracted IRIs, labels and publication dates for all movie resources. They applied the Levenshtein distance on these resources to identify matching movie items [13].

3.3. Preference Querying

In the previous subsection, it was assumed that users express their tastes as preference statements for a certain item, given as an IRI referring to a LOD resource. However, it may also be the case that likings are only vaguely known, for instance, when users prefer products with specific features, but are not able or willing to specify the actual items. The engine obtains such a profile by issuing a subsequent SPARQL query that contains the preference graph pattern ($P_{pref}$) in the WHERE condition. Suppose a user is interested in Quentin Tarantino movies (dbr:Quentin_Tarantino), but does not specify the precise items she likes. She could send a preference query to the SKOSRec engine. The graph pattern in the request would match all movies that are connected to dbr:Quentin_Tarantino with the properties dbo:director or dbo:producer (see Listing 4).

Listing 4. SKOSRec query to generate recommendations based on a preference query in the movie domain (Q2)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>

    RECOMMEND ?movie TOP 5
    PREF [ ?prefMovie
    WHERE { { ?prefMovie dbo:director dbr:Quentin_Tarantino } UNION
            { ?prefMovie dbo:producer dbr:Quentin_Tarantino } } ]

Table 3 shows the corresponding solution set to the preference query, containing films from DBpedia that are similar to Quentin Tarantino movies.

Table 3. Result set for Q2

movie
dbr:The_Godfather_Part_II
dbr:The_Prestige_(film)
dbr:The_Dark_Knight
dbr:Planet_Terror
dbr:Clerks

The notion of a queried profile is specified in Definition 9.

Definition 9 (Queried profile). A queried profile is a user profile that has been compiled by retrieving all LOD resources $r$ that are part of an annotation graph $AD$ and are contained in the solution mappings resulting from the evaluation of a specified graph pattern ($P_{pref}$) over a dataset $D$ (Eq. 9).

$$ \mathit{Pr} = \{ \mu(?r) \mid \mu \in [[P_{pref}]]_D,\ ?r \in dom(\mu),\ r \in AD \} \tag{9} $$
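A queried profile in the sense of Definition 9 can be pictured as a filter over solution mappings. The sketch below is a simplified illustration: solution mappings are modelled as plain dictionaries, the `ex:` identifiers are hypothetical, and membership in the annotation dataset is read as "occurs as the subject of at least one annotation triple", which is an interpretation of the definition rather than a documented rule.

```python
def queried_profile(solution_mappings, variable, annotation_dataset):
    """Eq. 9: collect bindings of `variable` that are annotated in AD."""
    annotated = {subject for (subject, _prop, _concept) in annotation_dataset}
    return {
        mu[variable]
        for mu in solution_mappings
        if variable in mu and mu[variable] in annotated
    }


# Hypothetical result of evaluating the preference pattern over D.
mappings = [
    {"prefMovie": "ex:film1"},
    {"prefMovie": "ex:film2"},
    {"prefMovie": "ex:film_without_annotations"},
    {"otherVar": "ex:unrelated"},
]

# Hypothetical annotation dataset AD as (subject, property, concept) triples.
AD = {
    ("ex:film1", "dct:subject", "ex:ConceptA"),
    ("ex:film2", "dct:subject", "ex:ConceptB"),
}

profile = queried_profile(mappings, "prefMovie", AD)
```

Resources that match the preference pattern but carry no SKOS annotation drop out of the profile, since no similarity could be computed for them anyway.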

3.4. Prefiltering

To facilitate exploratory search in LOD repositories, users should also have the option to formulate conditions in their queries. One way to enable such filters is to apply them before similarity calculation. Thus, recommendation lists can be adjusted to individual needs.

Due to the expressiveness of RDF data, filter conditions can not only be applied to attributes that are directly connected to potentially relevant items, but can also be declared as a graph pattern. By this means, it is possible to retrieve LOD resources that are both similar to the user profile and fulfill an advanced condition.

The strengths of the approach will be illustrated with a travel recommendation request that was executed over DBpedia. Suppose a customer expressed a preference for the Lake Baikal region (e.g., as indicated by a positive rating on a travel community site). With simple on-the-fly retrieval, she would receive a recommendation list that primarily consists of travel destinations located in the proximity of this geographical area. A graph-based filter, on the other hand, facilitates advanced retrieval options. In the given example, it enables the user to obtain suggestions for Southeast Asian travel destinations that have similar features as the Lake Baikal region, even though the resource dbr:Lake_Baikal is not directly connected to any destination with the annotation dbc:Southeast_Asia. Such destinations can only be identified by exploring the surrounding graph structure of the filter attribute.

The advanced constraint ensures that the generated result set is not empty. A simple attribute-level filter, which is commonly utilized in faceted search systems, would not generate any search hits. By these means, graph-based filter patterns can help users to better explore knowledge graphs, thereby overcoming data quality issues in LOD repositories. The corresponding SKOSRec query for this example is shown in Listing 5.

Listing 5. SKOSRec query with an expressive prefilter condition to retrieve Southeast Asian destinations based on a preference for the Lake Baikal region (Q3)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbc: <http://dbpedia.org/resource/Category:>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX dct: <http://purl.org/dc/terms/>
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

    RECOMMEND ?destination TOP 5
    PREF dbr:Lake_Baikal
    [ WHERE { ?destination dbo:country ?country .
              ?country dct:subject ?countrySubject .
              ?countrySubject skos:broader dbc:Southeast_Asia .
              ?destination rdf:type dbo:Place }
    ]

Table 4 shows the solution to this query. It contains travel destinations and regions of interest that are both similar to the previously liked destination dbr:Lake_Baikal and located in Southeast Asia. Ecoregions around lakes, rivers or mountains from Southeast Asian countries, such as Cambodia, Vietnam or Malaysia, are presented to the user.

Table 4. Result set of Q3

destination
dbr:Tonle_Sap
dbr:Mekong
dbr:Mount_Kinabalu
dbr:Laguna_de_Bay
dbr:Lake_Toba

The technical details for this type of query are given in the following definitions. Before the extraction of relevant resources and annotations, the recommendation engine identifies the resources that satisfy the filter (Definition 10).

Definition 10 (Prefiltered resources). The mapping of filtered resources $\Omega_{prefilter}$ is obtained by matching a specified graph pattern ($P$) to the RDF graph of a dataset $D$ (Eq. 10).

$$ \Omega_{prefilter} = \{ \mu|_{?q} \mid \mu \in [[P]]_D \} \tag{10} $$

After preselection, resources are brought together with user profile data to retrieve the final set of recommendations (Definition 11).

Definition 11 (Annotations and relevant resources (prefiltered retrieval)). The solution mappings of annotations and relevant resources in prefiltered retrieval mode are obtained by joining the set of prefiltered resources with the mappings $\Omega_r$ of all relevant resources and annotations as determined from the user profile.

$$ \Omega_{pre} = \Omega_{prefilter} \bowtie \Omega_r \tag{11} $$

After retrieving annotations and relevant resources in prefiltering mode, the system generates constraint-based recommendations. In this context, the ranking procedure is adapted to the restricted set of relevant resources and annotations. Hence, IC values of matching annotations and the respective SKOS similarities are quantified according to the user filter. By this means, recommendation scores reflect the similarity of items for the actual set of filtered LOD resources. Except for this restriction, the system carries out the calculations (i.e., the determination of similarity values and the ranking of LOD resources) in the same way as in non-filtering mode.
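The join of Eq. 11 and the subsequent recount of concept frequencies can be sketched as follows. This is an interpretation under stated assumptions: mappings are modelled as `(q, c)` pairs, the prefilter result as a set of bindings for `?q`, and all `ex:` names are hypothetical. How the engine recomputes frequencies internally is not spelled out in the text.

```python
from collections import Counter


def join_prefiltered(prefiltered, omega_r):
    """Eq. 11 as a join on ?q: keep (q, c) pairs whose q passed the filter."""
    return [(q, c) for (q, c) in omega_r if q in prefiltered]


# Hypothetical mappings of relevant resources q and shared concepts c.
omega_r = [
    ("ex:lakeNear", "ex:Lakes"),
    ("ex:lakeFar", "ex:Lakes"),
    ("ex:lakeFar", "ex:Ecoregions"),
    ("ex:riverFar", "ex:Ecoregions"),
]

# Hypothetical bindings of ?q that satisfy the prefilter pattern P.
prefiltered = {"ex:lakeFar", "ex:riverFar"}

omega_pre = join_prefiltered(prefiltered, omega_r)

# Frequencies are recounted on the restricted set only, so that IC values
# reflect the filtered candidates rather than the whole dataset.
freq_pre = Counter(c for (_q, c) in omega_pre)
n_pre = max(freq_pre.values())
```

In this toy example the concept `ex:Lakes` becomes rarer after filtering, which would raise its IC relative to the unfiltered case.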

3.5. Postfiltering & Combinations

The SKOSRec engine can not only combine similarity calculation and graph pattern matching to prefilter relevant LOD resources; it can also apply user constraints after similar items have been determined. By this means, recommendations are filtered ex post, i.e., after the similarity detection process. This feature enables novel retrieval patterns in which suggestions are part of a subquery. As an example scenario for a postfilter recommendation task, consider the following SKOSRec query, which obtains suggestions for movies and directors based on a user's preference for the director dbr:Quentin_Tarantino (see Listing 6). For the given request, the engine would identify similar directors based on Quentin Tarantino's characteristic features (e.g., geographic origin or awards won) and would then determine the movies these similar directors have shot. Thus, users may receive interesting movie recommendations.

Table 5 shows the solution the SKOSRec system returns when Q4 is issued against DBpedia. The ellipses in the displayed tables indicate records that have been excluded for brevity.

However, this approach can still be improved upon. For instance, the engine only considers the metadata descriptions of the director, whereas related movies have to be retrieved anew upon execution of the postfilter section. Thus, they are not ranked according to similarity in the final output table.

Another weakness is the biased semantics of the query.

Listing 6. SKOSRec query to generate simple post-filtered recommendations in the movie domain (Q4)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX yago: <http://dbpedia.org/class/yago/>

    SELECT ?director ?movie
    WHERE { ?movie dbo:director ?director .
            ?director rdf:type yago:Director110014939 } LIMIT 50
    RECOMMEND ?director TOP 5
    PREF dbr:Quentin_Tarantino

Table 5. Result set of Q4

director             movie
dbr:Kirk_Douglas     dbr:Scalawag_(film)
dbr:Kirk_Douglas     dbr:Posse_(1975_film)
dbr:Kevin_Costner    dbr:The_Postman_(film)
dbr:Kevin_Costner    dbr:Dances_with_Wolves
...
dbr:Gene_Kelly       dbr:Hello_Dolly_(film)
dbr:Gene_Kelly       dbr:Invitation_to_the_Dance_(film)
...
dbr:Alan_Alda        dbr:Goodbye_Farewell_and_Amen
dbr:Alan_Alda        dbr:Margaret's_Engagement
...
dbr:Steve_Buscemi    dbr:Interview_(2007_film)
dbr:Steve_Buscemi    dbr:Animal_Factory

Usually, when a person favors a movie director, this preference refers to the movies this director has shot and not to the personal features of the person. Two directors might be similar according to certain features, such as birth year, nationality or received awards, but the style and genre of the movies they have created can still be different. A third factor is that recommendation retrieval for movie directors should also take into account that a film artist is likely more relevant the more movies he or she has shot that are in line with the preferred movies of the user. Hence, the SKOSRec engine enables another postfiltering strategy, which is called aggregation-based retrieval. This strategy can be helpful whenever an entity type is linked to several LOD resources in a dataset, such that similarity scores of related entities (e.g., movies) can be aggregated to an overall recommendation score for a certain item (e.g., for a movie director).

Aggregation-based requests can be combined well with other retrieval patterns, such as preference queries. In the case of the movie recommendation scenario, a user could state that she enjoyed Quentin Tarantino movies in the past and would like to receive suggestions for directors who shot similar films. Usually, this kind of question can only be answered by another human being, e.g., in a personal interaction among friends or in an online forum. The fact that the language facilitates these kinds of queries is one of its key advantages.

Listing 7. SKOSRec query to generate post-filtered recommendations based on a preference query in the movie domain (Q5)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX yago: <http://dbpedia.org/class/yago/>

    SELECT ?director ?movie
    WHERE { ?movie dbo:director ?director .
            ?director rdf:type yago:Director110014939 } LIMIT 50
    AGG dbr:Quentin_Tarantino ?director SUM
    RECOMMEND ?movie TOP 100
    PREF [ ?prefMovie
    WHERE { ?prefMovie dbo:director dbr:Quentin_Tarantino } ]

Table 6 depicts the results of Q5 as retrieved from DBpedia. When comparing the solution sets for Q4 and Q5, it becomes clear that they are entirely different. While Q5 generated a list of directors who have mostly created action and thriller movies with a twist (i.e., Tarantino's unique filmmaking style; see footnote 5), Q4 provided a completely different output. It shows US American actor-directors whose movies range from musical films (dbr:Hello_Dolly_(film)) to western movies (dbr:Dances_with_Wolves) or drama films (dbr:Interview_(2007_film)). Thus, the directors in the solution table of Q4 (Table 5) seem rather arbitrary and not particularly helpful concerning the initial user request.

We define this kind of aggregation-based request as a so-called roll-up query. This name was chosen because the final suggestions are based on summarized (i.e., rolled-up) preferences for sublevel entities. Another example of a roll-up request stems from the application scenario of travel search. This recommendation query derives suggestions for city trip destinations from the points of interest (POI) a user has visited and liked in another city (e.g., London). Listing 8 shows the corresponding SKOSRec query for this scenario.

5 https://en.wikipedia.org/wiki/Quentin_Tarantino

Table 6. Result set of Q5

director                    movie
dbr:Steven_Spielberg        dbr:Indiana_Jones
dbr:Steven_Spielberg        dbr:Jurassic_Park_(film)
...
dbr:Francis_Ford_Coppola    dbr:The_Godfather_Part_II
dbr:Francis_Ford_Coppola    dbr:The_Godfather_Part_III
...
dbr:Christopher_Nolan       dbr:The_Prestige_(film)
dbr:Christopher_Nolan       dbr:The_Dark_Knight
...
dbr:Robert_Rodriguez        dbr:Planet_Terror
dbr:Robert_Rodriguez        dbr:Machete_(film)
...
dbr:Kevin_Smith             dbr:Clerks
dbr:Kevin_Smith             dbr:Clerks_II

Listing 8. SKOSRec query to generate post-filtered recommendations (aggregation-based) for a preference profile containing the city of London (Q6)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX dul: <http://www.ontologydesignpatterns.org/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    SELECT ?city
    WHERE { ?poi ?property ?city .
            ?city rdf:type dbo:City .
            ?property rdfs:subPropertyOf dul:hasLocation } LIMIT 3
    AGG dbr:London ?city SUM
    RECOMMEND ?poi TOP 100
    PREF dbr:Oxford_Street, dbr:Notting_Hill, dbr:South_Bank

The SKOSRec request contains subquery commands that refer to the user's preferred POIs. It specifies how the similarity scores of the generated suggestions should be aggregated (i.e., sum-based). The retrieval is based on the 100 POIs that are most similar to the ones in the preference section. Additionally, the subquery part refers to the DBpedia resource that represents London (AGG dbr:London ?city), thus indicating that the list of recommendations should not contain any POIs in this city. Upon processing the depicted query over the DBpedia repository, the engine displays the three cities for which the highest aggregated scores have been detected (see Table 7).

Table 7. Result set of Q6

city
dbr:Oxford
dbr:Abingdon-on-Thames
dbr:Wallingford_Oxford
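The roll-up idea can be illustrated with a small aggregation routine: item-level scores (e.g., for POIs or movies) are summarized per parent entity (e.g., city or director). The sketch is a simplified reading of the AGG clause, not the engine's implementation. The function name, the toy scores, and the way the excluded parent is handled are assumptions of this example.

```python
def roll_up(item_scores, parents_of, mode="SUM", excluded_parent=None):
    """Aggregate item scores per parent entity using SUM or MAX."""
    aggregated = {}
    for item, score in item_scores.items():
        for parent in parents_of.get(item, ()):
            if parent == excluded_parent:
                # E.g. the city the user already knows is not suggested again.
                continue
            if mode == "SUM":
                aggregated[parent] = aggregated.get(parent, 0.0) + score
            elif mode == "MAX":
                aggregated[parent] = max(aggregated.get(parent, score), score)
            else:
                raise ValueError(f"unsupported aggregation mode: {mode}")
    return aggregated


# Made-up recommendation scores of POIs and their (hypothetical) cities.
poi_scores = {"poi1": 2.0, "poi2": 1.5, "poi3": 3.0, "poi4": 5.0}
city_of = {
    "poi1": ["cityA"],
    "poi2": ["cityA"],
    "poi3": ["cityB"],
    "poi4": ["London"],
}

by_sum = roll_up(poi_scores, city_of, mode="SUM", excluded_parent="London")
by_max = roll_up(poi_scores, city_of, mode="MAX", excluded_parent="London")
best_city_sum = max(by_sum, key=by_sum.get)
best_city_max = max(by_max, key=by_max.get)
```

The toy data shows why the aggregation function matters: sum-based aggregation favors `cityA`, which has several moderately similar POIs, whereas maximum-based aggregation favors `cityB`, which has the single most similar one.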

The motivation for formulating such a query is that, when searching for cities, preferences might be better represented through the sights a user has visited elsewhere rather than through a simple on-the-fly request solely containing a preference statement for a city in the user profile section.

Besides the roll-up retrieval patterns presented above, the SKOSRec grammar may facilitate the formulation of additional types of advanced recommendation queries. For instance, cross-domain queries are an interesting additional pattern to be explored for personalized retrieval. Listing 9 shows an example request that generates movie suggestions based on the preference for a music band (i.e., dbr:The_Jackson_5).

Listing 9. SKOSRec query to generate cross-domain recommendations (Q7)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

    SELECT ?movie ?participated ?musicalAct
    WHERE { ?movie rdf:type dbo:Film .
            VALUES ?participated { dbo:basedOn dbo:musicComposer dbo:genre dbo:starring }
            { ?musicalAct ?participated ?movie } UNION
            { ?movie ?participated ?musicalAct } } LIMIT 10
    AGG dbr:The_Jackson_5 ?movie MAX
    RECOMMEND ?musicalAct TOP 100
    PREF dbr:The_Jackson_5
    WHERE { VALUES ?actType { dbo:MusicalArtist dbo:Band }
            ?musicalAct rdf:type ?actType }

The query contains a prefilter condition that triggers the retrieval of music artists and bands. Afterwards, the scores of the top 100 suggestions are summarized with maximum-based aggregation. Thus, the highest similarity value among the set of music acts determines the ranking score of the film. Also note that, because of the aggregation-based retrieval procedure, connections between movies and the LOD resource dbr:The_Jackson_5 are excluded from similarity calculation. The scenario exemplifies how semantic relationships can help to generate suggestions for a target domain (i.e., movies) that are based on a profile containing items from a different source domain (i.e., music acts). While regular SPARQL queries perform exact matching of triple statements, the SKOSRec syntax facilitates the integration of similarity- and graph-based retrieval to enrich result lists with other potentially relevant items.

Table 8. Result set of Q7

movie                           musicalAct
dbr:Moonwalker                  dbr:Michael_Jackson
...
dbr:The_Wiz_(film)              dbr:Ashford_&_Simpson
dbr:Garfield_Gets_a_Life        dbr:The_Temptations
dbr:Pipe_Dreams_(1976_film)     dbr:Gladys_Knight_&_the_Pips
dbr:The_Jacksons                dbr:Jermaine_Jackson

As a point of reference, consider the SPARQL query for the given cross-domain scenario (Listing 10).

Listing 10. SPARQL query to generate cross-domain recommendations (Q8)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

    SELECT ?movie ?musicalAct
    WHERE {
      ?movie rdf:type dbo:Film .
      VALUES ?participated { dbo:basedOn dbo:musicComposer dbo:genre dbo:starring }
      { dbr:The_Jackson_5 ?participated ?movie }
      UNION
      { ?movie ?participated dbr:The_Jackson_5 }
    } LIMIT 5

The query omits the similarity calculation part. Thus, it only retrieves movies that are directly linked to the resource dbr:The_Jackson_5. By this means, the SPARQL results have a better chance of being related to the initial user profile. On the other hand, very short or even empty result lists can occur when the preferred LOD resource has few or no direct connections to recommendable items. This assumption is supported by the result lists of the two queries (Tables 8 and 9): the SKOSRec query produces a more comprehensive solution than the regular SPARQL query.


Table 9. Result set of Q8

movie               musicalAct
dbr:The_Jacksons    dbr:The_Jackson_5

The following definitions formalize the postfiltered and aggregation-based retrieval forms. Definition 12 specifies the notion of SKOSRec recommendations that may potentially undergo postprocessing.

Definition 12 (SKOSRec recommendations). SKOSRec recommendations (R(P_r)) are the set of suggestions generated by processing the solution mappings (Ω) obtained from recommendation retrieval. They can be the result of simple on-the-fly retrieval, of prefiltering, or of a combination of these techniques. P_r represents the input profile, which contains at least one preference for an item. The cardinality of R(P_r) is the intended number of recommendations (k) that is specified by the user. Based on k, the engine selects the resources with the highest recommendation scores that are not contained in the profile (Eq. 12).

\[
R(P_r) = \bigl\{\, x \mid x \in \{\, q \mid score(P_r, q) > k\text{-}\max_{q \in \Omega} score(P_r, q),\; q \notin P_r \,\} \,\bigr\} \tag{12}
\]
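The selection step of Definition 12 can be illustrated with a minimal Python sketch. It assumes that recommendation scores are already available as a dictionary; the function name `top_k_recommendations`, the `ex:` resources, and the score values are hypothetical and not part of the SKOSRec engine.

```python
from typing import Dict, Iterable, List


def top_k_recommendations(scores: Dict[str, float],
                          profile: Iterable[str],
                          k: int) -> List[str]:
    """Return the k best-scored resources that are not part of the profile."""
    excluded = set(profile)
    candidates = [(res, s) for res, s in scores.items() if res not in excluded]
    # Sort by descending score; ties are broken by resource name (assumption).
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return [res for res, _ in candidates[:k]]


# Hypothetical similarity scores for four resources.
scores = {"ex:cityA": 0.9, "ex:cityB": 0.7, "ex:cityC": 0.4, "ex:cityD": 0.8}

# ex:cityA is in the profile, so it is never suggested.
print(top_k_recommendations(scores, profile=["ex:cityA"], k=2))
```

Filtering out profile items before ranking keeps the result size at k whenever enough candidates exist.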

The approach of subquerying with recommendation results builds on the idea that SKOSRec suggestions R(P_r) can be joined with any graph pattern P_post subsequent to the process of similarity calculation (Definition 13). It does not make a difference how the engine obtained these recommendations.

Definition 13 (Postfiltered recommendation mapping). The postfiltered recommendation mapping is obtained by joining solution mappings generated from SKOSRec recommendations with the results of matching a postfilter graph pattern (Ω_post) (Eqs. 13 and 14).

\[
\Omega_{postfilter} = \{\, \mu \mid \mu \in [[P_{post}]]_D,\; x \in dom(\mu) \,\} \tag{13}
\]

\[
\Omega_{post} = \{\, \mu_{|x} \mid x \in R(P_r) \,\} \bowtie \Omega_{postfilter} \tag{14}
\]
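As a rough illustration of the join in Definition 13, the following Python sketch keeps only those solution mappings of a postfilter pattern whose binding for the recommendation variable is a recommended resource. The representation of solution mappings as dictionaries, the function name `postfilter`, and the `ex:` resources are assumptions made for this example only.

```python
from typing import Dict, List, Set

Mapping = Dict[str, str]  # variable name -> bound resource


def postfilter(recommendations: Set[str],
               pattern_solutions: List[Mapping],
               var: str) -> List[Mapping]:
    """Join recommendations with pattern solutions on the variable `var`."""
    return [mu for mu in pattern_solutions
            if var in mu and mu[var] in recommendations]


# Hypothetical recommendations and postfilter pattern solutions.
recommended = {"ex:act1", "ex:act2"}
solutions = [
    {"musicalAct": "ex:act1", "movie": "ex:film1"},
    {"musicalAct": "ex:act9", "movie": "ex:film2"},  # not recommended
    {"movie": "ex:film3"},                           # variable unbound
]

print(postfilter(recommended, solutions, var="musicalAct"))
```

Mappings that leave the recommendation variable unbound are dropped, which mirrors the condition that x has to be in the domain of the mapping.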

Like the prefilter, the postfilter pattern P_post can be comprised of triple statements that are eligible in the RecGroupGraphPattern part of the SKOSRec grammar. Aggregation-based queries have to be treated differently than a regular postfilter request. Apart from excluding sublevel items of the profile from the set of similar resources, the engine should also omit any connections between the preferred superordinate item (a) and similar sublevel entities. Additionally, the user has to specify the variable in the profile (e.g., y) that represents the items within the postfilter graph pattern P_post. By this means, aggregation-based recommendations can be generated (Definition 14).

Definition 14 (Resources for aggregation-based retrieval). The set of resources for aggregation-based retrieval is obtained by retrieving LOD resources that have a connection to a previously generated suggestion, thereby excluding all resources that are linked to the specified superordinate item a in the profile (Eq. 15).

\[
R_{agg}(a) = \{\, \mu(y) \mid \mu \in \Omega_{post},\; y \in dom(\mu),\; a \notin \mu(y) \,\} \tag{15}
\]

For each potential aggregation-based suggestion (y), similarity scores of connected entities (x) have to be considered for the final ranking. In this context, different approaches to aggregation can be applied. When both similar items and aggregation-based recommendations are entities on comparable levels of granularity, choosing the maximum similarity score might be the best way to represent the relatedness of the resource to the user profile. On the other hand, when postfiltered recommendations are based on similarity scores of their sublevel entities (e.g., as in the movie recommendation example), the scores of related suggestions should be aggregated by the sum or the average of individual scores.

Definition 15 (Aggregation-based ranking score). The ranking score of an LOD resource y that is connected to a set of recommended items is determined by aggregating the scores of connected entities, either by the maximum (Eq. 16), the sum (Eq. 17), or the average (Eq. 18) of individual scores.

\[
score_{aggMAX}(y) = \max_{x \in \mu(x,y)} score(P_r, x) \tag{16}
\]

\[
score_{aggSUM}(y) = \sum_{x \in \mu(x,y)} score(P_r, x) \tag{17}
\]

\[
score_{aggAVG}(y) = \frac{\sum_{x \in \mu(x,y)} score(P_r, x)}{|\{x \in \mu(x,y)\}|} \tag{18}
\]
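The three aggregation variants of Definition 15 can be sketched in a few lines of Python. The sketch assumes that the links between recommended items x and aggregation targets y are given as pairs and that item scores are already computed; the function name `aggregate_scores` and all `ex:` resources and values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def aggregate_scores(links: Iterable[Tuple[str, str]],
                     item_scores: Dict[str, float],
                     mode: str = "MAX") -> Dict[str, float]:
    """Aggregate the scores of items x per connected resource y (Eqs. 16-18)."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for x, y in links:
        if x in item_scores:  # only scored (recommended) items contribute
            grouped[y].append(item_scores[x])
    reducers = {
        "MAX": max,
        "SUM": sum,
        "AVG": lambda values: sum(values) / len(values),
    }
    reducer = reducers[mode]
    return {y: reducer(values) for y, values in grouped.items()}


# Hypothetical links (music act, movie) and similarity scores of the acts.
links = [("ex:act1", "ex:film1"), ("ex:act2", "ex:film1"), ("ex:act3", "ex:film2")]
item_scores = {"ex:act1": 0.5, "ex:act2": 0.25, "ex:act3": 0.75}

for mode in ("MAX", "SUM", "AVG"):
    print(mode, aggregate_scores(links, item_scores, mode))
```

With these toy values, summation ranks both films equally, whereas the maximum and the average favor the film that is linked to the single best-scored act.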

4. Evaluation of the SKOS Recommender

4.1. Experimental Setup

The novel recommendation strategies of the SKOSRec engine were evaluated in web-based experiments for the usage scenarios of travel destination search and multimedia RS. The online approach helped to reach more participants with diverse demographic backgrounds, which would not have been possible in a laboratory experiment. Another positive side effect of the web-based setting was that participants could test and rate recommendations without being directly observed by an experimenter, which might otherwise have biased the results. Hence, the online approach softened common limitations of laboratory studies [19].

The experiments were carried out in the defined usage scenarios with metadata descriptions from DBpedia. The database containing the local mirror of DBpedia ran on a virtual server. Besides the triple store, a web application was hosted on the same machine. The website ran on an Apache Tomcat 7 application server and was implemented with the help of Java servlet technology (footnotes 6 and 7).

We conducted the online experiments between April 2016 and January 2017. Upon setting up the web application, a pretest was carried out to control for potential pitfalls, such as misleading navigation. Two test users went step by step through the web interface and provided their feedback. Based on their insights, we slightly modified the interface (e.g., by rephrasing user instructions) to increase the chances of successful completion. If the SKOSRec engine is applied in a live setting, the design of the user interface will have to be tested with more users. Our experimental series focused on algorithmic performance, which was tested with numerous participants.

Subjects were recruited through the clickworker.com platform (footnote 8). Participants received a compensation fee

6 http://tomcat.apache.org
7 http://www.oracle.com/technetwork/java/javaee/servlet/index.html
8 https://www.clickworker.de

of 1.00 € or 1.50 € for a completed survey. 103 participants took part in the study on travel destination search. In the multimedia domains, 154 subjects in total evaluated the SKOSRec engine. For each part of the evaluation, it was ensured that participants had not simply clicked through the evaluation screen. Whenever there appeared to be irregularities in the log files (e.g., the default settings of the screen were not changed), these assessments were excluded from further analysis.

The survey template started with a consent form. It informed participants about the context and the procedure of the study. After having filled in the consent form, participants were navigated to the first part of the experiment. It collected data on subjects' demographics and their consumption and search habits. After the preparation phase, participants worked on the first test case (TC1) in each usage scenario. In this test case, the performance of the SKOSRec engine was assessed in the simple execution mode of on-the-fly recommendations (see Subsect. 3.2). This was done to gather data on the usefulness of suggestions resulting from a simple content-based approach (i.e., the baseline method), with which more advanced recommendation strategies were compared at a later stage. In cases where users did not state any items, the application showed an error page.

While setting up a profile required some effort on the side of the user, it was a necessary step for the experiment. Since we did not have access to the log files of a live application, profile generation had to be simulated. The process was facilitated by a specifically tailored interface (see Fig. 2), where subjects could search for items through an autocomplete field that applied AJAX technology (footnote 9).

It enabled users to type in keywords (e.g., the name of a music act or the author of a book), for which matches were displayed instantaneously without reloading the site. Hence, participants could quickly find and select their items of interest. Whenever a user entered a query through the AJAX interface, the system matched the query keywords against an Apache Solr/Lucene search index (footnote 10). The index was built with item metadata, which had been extracted from DBpedia prior to setting up the website. The index comprised the names and abstracts of items. Whenever available, information on thumbnail pictures was saved in the index to be ready for display in the AJAX interface. In case the Apache Solr/Lucene

9 http://api.jquery.com/jquery.ajax
10 http://lucene.apache.org/solr


Fig. 2. User profile generation TC1 (music domain)

search engine found index keywords that matched the user query, metadata for these items were immediately shown on the screen. This information was displayed to help participants make an informed decision on whether the AJAX result list contained suitable items for profile generation.

Aside from the AJAX feature, the appeal of the interface was increased by a progress bar at the top of the webpage (see Fig. 2). Progress bars are commonly utilized tools for web surveys. In conventional paper-based studies, subjects usually see how far along they are, while in a web survey they cannot check their progress immediately. This is especially true for studies that apply screen-by-screen navigation, which was the chosen approach for the conducted web experiments. By showing how close participants are to completing the experiment, progress bars prevent users from tiring of the questionnaire and withdrawing from the study altogether [14].

Upon profile generation, the SKOSRec engine calculated on-the-fly recommendations for a simple query (similar to Q1, Sect. 3), which were shown to users on a separate screen. Suggestions were displayed with a sufficient amount of information (e.g., thumbnails, labels, or a short description) to help participants assess the quality of the recommendation. In addition, users could access the respective Wikipedia article by following the hyperlink on the label. Relevance sliders were used to assess the utility of each recommendation. They are often applied in evaluations of IR systems [51, 54]. Scores were handled as points on a 0 to 100 scale of the relevance slider (outer left side: lowest relevance; outer right side: highest relevance; see Fig. 3).

This approach facilitated the calculation of an accurate precision score and the application of tests with higher statistical power than an ordinal 5-star scale would allow. The score, called mean relevance score (mrs), was measured with Equation 19.

\[
mrs = \frac{\sum_{i=1}^{k} score_i}{k} \tag{19}
\]
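Equation 19 amounts to a plain arithmetic mean over the slider values of one recommendation list. A minimal Python sketch, with a hypothetical function name and made-up slider values, could look as follows.

```python
from typing import Sequence


def mean_relevance_score(slider_scores: Sequence[float]) -> float:
    """Mean of the relevance slider values (0 to 100) of one list (Eq. 19)."""
    if not slider_scores:
        raise ValueError("at least one rated recommendation is required")
    return sum(slider_scores) / len(slider_scores)


# Hypothetical slider ratings for a list of k = 4 recommendations.
print(mean_relevance_score([80, 60, 40, 20]))
```

How the web application treated lists without any rating is not stated in the text; raising an error is an assumption of this sketch.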

Fig. 3. Recommendations resulting from on-the-fly recommendation retrieval TC1 (music domain)

In addition to the precision score, the web application also determined the size of the recommendation list produced by the engine as an indicator of recall. However, recommendation list size can only be utilized for this purpose when mrs scores reach a reasonable level.

While accurate recommendations are useful because they build trust in a system [57], they only capture a narrow aspect of performance [36]. For instance, an online retailer can profit tremendously from an RS that provides both relevant and unobvious recommendations, since it enables users to explore the product catalog more thoroughly. On the other hand, if an RS only suggests popular products, it is likely that the user already knows them [3, 23]. A purely accuracy-based evaluation disregards these aspects, which is why other performance indicators, such as novelty, were measured as well. Thus, the evaluation interface also contained a section where subjects had to tick a radio button that indicated whether an item was new. In the backend of the web application, novelty was measured as the ratio of relevant items not familiar to the user (|nov|) among the total number of relevant items (|tp|) (Eq. 20).

\[
nv = \frac{|nov|}{|tp|} \tag{20}
\]
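The novelty ratio of Eq. 20 can be sketched in Python as follows. The sketch assumes that each rated item is available as a pair of slider score and novelty flag, and it uses the relevance threshold of 50 points that is stated in the text; the function name and the handling of lists without relevant items are assumptions.

```python
from typing import Optional, Sequence, Tuple


def novelty(ratings: Sequence[Tuple[float, bool]],
            threshold: float = 50.0) -> Optional[float]:
    """Share of relevant items that were marked as new by the user (Eq. 20)."""
    relevant_flags = [is_new for score, is_new in ratings if score >= threshold]
    if not relevant_flags:
        return None  # assumption: novelty is undefined without relevant items
    return sum(relevant_flags) / len(relevant_flags)


# Hypothetical ratings: (slider score, item was new to the user).
ratings = [(80, True), (60, False), (30, True), (50, True)]
print(novelty(ratings))
```

In this example, three items are relevant and two of them are new, so the ratio is two thirds; the new but irrelevant item does not count.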

An item was considered relevant when it achieved a relevance score of 50 or higher. In addition to accuracy and novelty, the web application also tracked the diversity of recommendation lists, since a topically diversified result set has been found to increase overall user satisfaction [31, 66]. In the RS literature, the most widely explored diversification metrics are the ones that measure the similarity between each item pair in the recommendation list. The more alike the items are, the less diverse the result set is [19]. The study series of this paper applied the metric by Castells et al. [8]. It determines item-to-item similarity values with the cosine similarity measure. Scores are then converted to distance values (Eq. 21) and averaged for the recommendation list (Eq. 22).

\[
d(i_l, i_m) = 1 - \cos(i_l, i_m) \tag{21}
\]

\[
div_C = \frac{2}{k(k-1)} \sum_{l<m} d(i_l, i_m) \tag{22}
\]
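Equations 21 and 22 can be sketched in Python as an average pairwise cosine distance. The sketch assumes that each item is represented by a numeric feature vector (e.g., binary indicators of annotations); this vectorization, the function names, and the treatment of zero vectors are assumptions and not taken from the engine.

```python
import math
from itertools import combinations
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if one vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def content_diversity(vectors: Sequence[Sequence[float]]) -> float:
    """Average pairwise cosine distance of a recommendation list (Eq. 22)."""
    k = len(vectors)
    if k < 2:
        return 0.0  # assumption: a single item has no pairwise distance
    total = sum(1.0 - cosine(u, v) for u, v in combinations(vectors, 2))
    return 2.0 * total / (k * (k - 1))


# Hypothetical annotation vectors: items 1 and 3 are identical.
items = [[1, 0], [0, 1], [1, 0]]
print(content_diversity(items))
```

Because `combinations` enumerates every unordered pair exactly once, the factor 2 / (k(k - 1)) turns the sum into the mean distance over all pairs.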

The content-based diversity scores (divC) were calculated in the backend of the web application and saved in a log file for each recommendation list that the SKOSRec engine generated at runtime. In addition to measuring divC scores, we asked participants directly about their opinions on recommendation list diversity. Following the RS evaluation framework by Pu et al. [44], two Likert items were posed for this purpose:

– The recommendations of list [] are diverse.
– The recommendations of list [] are similar to each other.

Besides the algorithmic performance metrics listed above, the authors gathered assessments from users regarding the overall usefulness of the suggestions. This approach is also in line with the empirically verified evaluation framework by Pu et al. [44], which proposes to measure the psychometric construct of Perceived Usefulness to get a comprehensive understanding of user opinions on recommendation results. Thus, participants were instructed to state their agreement with positive statements regarding the relevance of the suggestions. The statements were adapted to the specificities of each usage scenario. Fig. 4 shows an example collection of Likert questions for the domain of music RS.

After participants had provided the required information, their answers were saved in a log file and they were navigated to test case 2 (TC2), which evaluated the SKOSRec engine's performance for constraint-based queries (similar to Q3, Sect. 3). We tried to find out whether the engine was able to improve user satisfaction through filter conditions. Even though IR systems already successfully apply facet filters on result sets, the experiments still had to show whether LOD-enabled constraints have the same effect on the Perceived Usefulness of recommendations. For the comparison of non-filtered and filtered suggestions, we applied a within-subjects design in which participants had to rate two recommendation lists resulting from regular (non-filtered) and filtered requests in TC2. We favored this setup over a between-subjects design, where subjects would have been assigned to only one result list. While the between-subjects design more closely resembles a real-world setting, since it does not require participants to interact with more than one approach at a time, differences among subjects (e.g., regarding demographics or topic expertise) can potentially have a considerable impact on quality judgments [19]. Hence, when applying a between-subjects design, one does not know whether differences in performance are caused by the approaches or by the variability among participants. Therefore, researchers need to conduct between-subjects experiments with a sufficient number of study subjects to level out these differences. Within-subjects studies, on the other hand, require fewer participants while still achieving the same level of statistical power, since between-subject variability is not an issue [30, 31].

Beel et al. have shown that even small variations in an RS test setting can cause substantial differences in performance outcomes [6]. To keep the extent of variations even lower, comparisons between non-filtered and constraint-based recommendations were based on the same profile for each user in TC2. Therefore, the user profile from TC1 served as a starting point for filtered retrieval. The web application showed the recently generated profiles to users and asked them to provide an additional filter condition. Fig. 5 depicts an example web form for a constraint-based recommendation request in the music domain. It comprises the user profile and an additional text field where users stated their constraint.

Afterwards, the engine applied this filter condition on the set of potential recommendation results before similarity calculation started. For each usage scenario, different types of constraints were available. Regarding suitable filter options, the authors sought to achieve a balance between assisting users in finding filter conditions that would adequately represent their information needs and keeping the variability among users as low as possible. Therefore, the specificities of each scenario and the availability of the respective data sources were examined. For instance, for the multimedia domains (movie, music, and books), it was assumed that users would like to filter recommendations according to the specific genre of the item. Fortunately, the DBpedia dataset contains this kind of information for the music domain. In this domain, the genre can be retrieved through the property dbo:genre. However, in the movie and book domains, DBpedia does not provide sufficient information. Many LOD resources that represent books were not assigned a genre. In the movie domain, a genre property is missing entirely, and genre-specific information can often only be found in the subject categories describing the movie. Hence, in contrast to the experiment on


Fig. 4. Evaluation screen for recommendations resulting from on-the-fly recommendation retrieval TC1 (music domain)

music recommendations, participants in the domains of books and films did not filter their recommendation lists by genre. On the other hand, the property dc:subject was applied as a recommendation filter in each multimedia domain. Since the subject property often links to many informative features of multimedia items (e.g., release period, geographic information, or content information), the authors decided that this feature represents a powerful filter dimension.

In the experiment on travel destination search, the web interface offered two filters: a subject filter and a location-specific filter. The subject filter enabled users to state their preferences regarding the characteristics of a location (e.g., being a nature reserve). The location-specific filter, on the other hand, gave participants the option to specify where the desired destination should be located (footnote 11).

The second fundamental research issue of TC2 concerned the question whether expressive constraints can boost recommendation quality even further. Hence,

11 http://dbpedia.org

apart from executing a constraint-based workflow with a simple filter (i.e., a direct attribute of an item), the SKOSRec engine generated suggestions with the help of an expressive filter as well. This second procedure, while still applying the same user constraint as the simple filter, utilized an expressive graph pattern to include additional LOD resources in the result set. For instance, if a subject filter was selected, result set expansion was facilitated by exploring the wider semantic space of the DBpedia category graph through a property path declaration in the graph pattern (e.g., skos:broader[2]). In the travel experiment, an additional expressive filter retrieved place-related information (dul:hasLocation) for location-specific subproperty relations that are not materialized in DBpedia. Hence, the expressive graph pattern expanded the filter such that additional properties were matched (e.g., dbo:country, dbo:city, or dbo:state) while still applying the specified user constraint on the set of LOD resources. Upon execution of constraint-based recommendation requests, users received two recommendation lists (resulting from simple and expressive filtering, respectively) in a


Fig. 5. Web form for a constraint-based recommendation request (music domain)

separate evaluation screen. Participants evaluated result sets in the same manner as in TC1. For each result set, the screen contained an additional Likert item asking subjects to state their agreement with the statement “The filter has improved the recommendation results of list”. The sequence of suggestions resulting from the different methods was randomly assigned in each user session to avoid order effects.

After study subjects had completed TC2, they were navigated to test case 3 (TC3). In the TC3 section of the travel experiment, participants could choose between three options for rollup query patterns (similar to Q6, Sect. 3). Users were able to obtain recommendations for travel destinations based on POIs they had visited during a stay in another destination. They could select either a city (option 1), a region (option 2), or a country (option 3) as the destination type. Upon entity type selection, the SKOSRec engine generated appropriate suggestions. Fig. 6 shows the web form from the travel experiment that facilitated the formulation of advanced rollup requests in TC3.

When users had selected an entity type and stated their favorite travel destination and three associated POIs, the application generated a SKOSRec query from these parameters. A simple on-the-fly request was sent to the engine as well, to retrieve baseline recommendations with which the suggestions resulting from the advanced request were compared at a later stage of the experiment. The simple query only contained the user's favorite travel destination and the selected entity type option, thus omitting information on the POIs. During the travel experiment, the engine processed both the simple and the advanced query and produced two recommendation lists. As in the previous parts of the experiment, participants assessed the quality of the result sets, which appeared in random order on the screen.

The multimedia experiments also contained a section on rollup retrieval (TC3). Like the advanced travel queries, the multimedia requests aggregated similarity scores of sublevel entities through summation, which the engine joined with a postfilter. In addition to these features, the pattern contained a preference query part. This section identified a set of preferred items through a single user statement. Participants either specified their favorite director/actor (movie domain), music act


Fig. 6. Travel - User profile generation TC3

(music domain), or author (book domain) and received recommendations based on the features of the creative works of the favored artist (similar to Q5, Sect. 3). Participants also received suggestions from a simple on-the-fly query, which computed results according to the artist's characteristics. As in the travel experiment, recommendations from the advanced rollup request were compared with those of the on-the-fly query. Participants were not told which approach they were assessing, and result set positions were assigned randomly.

After participants of the multimedia experiments had evaluated the results of TC3, they were guided to test case 4 (TC4), which evaluated cross-domain queries. The experiment on travel RS ended with TC3 because no suitable graph patterns could be identified to facilitate these kinds of suggestions for the travel usage scenario. In the multimedia domain, however, the data from DBpedia was sufficient for this retrieval task. The entity type of the study (i.e., movie, music act, or book) defined the source domain, based on which the engine generated suggestions for items from another multimedia target domain (similar to Q7, Sect. 3). Fig. 7 depicts the cross-domain web form of the music experiment.

The web form assisted subjects in formulating a request, which then obtained movie suggestions based on the favorite music act of the user. Parameters from users were entered into the placeholders of the query pattern in the backend of the application. We formulated the query templates in such a way that the query engine matched any properties or graph patterns that could potentially connect two items from the specified target and source domains. For instance, such matchings can occur when a user states a favorite movie that was written by a particular author. If the same author has also written books, they might be of interest to the user as well. Another example stems from the movie domain. Suppose a consumer has declared a preference in his/her profile for a movie that links to the corresponding soundtrack. It might well be the case that the user likes the soundtrack, and the music acts that were involved in creating it, just as much as s/he likes the movie. The practice of distinguishing homonymous LOD resources through the property dbo:wikiPageDisambiguates is another interesting linking pattern from DBpedia that was exploited for cross-domain requests in TC4, since it connects similar entities (i.e., the book edition of a certain movie).

Cross-domain SPARQL queries (similar to Q8, Sect. 3) were posed as baseline requests in this test case. While the SPARQL query only matched cross-domain relations for a single LOD resource, the SKOSRec request expanded the search space by considering links from more than one item. As in the previous test cases, TC4 ended with an evaluation screen where users assessed the SPARQL-based as well as the SKOSRec suggestions in random order, without knowing which recommendation list belonged to which method. Table 10 gives an overview of the test cases and the methods that were applied in the experiments in each usage scenario.

4.2. Results

In total, 257 people took part in the study. The gathered samples from the experiments contained answers


Fig. 7. Music - User profile generation TC4 (page 9)

Table 10. Test cases in the web experiments (X = applied, - = not applied)

Test Case  Evaluation Purpose                      Query Type  Travel  Movie  Music  Book
TC1        On-the-fly Recommendations (Baseline)   Q1          X       X      X      X
TC2        Constraint-based Recommendations        Q3          X       X      X      X
TC3        Advanced Queries (Rollup)               Q5, Q6      X       X      X      X
TC4        Advanced Queries (Cross-Domain)         Q7          -       X      X      X

from both genders, with a slight overrepresentation of male participants (53.7 %). The majority of study subjects were between 20 and 40 years of age (65.8 %). The evaluation of the demographics section also revealed that most participants already perceive the search in their domain of interest as either “manageable” (38.5 %) or even as “easy” or “very easy” (44.0 %).

Prior to analyzing the performance results of the novel retrieval approaches of the SKOSRec engine, it was checked whether the respective Likert items for Perceived Usefulness and user-based diversity (divU) reached acceptable reliability levels. For this purpose, Cronbach's α values were determined. Table 11 shows the outcome of this analysis for the Perceived Usefulness construct. Given the acceptable values throughout the domains (Cronbach's α > 0.7), it could be treated as an interval-scaled variable in the statistical tests.

Table 11. Cronbach's α values for the concept of Perceived Usefulness

Domain   Cronbach's α
Travel   0.7002
Movie    0.8579
Music    0.8515
Book     0.8949
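For readers who want to retrace the reliability check, the following Python sketch computes Cronbach's α with the standard formula α = k/(k-1) · (1 - Σ item variances / variance of the sum scores). It assumes complete responses (one row per participant, one column per Likert item); the function name and the example answers are hypothetical and unrelated to the study data.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for rows of participants and columns of Likert items."""
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two participants")
    # Sample variance of each item (column) across participants.
    item_variances = [variance(column) for column in zip(*responses)]
    # Sample variance of the participants' sum scores.
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers of four participants to three Likert items.
answers = [[4, 5, 4], [2, 3, 2], [5, 5, 4], [3, 3, 3]]
print(round(cronbach_alpha(answers), 3))
```

The sketch does not guard against sum scores with zero variance, in which case the coefficient is undefined.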

Besides usefulness, user-based diversity scores were also measured by multiple Likert items. However, Cronbach's α values for this concept did not reach acceptable levels. To avoid leaving valuable user assessments unused, agreement with the statement “The recommendations of list [] are diverse” was taken into account as an ordinal variable (divU). Based on these results, subsequent tests could be carried out.

The first part of the analysis concerned the performance of the on-the-fly retrieval method (baseline) of the SKOSRec engine in TC1. The outcome of this analysis confirms that the on-the-fly approach represents a viable recommendation strategy. Table 12 shows the mean (M; see footnote 12), standard deviation (SD), and number of participants (N) for each performance dimension and metric. It can be seen that users assessed the engine's performance in almost every quality dimension to be above mediocrity. Aside from the positive user evaluations with regard to Perceived Usefulness, related quality aspects, such as Accuracy, Novelty, and Diversity, received good assessments as well. For instance, in each domain the mean relevance score (mrs) was higher than 50 score points. This means that participants were often convinced of the relevance of the results. Given these positive user assessments with regard to item relevance, it seems justified to consider the mean of result list sizes as an indicator of recall. The engine was able to generate the required number of recommendations in almost every test case (see Table

12 In case of the user-based diversity score (divU), M denotes the median because of the ordinal scale of the variable.


12). The on-the-fly retrieval approach was also tested with regard to the Novelty dimension. Table 12 shows the mean novelty scores (nv) for the baseline recommendation approach. Novelty scores indicate that the majority of relevant items in a result list were deemed to be new in the travel and book RS, whereas in the experiments on movie and music RS only about one third of them were unfamiliar to users. In contrast, content-based Diversity scores (divC) were rather high in each domain. Hence, it can be concluded that items in a result set usually have a low rate of matching SKOS annotations. In addition to content-based scores, user assessments of recommendation list Diversity (divU) were also taken into account. The median level of agreement with the Diversity-related Likert item stating that recommendations are diverse lay at 3 (“neutral”) in the multimedia domains and at 4 (“agree”) in the travel domain.

In summary, the baseline approach achieved fairly good performance results in TC1. The findings demonstrate that LOD-enabled recommendations can potentially enhance a user's search experience. They also allow for subsequent tests of advanced retrieval techniques, which can be compared to this standard method.

In the analysis of performance results from TC2 (constraint-based recommendations), particular attention was paid to participants' agreement with the statement “The filter has improved the recommendation results of the list”. Fig. 8 shows the distribution of domain-wise agreement ratios. On average, they were fairly high. Throughout the domains, the vast majority of participants selected either 5 (“strongly agree”), 4 (“agree”), or 3 (“neutral”). Hence, a filter often had a positive impact on results.

The authors gathered responses to the question on improvement for recommendation lists resulting from both the simple and the expressive filtering approaches. Hence, they give clues about the general performance of constraint-based retrieval. However, since this section of the study revealed the experimental condition to participants (i.e., application of a filter), answers to the Likert statement have to be interpreted cautiously. Results may be slightly biased, as participants were able to guess the underlying agenda. These limitations do not exist for comparisons between simple and expressive filters. Since the web application randomly assigned the list order, participants did not know which approach they were assessing. This enables us to take a closer look at the differences between the two filtering approaches. It was hypothesized that advanced filters improve recommendation quality, as they may identify more relevant resources, thereby overcoming problems of data quality or data sparsity in LOD repositories. Table 13 lists response rates of the two methods. The rate measures the ratio of non-zero result sets among all result sets. Throughout the domains, recommendation lists resulting from expressive filtering achieved higher response rates.

Table 12. Performance results in TC1 (M = mean, SD = standard deviation, N = no. of participants)

Dimension (Metric) [Max. Score]   Domain   M       SD      N
Accuracy (size) [10]              Travel   10.00   0.00    103
                                  Movie    10.00   0.00    50
                                  Music    10.00   0.00    53
                                  Book     9.80    1.40    51
Accuracy (mrs) [1]                Travel   56.49   0.00    103
                                  Movie    62.25   19.15   50
                                  Music    55.70   0.10    53
                                  Book     57.47   15.55   50
Novelty (nv) [1]                  Travel   0.60    0.34    101
                                  Movie    0.36    0.31    50
                                  Music    0.36    0.39    52
                                  Book     0.61    0.29    47
Diversity (divU) [5]              Travel   4       -       102
                                  Movie    3       -       50
                                  Music    3       -       53
                                  Book     3       -       50
Diversity (divC) [1]              Travel   0.80    0.16    103
                                  Movie    0.87    0.12    50
                                  Music    0.86    0.10    50
                                  Book     0.92    0.08    50
Usefulness [5]                    Travel   3.34    0.85    103
                                  Movie    3.55    0.73    50
                                  Music    3.18    0.85    53
                                  Book     3.43    0.81    50

Table 13. Response rates (TC2), in percent

Domain   Constr.-based (Regular)   Constr.-based (Expanded)   N
Travel   23                        57                         103
Movie    86                        88                         50
Music    72                        75                         53
Book     73                        76                         51
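The response rate reported in Table 13 is defined in the text as the ratio of non-zero result sets among all result sets. A minimal sketch of that computation follows; the function name `response_rate` and the example sessions are illustrative assumptions and are not taken from the SKOSRec implementation.

```python
from typing import Sequence


def response_rate(result_sets: Sequence[Sequence[str]]) -> float:
    """Share of result sets that contain at least one recommendation."""
    if not result_sets:
        raise ValueError("at least one result set is required")
    non_empty = sum(1 for items in result_sets if len(items) > 0)
    return non_empty / len(result_sets)


# Hypothetical user sessions: two of four requests returned suggestions.
sessions = [["ex:Rome", "ex:Florence"], [], ["ex:Venice"], []]
rate = response_rate(sessions)
print(f"response rate: {rate:.0%}")
```

Multiplying the returned fraction by 100 gives the percentage form used in the table.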


[Figure 8: panel title “The filter has improved the recommendations of the list”; x-axis: Domain (Digital Library, Travel, Movie, Music, Book); y-axis: Percentage of Agreement (0.00 to 1.00); legend: Degree of Agreement (1 to 5)]

Fig. 8. Participants' agreement with the statement that the filter has improved the result list (domain-wise) (TC2)

In addition to response rates, user assessments and scores in the defined quality dimensions were compared. Table 14 depicts the outcome of this analysis. Significantly higher scores are marked in bold. Where statistical tests identified no meaningful variations, the results of the better performing approach are underlined. α-levels were adjusted with the false discovery rate (FDR) in order to decrease the probability of making a type I error due to multiple testing.
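The text does not name the FDR procedure that was used. As an illustration only, the sketch below assumes the widely used Benjamini-Hochberg step-up procedure and computes adjusted p-values; the function name and the raw p-values are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down and enforce monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values from four pairwise comparisons.
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adj]
```

A comparison is then reported as significant only if its adjusted p-value stays below the chosen α-level.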

According to mean scores, expressive requests generated more comprehensive recommendation lists (size). This finding is in line with the increased response rates shown in Table 13. However, subsequently conducted t-tests confirmed significant differences only for the travel experiment (t(102) = −7.89, p < 0.001). In contrast, precision scores were higher in most of the domains (travel, movie, music) when the SKOSRec engine processed a simple constraint-based query. However, statistical tests did not confirm any systematic differences. The authors applied Wilcoxon tests for mrs scores and the remaining performance metrics because of the small sample sizes resulting from the low response rates of the simple constraint-based approach.

Regarding Novelty, none of the approaches was superior. In the Diversity dimension, expressive filters generated suggestions that were at least as topically diversified as the results from simple constraint-based retrieval. Even though no significant differences were identified between the two approaches, increased mean scores (divU in the travel domain and divC in each domain) indicate a slight superiority of expanded filter requests in this quality dimension. The same applies to usefulness scores (mean values were higher in 3 out of 4 domains when expressive filtering was applied). However, these statements are speculative at best because statistical tests were not significant.

In summary, it can be concluded that expressive filters improve recall, potentially leading to a slight loss in precision scores (see Table 14). It may also be the case that expanded user constraints diversify recommendation lists and increase their usefulness. However, these claims are unverified. In contrast, it is safe to say that expressive query patterns generate results of at least the same quality as simple patterns while increasing the number of relevant suggestions. The few differences may be explained by the high Jaccard scores (JI) of the recommendation lists (see Table 15). Throughout the domains, result sets were much alike. Given these similarities and the increases in list sizes for expressive filters, it is supposed that an expanded constraint produces an enhanced version of the recommendation list resulting from simple filtering. Therefore, expanded filters should be the default setting in a constraint-based retrieval context.
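The Jaccard index used to compare two recommendation lists is the size of their intersection divided by the size of their union. The following sketch illustrates why an expanded list that contains the simple list yields a high score. The function name, the example IRIs and the handling of two empty lists (an error is raised) are assumptions made for illustration.

```python
from typing import Iterable


def jaccard_index(list_a: Iterable[str], list_b: Iterable[str]) -> float:
    """Overlap of two recommendation lists: |A ∩ B| / |A ∪ B|."""
    set_a, set_b = set(list_a), set(list_b)
    union = set_a | set_b
    if not union:
        # Assumption: a pair of empty lists is not comparable.
        raise ValueError("both lists are empty")
    return len(set_a & set_b) / len(union)


# Hypothetical result lists: the expressive filter returns every item of
# the simple filter plus one additional suggestion.
simple = ["ex:item1", "ex:item2", "ex:item3"]
expressive = ["ex:item1", "ex:item2", "ex:item3", "ex:item4"]
ji = jaccard_index(simple, expressive)
print(ji)
```

In this constructed case the score is 3/4, since all three items of the simple list reappear among the four items of the expressive list.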

Table 14. Performance results in TC2 (M = mean, SD = standard deviation, N = no. of participants)

Dimension (Metric) [Max. Score]   Domain   Simple           Expressive       N
                                           M       SD       M       SD
Accuracy (size) [10]              Travel   1.17    2.70     4.52    4.62     103
                                  Movie    6.72    4.08     7.20    3.92     50
                                  Music    5.96    4.64     6.36    4.51     53
                                  Book     4.59    4.50     5.00    4.51     51
Accuracy (mrs) [1]                Travel   66.73   20.51    65.34   22.36    23
                                  Movie    60.95   22.98    60.29   22.79    43
                                  Music    63.77   16.05    62.96   16.44    34
                                  Book     56.17   18.18    56.81   16.81    37
Novelty (nv) [1]                  Travel   0.48    0.41     0.61    0.49     21
                                  Movie    0.44    0.36     0.55    0.80     42
                                  Music    0.55    0.39     0.52    0.37     38
                                  Book     0.77    0.34     0.76    0.33     33
Diversity (divU) [5]              Travel   3       -        3.5     -        24
                                  Movie    4       -        4       -        43
                                  Music    4       -        4       -        38
                                  Book     3       -        3       -        37
Diversity (divC) [1]              Travel   0.68    0.18     0.72    0.21     19
                                  Movie    0.70    0.24     0.73    0.23     41
                                  Music    0.86    0.13     0.87    0.12     34
                                  Book     0.84    0.24     0.95    0.63     29
Usefulness [5]                    Travel   3.33    0.93     3.50    0.77     24
                                  Movie    3.32    0.85     3.31    0.86     43
                                  Music    3.55    0.81     3.64    0.78     38
                                  Book     3.39    0.83     3.62    1.21     37

Table 15. Jaccard indices (TC2)

Domain   Mean Jaccard Index (JI)   SD     N
Travel   0.57                      0.47   25
Movie    0.71                      0.38   43
Music    0.82                      0.35   37
Book     0.92                      0.26   35

In TC3 (i.e., evaluation of roll-up query patterns), the SKOSRec engine almost always generated non-empty recommendation lists (see Table 16). In the travel and the book experiment, the response rate was higher for aggregation-based queries, whereas in the movie domain the engine more often provided recommendations for regular requests. However, these differences are only marginal and do not indicate a clear superiority of one method.

Table 16. Response rates (TC3), in percent

Domain   Regular   Aggr.-based   N
Travel   99        100           103
Movie    96        92            50
Music    100       100           53
Book     96        100           51

Fig. 9 depicts participants' general agreement with Likert statements concerning the Perceived Usefulness of the approaches in TC3. The diagram shows similar results for regular as well as advanced requests. On average, satisfaction scores reached levels slightly above neutral agreement. A subsequent t-test confirmed no significant differences between the two methods. In fact, the approach with the highest average score of Perceived Usefulness was different in each experiment. While mean values were higher for regular requests in the studies on travel and music RS, advanced queries produced higher scores in the movie and book experiments.

[Figure 9: panel title “Perceived Usefulness (Regular vs. Roll-up Queries)”; x-axis: Domain (Travel, Movie, Music, Book); y-axis: Agreement (0 to 3); legend: Method (Regular, Roll-up)]

Fig. 9. Participants' overall satisfaction with regular and roll-up recommendation retrieval (TC3)

Content-based Diversity (divC) was the only metric with systematic differences. Here, scores went up for advanced retrieval patterns in the travel (t(98) = −5.50, p < 0.001), the music (t(52) = −4.51, p < 0.001) and the book domain (t(47) = −4.90, p < 0.001) (see Table 17). This finding is not surprising, given that the aggregation-based approach explores the wider semantic network of each item in the profile.

Hence, it can be assumed that both approaches produce recommendation lists of similar quality and can provide benefits to potential consumers. It cannot be clearly stated which approach is superior, because neither of them outperformed the other in each domain. While the regular retrieval strategy was better in terms of usefulness in the travel and the music domains, the aggregation-based approach achieved higher scores in the domains of movie and book recommendations.

This conclusion is especially notable given the low mean Jaccard scores for recommendation lists resulting from regular and aggregation-based retrieval (see Table 18). The mean score never exceeded 0.1 throughout the domains, which indicates a low concordance between result sets. It is remarkable that participants perceived the recommendations as equally good given the high dissimilarity of the lists. Thus, advanced queries have an added value, as they provide additional interesting results. They can help to explore LOD repositories in different ways than a regular retrieval approach. Therefore, it is worthwhile to offer aggregation-based query patterns as an alternative retrieval strategy when the regular approach has not produced any helpful recommendations.

In addition to TC3, user assessments for TC4 (cross-domain recommendations) were also evaluated. However, since the authors conducted experiments for TC4 only in the multimedia domains, the samples were considerably smaller. Nevertheless, it is assumed that the number of completed user sessions is just sufficient to conduct further statistical analyses. Table 19 shows the response rates for the two retrieval methods. Cross-domain patterns had a higher success rate of generating non-zero result sets throughout the domains. Participants almost always received a recommendation for this query type, whereas they often did not receive a single suggestion when a regular SPARQL query was executed.

Table 17. Performance results in TC3 (M = mean, SD = standard deviation, N = no. of participants)

Dimension (Metric) [Max. Score]   Domain   Regular          Aggr.-based      N
                                           M       SD       M       SD
Accuracy (size) [10]              Travel   9.90    0.99     9.42    2.07     103
                                  Movie    2.88    0.59     2.76    0.82     50
                                  Music    10.00   0.00     10.00   0.00     53
                                  Book     2.88    0.59     3.00    0.00     51
Accuracy (mrs) [1]                Travel   48.21   24.98    44.93   25.59    102
                                  Movie    54.58   22.88    56.69   25.42    44
                                  Music    53.95   18.31    58.24   16.19    52
                                  Book     51.53   23.90    49.25   22.30    49
Novelty (nv) [1]                  Travel   0.43    0.39     0.38    0.40     98
                                  Movie    0.35    0.38     0.28    0.41     41
                                  Music    0.49    0.36     0.46    0.39     50
                                  Book     0.61    0.42     0.62    0.44     47
Diversity (divU) [5]              Travel   3.5     -        3       -        100
                                  Movie    4       -        3       -        43
                                  Music    4       -        3       -        50
                                  Book     3       -        3       -        48
Diversity (divC) [1]              Travel   0.79    0.17     0.89    0.13     99
                                  Movie    0.75    0.20     0.66    0.27     44
                                  Music    0.83    0.17     0.93    0.07     53
                                  Book     0.74    0.27     0.93    0.06     48
Usefulness [5]                    Travel   3.36    0.77     3.22    0.74     101
                                  Movie    3.20    0.92     3.26    0.94     44
                                  Music    3.45    0.84     3.44    0.85     51
                                  Book     2.95    0.87     3.03    0.96     48

Table 18. Jaccard indices (TC3)

Domain   Mean Jaccard Index (JI)   SD     N
Travel   0.07                      0.11   101
Movie    0.03                      0.07   44
Music    0.02                      0.05   53
Book     0.02                      0.06   50

Table 19. Response rates (TC4), in percent

Domain   Regular (SPARQL)   SKOSRec   N
Movie    32                 96        50
Music    62                 98        53
Book     53                 100       51

We applied non-parametric tests for the performance metrics mrs, nv, divU, divC and Perceived Usefulness due to the low response rates of the SPARQL-based approach, which led to a correspondingly low number of comparable data points. Overall, the FDR-adjusted pairwise Wilcoxon tests detected no significant differences, except for user-based Diversity scores (divU) in the movie (Z = −2.69, p < 0.05) and the book domain (Z = −2.56, p < 0.05). The results for the content-based Diversity (divC) metric seem to point in the same direction because of the high mean values for cross-domain queries in each domain. However, the significance of these results was not verified. The mrs scores indicate a better performance of regular SPARQL queries, but the within-subjects test did not confirm this assumption statistically (see Table 20).

However, the most important finding of the analysis was that SKOSRec cross-domain queries generated significantly more suggestions throughout the domains (movie: t(49) = −17.92, p < 0.001; music: t(52) = −25.73, p < 0.001; book: t(50) = −12.75, p < 0.001) (see Table 20). Fig. 10 illustrates this graphically.
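The reported degrees of freedom equal the number of participants minus one, which is consistent with a paired (within-subjects) t-test on list sizes. The sketch below shows how such a statistic can be computed from per-participant list sizes. It is an illustration under that assumption; the function name and the five data points are hypothetical and do not reproduce the study data.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t(first: Sequence[float], second: Sequence[float]) -> Tuple[float, int]:
    """Paired t statistic and degrees of freedom for first minus second."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(first, second)]
    spread = stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
    if spread == 0:
        raise ValueError("all differences are identical; t is undefined")
    t_value = mean(diffs) / (spread / math.sqrt(len(diffs)))
    return t_value, len(diffs) - 1


# Hypothetical list sizes per participant for the two request types.
sparql_sizes = [0, 2, 1, 0, 3]
skosrec_sizes = [9, 10, 8, 10, 10]
t_value, dof = paired_t(sparql_sizes, skosrec_sizes)
print(f"t({dof}) = {t_value:.2f}")
```

A negative t value, as in the paper's results, means that the first method (regular SPARQL) produced shorter lists than the second.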

Whenever the SPARQL request provided recommendations, which was the case in less than half of the user sessions (see Table 19), the SKOSRec suggestions were assessed to be of almost equal quality as the SPARQL recommendations (see Table 20, Usefulness). Hence, in the more than half of the test cases in which SPARQL querying did not produce any results, the SKOSRec approach may still have provided useful suggestions.

Therefore, it is reasonable to assume that the capability of the SKOSRec engine to flexibly switch between processing steps of similarity calculation and pattern matching can facilitate an improved exploration of LOD repositories. Table 21 also shows that the items in the two sets often did not match, as is indicated by low Jaccard indices. This finding further demonstrates that the execution of an additional step of similarity calculation can help to retrieve other relevant items.

Table 20. Performance results in TC4 (M = mean, SD = standard deviation, N = no. of participants)

Dimension (Metric) [Max. Score]   Domain   Regular (SPARQL)   Cross-Domain       N
                                           M        SD        M        SD
Accuracy (size) [10]              Movie    0.82     2.09      8.66     2.73      50
                                  Music    0.89     1.59      9.30     2.03      53
                                  Book     2.90     3.98      10.00    0.00      51
Accuracy (mrs) [1]                Movie    67.41    27.62     53.65    14.71     16
                                  Music    62.36    31.76     48.32    22.49     20
                                  Book     66.42    21.28     55.59    16.84     27
Novelty (nv) [1]                  Movie    0.59     0.49      0.41     0.34      16
                                  Music    0.72     0.44      0.59     0.40      19
                                  Book     0.54     0.41      0.63     0.30      27
Diversity (divU) [5]              Movie    3        -         4        -         16
                                  Music    3        -         3        -         20
                                  Book     3        -         4        -         27
Diversity (divC) [1]              Movie    (0.64)   (0.35)    (0.99)   (0.02)    (6)
                                  Music    (0.90)   (0.13)    (0.98)   (0.03)    (10)
                                  Book     (0.84)   (0.12)    (0.87)   (0.1)     (8)
Usefulness [5]                    Movie    3.57     0.83      3.35     1.57      15
                                  Music    3.53     1.17      3.11     0.98      20
                                  Book     3.41     1.04      3.41     0.94      27

[Figure 10: panel title “Recommendation List Size”; x-axis: Domain (Movie, Music, Book); y-axis: No. of items (0.0 to 10.0); legend: Method (Regular, Cross-Domain)]

Fig. 10. Mean recommendation list sizes for SPARQL and SKOSRec cross-domain requests (TC4)

Table 21. Jaccard indices (TC4)

Domain   Mean Jaccard Index (JI)   SD     N
Movie    0.03                      0.07   44
Music    0.01                      0.02   21
Book     0.16                      0.27   33

5. Future Work & Conclusion

This paper has presented the query-based SKOSRec engine, which facilitates new types of recommendation requests for LOD repositories. The evaluations have shown that the SKOSRec system can often generate relevant and useful suggestions when certain query templates are utilized. In many cases, the application of expressive or advanced query patterns (i.e., in the context of constraint-based retrieval or cross-domain requests) helped to significantly improve recall values while still providing the same level of quality in the remaining performance dimensions. Additionally, advanced recommendation requests can be used to generate diversified recommendation lists in case users are not satisfied with the results of regular recommendation requests (i.e., roll-up or cross-domain retrieval vs. regular requests). While not being superior in each domain and test case, it is at least safe to say that the novel recommendation approaches of the SKOSRec engine considerably extend common retrieval methods and provide an alternative search strategy when conventional methods fail to provide useful results. Additionally, the increase in diversity and recall is in line with findings from previous research on LOD-enabled RS [45].

It is a strength of the developed approaches that they facilitate combinations of graph-based and similarity-based retrieval at different stages of the recommendation workflow. This feature extends general search capabilities for semantic networks. Future research needs to investigate whether, in addition to SKOS, further RDF vocabularies can be used for ad-hoc item-to-item similarity computation to increase the range of potential usage scenarios, since not every LOD dataset contains SKOS annotations [53]. The application of additional properties could also boost the recommendation quality of some query patterns (e.g., the query types used in TC3). In this context, it also needs to be determined whether the representation of the user profile in the SKOSRec query language can be substituted with free-text expressions to enable search-like functionalities. LOD researchers have already developed engines that can process natural language queries over RDF data [59], [34], [67]. It will have to be investigated how the SKOSRec engine profitably fits into this landscape of existing retrieval tools.

Another open research question concerns the aspect of interfaces to other applications. In its standard configuration, the system is not an end-user retrieval engine but a backend application with which an administrator who has domain knowledge (e.g., of RDF vocabularies and data models of a particular LOD repository and usage scenario) creates the respective query patterns. Although the websites of the online experiments represent possible implementations of suitable end-user applications for the SKOSRec engine, there are still many other possibilities and extensions to design such a user interface (UI). For instance, the selection of LOD items for the profile was made possible by previously storing the data in an index that could then be accessed and searched before issuing a recommendation query. However, this runs against the idea of ad-hoc retrieval. In this context, future research will have to determine whether it is feasible to generate an on-the-fly mapping from items to the respective LOD resources. In the case of a commercial application, the engine needs to match items with the corresponding entries in the product catalog. Even if it is not yet clear which UI components can make the best possible use of the SKOSRec engine's features, the results from the user experiments can give at least a few suggestions.

The following retrieval options should be the default setting in an interface that contains a LOD-enabled RS component to assist users in finding interesting items. Since the regular approach of on-the-fly retrieval achieved good results throughout the domains, the display of simple recommendations generated from previously stated user preferences represents a suitable starting point for retrieval. On-the-fly queries could be refined by providing filter options. Since the evaluations have shown that expressive constraints often lead to increased recall values, the application of such a filter might be the best option for assisted retrieval. As in the preparations for the experiments, appropriate graph-based query patterns for the usage scenario in question would also have to be determined by a domain expert before setting up the application. The same applies to advanced retrieval patterns, which should be available to the end user to refine the query when both the simple and the constraint-based approach have failed to provide relevant items. This suggestion is made because roll-up queries were competitive with regular recommendations in the web-based experiments. Therefore, they represent a viable alternative retrieval strategy. For some domains (e.g., for multimedia retrieval), it is also possible to use an optional selection field to enable the formulation of cross-domain requests. Evaluations for this query type suggest that users might receive interesting results (i.e., diversified and comprehensive recommendation lists) from such a request.

In summary, the contributions of the paper are as follows:

– Development and implementation of novel retrieval strategies in the SKOSRec system, which facilitate personalized requests and leverage the potential of knowledge graphs in order to address the research gap in state-of-the-art recommendation methods with regard to customized queries.

– Realization of web-based user experiments in four usage scenarios with the following outcomes:

  • Successful demonstration that knowledge graph data is applicable in the context of live recommendation scenarios.

  • Creation of suitable query patterns that can be applied to represent individual information needs in different application contexts. The set of query patterns can be extended with the help of the SKOSRec query language to meet the requirements of additional usage scenarios.

  • Proof that the novel retrieval strategies can be successfully applied across various application contexts and recommendation tasks and that they outperform conventional content-based retrieval approaches, especially in the performance dimensions of diversity and recall.

The prototypical implementation and the evaluation of the SKOSRec engine in different usage scenarios have demonstrated that the freely available knowledge sources on the LOD cloud can be successfully applied to advance state-of-the-art methods in IR and recommender systems. The proposed personalized and customized retrieval strategies can leverage the potential of knowledge graphs by better matching information needs of users with the help of semantic search options.

Appendix A. The SKOSRecommender Query Syntax

SKOSRecQuery: A SKOSRec query can be comprised of up to six elements, of which the parts SimProjection and ItemPart are obligatory. Hence, users need to state at least their preferences and how many recommendations they would like to receive. Advanced retrieval patterns can be formulated as well. For instance, similar resource retrieval can be combined with prefiltering (RecWhereClause) and/or postfiltering (SelectPart) of LOD resources. As in SPARQL, the query can start with a prologue that abbreviates IRIs with namespace declarations [21].

SelectPart: The SelectPart enables subquerying with recommendation results. The surrounding query is a SELECT query with a slightly simplified WHERE clause (i.e., RecWhereClause) with which postfilter conditions can be formulated. In this section, users have the option to formulate aggregation-based retrieval requests (Aggregation).

Aggregation: This query part handles aggregation-based postfiltering. Users specify the IRI (IRIref) of the resource for which similarity scores of sublevel entities are summarized. Additionally, it is stated how the data should be aggregated (i.e., SUM, MAX or AVG) as well as which variable (Var) represents the aggregation-based recommendations in the postfilter section. The compiler makes sure that this variable is actually contained in the RecWhereClause of the SelectPart.
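The roll-up idea behind the Aggregation part can be illustrated with a small sketch: similarity scores of sublevel entities (for example films) are grouped by their superordinate resource (for example directors) and combined with SUM, MAX or AVG. This is one possible reading of the description above and not the engine's actual code; the function `roll_up`, the `ex:` IRIs and the scores are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# The three aggregation modes named in the syntax description.
AGGREGATORS = {"SUM": sum, "MAX": max, "AVG": mean}


def roll_up(scores: Dict[str, float], parent_of: Dict[str, str],
            how: str = "SUM") -> List[Tuple[str, float]]:
    """Aggregate item-level similarity scores per parent resource."""
    aggregate = AGGREGATORS[how]
    grouped = defaultdict(list)
    for item, score in scores.items():
        parent = parent_of.get(item)
        if parent is not None:  # items without a parent are skipped
            grouped[parent].append(score)
    ranked = [(parent, aggregate(values)) for parent, values in grouped.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Hypothetical similarity scores of films and their directors.
scores = {"ex:film1": 0.5, "ex:film2": 0.25, "ex:film3": 0.6}
parent_of = {"ex:film1": "ex:directorA", "ex:film2": "ex:directorA",
             "ex:film3": "ex:directorB"}
by_sum = roll_up(scores, parent_of, "SUM")
by_max = roll_up(scores, parent_of, "MAX")
```

In this constructed example the ranking depends on the aggregation mode: SUM favours the director with several moderately similar films, whereas MAX favours the director of the single most similar film.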

SimProjection: This part of a SKOSRec query refers to the list of generated suggestions. It defines the recommendation variable to which all similar LOD resources are mapped. In case pre- and postfilter conditions are set in the other sections, the compiler checks whether the variable (Var) appears in these sections as well. Otherwise, the required join operations cannot be executed. It is also specified how many recommendations should be displayed (Integer) and whether the request will be issued against a default repository or as a cross-repository request (ServiceIntegration).

ServiceIntegration: This section indicates that a user intends to receive recommendations from a different SPARQL endpoint than the default endpoint that is stated in the standard configuration of the SKOSRec engine. It specifies the target SPARQL endpoint (IRIref) from which a user wants to receive recommendations. Upon extraction of SKOS annotations from the default source endpoint, a subsequent request is sent to the specified target endpoint.

ItemPart: This section of a SKOSRec query represents a single preference in the user profile. However, recommendation queries can contain several preference statements. A preference is either expressed as a LOD resource (IRIref) or as a variable which is bound to a preference query (VarPart). In the latter case, the resources are obtained by matching the graph pattern in the VarPart with the triple statements in the repository. The ItemPart statement can additionally be supplemented with a concept-to-concept similarity threshold (Sim) that triggers concept expansion with the approach of flexible similarity detection, which is extensively described in one of our previous works [64].

VarPart: The construct initiates preference querying. Individual likings are expressed as a graph pattern in a RecWhereClause. The variable (Var) is a placeholder for the LOD resources that will be used to set up the user profile. The compiler has to make sure that the specified variable occurs in the RecWhereClause as well. Otherwise, the extraction of LOD resources cannot be carried out correctly.

Sim: The Sim part of a SKOSRec query denotes the threshold of concept-to-concept scores for flexible similarity detection. Even though knowledge-based similarity values usually range between 0 and 1, the specification allows a broader range of digits in the Decimal datatype to enable the application of other similarity metrics in case they are needed.

Relation: This section specifies how concept-to-concept similarity scores should be determined. Users can state whether scores should be “larger”, “larger than” or “equal to” the threshold value.
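How the Sim threshold and the Relation setting could interact during concept expansion is sketched below. The sketch is an assumption-laden illustration: the relation symbols `>`, `>=` and `=` are placeholders chosen for this example and are not the keywords of the SKOSRec grammar, and the function name, concept IRIs and scores are hypothetical.

```python
import operator
from typing import Dict, Set

# Placeholder relation symbols; the actual SKOSRec keywords may differ.
RELATIONS = {">": operator.gt, ">=": operator.ge, "=": operator.eq}


def expand_concept(concept: str, similarities: Dict[str, float],
                   threshold: float, relation: str = ">=") -> Set[str]:
    """Return the concept plus all concepts whose score satisfies the relation."""
    compare = RELATIONS[relation]
    related = {other for other, score in similarities.items()
               if compare(score, threshold)}
    return {concept} | related


# Hypothetical concept-to-concept similarity scores for one annotation.
similar_to_jazz = {"ex:Blues": 0.8, "ex:Swing": 0.5, "ex:Opera": 0.2}
expanded = expand_concept("ex:Jazz", similar_to_jazz, threshold=0.5)
strict = expand_concept("ex:Jazz", similar_to_jazz, threshold=0.5, relation=">")
```

With the inclusive relation the concept scoring exactly 0.5 is part of the expanded profile; with the strict relation it is left out.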

RecWhereClause: This clause can occur in three different sections of a SKOSRec query, i.e., either in the prefilter (ItemPart), the postfilter (SelectPart) or the preference filter section (VarPart). The RecWhereClause closely resembles the “WhereClause” of the SPARQL 1.1 specification with all its subsequent parts [21]. Hence, it enables different combinations of graph pattern matching. However, the RecWhereClause of the SKOSRec language allows fewer pattern matching expressions than the “WhereClause” of the regular SPARQL syntax specification [21]. For instance, it is neither permitted to formulate subqueries nor to direct filter requests to more than one repository (see RecGraphPatternNotTriples for further explanations). This modification has been made to simplify similarity calculation. After parsing this section, the compiler makes sure that recommendation, preference or postfilter variables occur at least once in the RecWhereClause to guarantee that subsequent steps of the recommendation workflow can be applied to existing LOD resources. In case the RecWhereClause comprises a RecMinusGraphPattern, it also has to be checked that the variable is not only contained in the MINUS part of the group.
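The variable check described for the RecWhereClause can be pictured as follows. This is a simplified sketch under strong assumptions: a parsed clause is modelled as two lists of triple patterns (regular and MINUS), which is a hypothetical data structure and not the parser representation used by the SKOSRec compiler.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Triple = Tuple[str, str, str]


@dataclass
class WhereClause:
    """Hypothetical parsed clause: regular patterns and MINUS patterns."""
    patterns: List[Triple] = field(default_factory=list)
    minus_patterns: List[Triple] = field(default_factory=list)


def variable_is_bound(clause: WhereClause, var: str) -> bool:
    """True if var occurs in at least one pattern outside the MINUS part."""
    return any(var in triple for triple in clause.patterns)


# The recommendation variable ?rec occurs in a regular pattern: accepted.
ok = WhereClause(
    patterns=[("?rec", "dct:subject", "?concept")],
    minus_patterns=[("?rec", "rdf:type", "ex:Documentary")],
)
# ?rec occurs only inside MINUS: the check fails.
bad = WhereClause(minus_patterns=[("?rec", "rdf:type", "ex:Documentary")])
```

A compiler following this idea would reject the second clause before any similarity calculation is started.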

RecGroupGraphPattern: A RecGroupGraphPattern can contain one to many basic graph patterns, which can be combined in different ways according to RecGroupGraphPatternSub.

RecGroupGraphPatternSub: This section defines the graph patterns that are applied to identify suitable LOD resources. User filters can be expressed either as basic graph patterns (TriplesBlock) or as combinations of them (RecGraphPatternNotTriples). The sequence of different kinds of patterns can be flexibly specified.

RecGraphPatternNotTriples: This query part closely resembles the similarly named “GraphPatternNotTriples” of the SPARQL 1.1 specification [21]. The only difference to SPARQL is that the RecGraphPatternNotTriples section is a little more restricted in terms of admissible graph patterns than the “GraphPatternNotTriples” section of the regular SPARQL specification. For instance, it does not allow retrieving LOD resources from graphs other than the default graph that is specified in the configuration of the SKOSRec engine, since filter patterns have to be applied to datasets that contain SKOS annotations. Hence, they need to be specified before runtime execution to ensure that similarity calculation can be carried out correctly. If a user were able to formulate filter conditions that referred to different RDF graphs and endpoints, this would not be possible.

RecGroupOrUnionGraphPattern: This query section marks an alternative graph pattern in the style of the “GroupOrUnionGraphPattern” of the SPARQL syntax specification [21]. However, it allows neither SPARQL-like subqueries nor federated queries for filtered resources, because this would complicate similarity calculation.

RecOptionalGraphPattern: In this part of a SKOSRecQuery, users can specify additional graph patterns which might extend the query solution but do not necessarily have to match the data.

RecMinusGraphPattern: This section handles the exclusion of LOD resources for cases when users want to omit certain triple statements from the solution.

References

[1] G. Adomavicius, A. Tuzhilin, Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. In: IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.

[2] G. Adomavicius, A. Tuzhilin, R. Zheng, REQUEST: A query language for customizing recommendations. In: Information Systems Research, vol. 22, no. 1, pp. 99–117, 2011.

[3] C. Aggarwal, Recommender systems. Springer, 2016.

[4] M. Arenas, B. Cuenca Grau, E. Kharlamov, S. Marciuska, D. Zheleznyakov, Faceted search over ontology-enhanced RDF data. In: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 939–948, 2014.

[5] V. Ayala, M. Przyjaciel-Zablocki, T. Hornung, A. Schätzle, G. Lausen, Extending SPARQL for recommendations. In: Semantic Web Information Management, 2004.

[6] J. Beel, C. Breitinger, S. Langer, A. Lommatzsch, B. Gipp, Towards reproducibility in recommender systems research. In: User Modeling and User-adapted Interaction, vol. 26, no. 1, pp. 69–101, 2016.

[7] C. Boutilier, R. Brafman, C. Domshlak, H. Hoos, D. Poole, CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements. In: Journal of Artificial Intelligence Research (JAIR), vol. 21, pp. 135–191, 2004.

[8] P. Castells, S. Vargas, J. Wang, Novelty and diversity metrics for recommender systems: choice, discovery and relevance, 2011.

[9] G. Cheng, W. Ge, Y. Qu, Falcons: searching and browsing entities on the Semantic Web. In: Proceedings of the 17th ACM International Conference on World Wide Web, pp. 1101–1102, 2008.

[10] J. Davies, R. Weeks, QuizRDF: Search technology for the Semantic Web. In: Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004.

[11] DBpedia, URL: https://wiki.dbpedia.org, 2017.

[12] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, Exploiting the web of data in model-based recommender systems. In: Proceedings of the 6th ACM Conference on Recommender Systems (RecSys), pp. 253–256, 2012.

[13] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, M. Zanker, Linked Open Data to support content-based recommender systems. In: Proceedings of the 8th International Conference on Semantic Systems, 2012.

[14] D. Dillman, D. Robert, D. Bowker, Principles for constructing web surveys. In: Joint Meetings of the American Statistical Association, 1998.

[15] DCMI metadata terms, URL: http://dublincore.org/documents/dcmi-terms, 2004.

[16] I. Fernandez-Tobias, I. Cantador, M. Kaminskas, F. Ricci, A generic semantic-based framework for cross-domain recommendation. In: Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pp. 25–32, 2011.

[17] A. Freitas, J. Oliveira, S. O'Riain, E. Curry, J. Da Silva, C. Pereira, Querying Linked Data using semantic relatedness: a vocabulary independent approach. In: Natural Language Processing and Information Systems, pp. 40–51, 2011.

[18] T. Grainger, T. Potter, Y. Seeley, Solr in Action. Cherry Hill: Manning, 2014.

[19] A. Gunawardana, G. Shani, Evaluating recommender systems. In: Recommender Systems Handbook, Springer, pp. 265–308, 2015.

[20] R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Bürgle, H. Düwiger, U. Scheel, Faceted Wikipedia search. In: International Conference on Business Information Systems, pp. 1–11, 2010.

[21] S. Harris, A. Seaborne, E. Prud'hommeaux, SPARQL 1.1 query language. In: W3C recommendation, 2013.

[22] B Heitmann C Hayes Using Linked Data to build open col-laborative recommender systems In AAAI Spring SymposiumLinked Data Meets Artificial Intelligence pp 76ndash81 2010

[23] J Herlocker J Konstan L Terveen J Riedl Evaluating col-laborative filtering recommender systems In ACM Transac-tions on Information Systems (TOIS) vol 22 no 1 pp 5ndash532004

[24] Y Hu Z Wang W Wu J Guo M Zhang Recommendationfor movies and stars using YAGO and IMDB In 12th Inter-national Asia-Pacific Web Conference (APWEB) pp 123ndash1292010

[25] D Huynh D Karger Parallax and companion Set-basedbrowsing for the data web In Proceedings of the 18th ACM In-ternational Conference on World Wide Web 2009

[26] D Jannach M Zanker A Felfernig G Friedrich Recom-mender systems an introduction Cambridge University Press2010

[27] Y Kabutoya R Sumi T Iwata T Uchiyama T Uchiyama InIEEEWICACM International Conferences on Web Intelligenceand Intelligent Agent Technology (WI-IAT) pp 625ndash630 2012

[28] M Kaminskas I Fernandez-Tobias F Ricci I CantadorKnowledge-based music retrieval for places of interest In Pro-ceedings of the 2nd International ACM Workshop on Music In-formation Retrieval with User-centered and Multimodal Strate-gies pp 19ndash24 2012

[29] H Khrouf R Troncy Hybrid event recommendation usingLinked Data and user diversity In Proceedings of the 7th ACMConference on Recommender Systems pp 185ndash192 2010

[30] P Kirk Experimental design John Wiley amp Sons 1982[31] B Knijnenburg M Willemsen Z Gantner H Soncu C

Newell Explaining the user experience of recommender sys-tems In User Modeling and User-Adapted Interaction vol 22no 4-5 pp 441ndash504 2012

[32] G Koutrika B Bercovitz H Garcia-Molina FlexRecs ex-pressing and combining flexible recommendations Proceedingsof the 2009 ACM SIGMOD International Conference on Man-agement of Data 2009


[33] M. Lee, W. Kim. Semantic association search and rank method based on spreading activation for the Semantic Web. In IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1523–1527, 2009.

[34] J. Lehmann, L. Bühmann. Autosparql: Let users query your knowledge base. In 8th Extended Semantic Web Conference (ESWC), 2011.

[35] N. Marie, F. Gandon, M. Ribiere, F. Rodio. Discovery hub: on-the-fly Linked Data exploratory search. In Proceedings of the 9th International Conference on Semantic Systems, pp. 17–24, 2013.

[36] S. McNee, J. Riedl, J. Konstan. Being accurate is not enough: how accuracy metrics have hurt recommender systems. In CHI'06 Extended Abstracts on Human Factors in Computing Systems, 2006.

[37] R. Meymandpour, J. Davis. Recommendations using Linked Data. In Proceedings of the 5th PhD Workshop on Information and Knowledge, 2012.

[38] A. Miles, S. Bechhofer. SKOS Simple Knowledge Organization System Reference. In W3C Recommendation, 2009.

[39] B. Mobasher, X. Jin, Y. Zhou. Semantically enhanced collaborative filtering on the web. In Web Mining: From Web to Semantic Web, pp. 57–76, 2004.

[40] A. Musetti, A. Nuzzolese, F. Draicchio, V. Presutti, E. Blomqvist, A. Gangemi, P. Ciancarini. Aemoo: Exploratory search based on knowledge patterns over the Semantic Web. In Semantic Web Challenge, 2012.

[41] T. Neumann, G. Weikum. The RDF-3X engine for scalable management of RDF data. In The VLDB Journal, vol. 19, no. 1, pp. 91–113, 2010.

[42] J. Pérez, M. Arenas, C. Gutierrez. Semantics and Complexity of SPARQL. In 5th International Semantic Web Conference, pp. 30–43, 2006.

[43] S. Policarpio, S. Brunk, G. Tummarello. Implementation of a SPARQL integrated recommendation engine for Linked Data with hybrid capabilities. In AImWD, 2012.

[44] P. Pu, L. Chen, R. Hu. A user-centric evaluation framework for recommender systems. In Proceedings of the 5th ACM Conference on Recommender Systems, 2011.

[45] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, E. Di Sciascio. Sound and music recommendation with knowledge graphs. ACM Transactions on Intelligent Systems and Technology (TIST), 8(2), 21, 2017.

[46] E. Oren, R. Delbru, S. Decker. Extending faceted navigation for RDF data. In 5th International Semantic Web Conference, pp. 559–572, 2006.

[47] G. Pirro. Explaining and suggesting relatedness in knowledge graphs. In International Semantic Web Conference, pp. 622–639, 2015.

[48] G. Rizzo, R. Troncy. NERD: A framework for evaluating named entity recognition tools in the Web of data. 10th International Semantic Web Conference (ISWC'11), Demo Session, Bonn, Germany, 2011.

[49] J. Rosati, T. Di Noia, T. Lukasiewicz, R. De Leone, A. Maurino. Preference queries with Ceteris Paribus semantics for Linked Data. In OTM Confederated International Conferences, pp. 423–442, 2015.

[50] T. Ruotsalo, K. Haav, A. Stoyanov, S. Roche, E. Fani, R. Deliai, E. Mäkelä, T. Kauppinen, E. Hyvönen. SMARTMUSEUM: A mobile recommender system for the Web of Data. In Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 50–67, 2013.

[51] I. Ruthven, M. Lalmas, K. van Rijsbergen. Incorporating user search behavior into relevance feedback. In Journal of the Association for Information Science and Technology, vol. 54, no. 6, pp. 529–549, 2003.

[52] B. Sarwar, G. Karypis, J. Konstan, J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, 2001.

[53] M. Schmachtenberg, C. Bizer, H. Paulheim. Adoption of the Linked Data best practices in different topical domains. In International Semantic Web Conference (ISWC), pp. 245–260, 2014.

[54] H. Schütze, C. Manning, P. Raghavan. Introduction to information retrieval. Cambridge University Press, 2008.

[55] SparqlImplementations. URL: https://www.w3.org/wiki/SparqlImplementations, 2016.

[56] M. Stankovic, W. Breitfuss, P. Laublet. Discovering relevant topics using DBPedia. In Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 219–222, 2011.

[57] K. Swearingen, R. Sinha. Beyond algorithms: An HCI perspective on recommender systems. In ACM SIGIR 2001 Workshop on Recommender Systems, vol. 13, no. 5-6, 2001.

[58] D. Tunkelang. Faceted search. In Synthesis Lectures on Information Concepts, Retrieval, and Services, pp. 1–80, 2009.

[59] C. Unger, L. Bühmann, J. Lehmann, A. Ngonga Ngomo, D. Gerber, P. Cimiano. Template-based question answering over RDF data. In Proceedings of the 21st ACM International Conference on World Wide Web, 2012.

[60] C. Varnhagen, M. Gushta, J. Daniels, T. Peters, N. Parmar, D. Law, T. Johnson. How informed is online informed consent? In Ethics & Behavior, vol. 15, no. 1, pp. 37–48, 2005.

[61] J. Waitelonis, H. Sack. Towards exploratory video search using Linked Data. In 11th IEEE International Symposium on Multimedia, pp. 540–545, 2009.

[62] L. Wenige, J. Ruhland. Flexible on-the-fly recommendations from Linked Open Data repositories. In International Conference on Business Information Systems, pp. 43–54, 2016.

[63] L. Wenige. Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations.

[64] L. Wenige, J. Ruhland. Retrieval by recommendation: using LOD technologies to improve digital library search. In International Journal on Digital Libraries, 2017.

[65] F. Zarrinkalam, M. Kahani. A multi-criteria hybrid citation recommendation system based on Linked Data. In 2nd International eConference on Computer and Knowledge Engineering (ICCKE), pp. 283–288, 2012.

[66] C. Ziegler, S. McNee, J. Konstan, G. Lausen. Improving recommendation lists through topic diversification. In Proceedings of the 14th International Conference on World Wide Web, pp. 22–32, 2005.

[67] L. Zou, R. Huang, H. Wang, J. Yu, W. He, D. Zhao. Natural language question answering over RDF: a graph data driven approach. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 313–324, 2014.

  • Introduction
  • Related Work
  • The SKOS Recommender
    • System Overview
    • On-the-Fly Recommendations
    • Preference Querying
    • Prefiltering
    • Postfiltering & Combinations
  • Evaluation of the SKOS Recommender
    • Experimental Setup
    • Results
  • Future Work & Conclusion
  • Appendix A: The SKOSRecommender Query Syntax
  • References

Definition 1 (Annotation property). An annotation property is an IRI (Listing 3) that is defined in the Dublin Core Metadata Initiative (DCMI) specification for subject annotations. It can occur in conjunction with one of the two namespaces which are stated by the standard [15].

Listing 3: Annotation property

<ANNOTPROP> = dc:subject | dct:subject

These properties help to retrieve SKOS annotation triples for items preferred by a user (Definition 2).

Definition 2 (SKOS annotation triple). A SKOS annotation triple ($st$) is a triple that is comprised of an IRI in the subject position, an annotation property (ANNOTPROP) as predicate and a concept $c$ in the object position. The concept is part of a knowledge organization system $S$ in SKOS format ($C = \{c \mid c \in S\}$) (Eq. 1).

$$st \in I \times \texttt{<ANNOTPROP>} \times C \tag{1}$$

The identification of SKOS annotation triples facilitates a fast retrieval of potentially similar items. Therefore, the engine evaluates shared features between resources from the SKOS annotation dataset (Definition 3).

Definition 3 (SKOS annotation dataset). A SKOS annotation dataset ($AD$) is a subset of an RDF dataset that contains SKOS annotation triples (Eq. 2).

$$AD \subset I \times \texttt{<ANNOTPROP>} \times C \tag{2}$$

Before the similarity calculation process starts, the engine determines SKOS annotations for each LOD resource ($r$) from the user profile by issuing a subsequent SPARQL query to the LOD repository from which the user intends to receive suggestions. The notion of SKOS annotations is specified in Definition 4.

Definition 4 (SKOS annotations). In the annotation dataset, LOD resources directly link to concepts of a SKOS system via a predefined annotation property. SKOS annotations for an input resource $r$ are defined as in Equation 3.

$$Annot(r) = \{c \in S \mid \exists\, (r, \texttt{<ANNOTPROP>}, c) \in AD\} \tag{3}$$

Afterwards, the engine identifies which of the other resources in the dataset share annotations with it. Similarities only have to be computed for items with matching concepts [13, 52]. Triples can be efficiently joined along item features, since native RDF storage systems usually index triples rather than columns [41]. Thus, the quick identification of mutual annotations through SPARQL requests facilitates ad-hoc similarity calculation. Definition 5 specifies the notion of relevant resources and their annotations that are needed to identify similar items. The applied syntax follows Pérez et al. [42].

Definition 5 (Relevant resources and their annotations). The conjunctive graph pattern $P_r$ matches all items and subjects that are potentially relevant for recommendation retrieval (Eq. 4).

$$P_r = (r, \texttt{<ANNOTPROP>}, c) \ \text{AND} \ (q, \texttt{<ANNOTPROP>}, c) \tag{4}$$

The mapping $\Omega_r$ of relevant resources and their SKOS annotations is obtained by retrieving all resources $q$ that share at least one SKOS concept $c$ with resource $r$ from $AD$ (Eq. 5). The corresponding result set contains mappings for all variables (i.e., $c$ and $q$) in the triple statements of $P_r$.

$$\Omega_r = [[P_r]]_{AD} \tag{5}$$
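Definitions 4 and 5 can be illustrated with a small in-memory sketch. It is not the engine's implementation: the annotation dataset is assumed to be a plain list of triples, and the names `AD`, `ANNOT_PROPS`, `annot` and `relevant_mappings` as well as the `ex:` resources are hypothetical.

```python
from collections import defaultdict

# Hypothetical annotation dataset AD: (resource, annotation property, concept).
AD = [
    ("ex:r", "dct:subject", "ex:c1"),
    ("ex:r", "dct:subject", "ex:c2"),
    ("ex:q1", "dct:subject", "ex:c1"),
    ("ex:q2", "dct:subject", "ex:c2"),
    ("ex:q2", "dct:subject", "ex:c3"),
    ("ex:q3", "dct:subject", "ex:c4"),
]
ANNOT_PROPS = {"dc:subject", "dct:subject"}


def annot(resource, triples):
    """Annot(r): concepts linked to `resource` via an annotation property (Eq. 3)."""
    return {c for s, p, c in triples if s == resource and p in ANNOT_PROPS}


def relevant_mappings(resource, triples):
    """Omega_r as a set of (q, c) pairs: q shares concept c with `resource` (Eqs. 4, 5)."""
    by_concept = defaultdict(set)
    for s, p, c in triples:
        if p in ANNOT_PROPS:
            by_concept[c].add(s)
    # Like the pattern P_r itself, this join also matches q == resource.
    return {(q, c) for c in annot(resource, triples) for q in by_concept[c]}


print(sorted(relevant_mappings("ex:r", AD)))
```

Indexing the triples by concept mirrors the idea that only resources with at least one matching concept need to be touched; `ex:q3` never enters the result because it shares no concept with `ex:r`.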

Upon extraction of relevant resources and annotations, the process of similarity calculation can start. The engine determines the information content (IC) of the shared SKOS annotations of two resources. This approach is based on the hybrid similarity metric proposed by Meymandpour and Davis [37] for LOD-enabled RS. However, while their measure considers all types of IRI resources for the retrieval, our similarity metric purely relies on SKOS annotations (Definition 6).

Definition 6 (SKOS similarity). Let $Annot(r)$ be the set of SKOS features of resource $r$, $Annot(q)$ the set of SKOS features of resource $q$, and $q \in \{\mu(q) \mid \mu \in \Omega_r\}$. Then their similarity can be derived from the IC of their shared concepts $C_{shared} = Annot(r) \cap Annot(q)$.

$$sim(r, q) = IC(C_{shared}) \tag{6}$$


The IC of a set of SKOS concepts is measured by the sum of the negative logarithms of each concept's relative frequency, i.e., its frequency in relation to the maximum frequency of occurrences among relevant resources (Definition 7).

Definition 7 (SKOS Information Content). The SKOS IC is defined as an aggregation of the individual IC values of each concept that is contained in the set of shared features ($c \in C_{shared}$), where $freq(c)$ is the frequency of occurrences of $c$ among all relevant resources and $n$ is the maximum frequency of occurrences among these resources. The final similarity score of two resources $r$ and $q$ is determined by summing the IC values of each concept of the shared feature set.

$$IC(C_{shared}) = -\sum_{c \in C_{shared}} \log\left(\frac{freq(c)}{n}\right) \tag{7}$$
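As a worked illustration of Equations 6 and 7, the following sketch computes the IC-based similarity for a toy set of annotations. Two points are assumptions rather than statements from the text: the natural logarithm is used (the base is not specified), and concept frequencies are counted over all listed resources including `ex:r`. All names and data are hypothetical.

```python
import math
from collections import Counter


def information_content(shared, freq, n):
    """IC(C_shared) = -sum(log(freq(c) / n)) over the shared concepts (Eq. 7)."""
    return -sum(math.log(freq[c] / n) for c in shared)


def skos_similarity(annot_r, annot_q, freq, n):
    """sim(r, q) = IC(Annot(r) & Annot(q)) (Eq. 6)."""
    return information_content(annot_r & annot_q, freq, n)


# Hypothetical annotations of the relevant resources.
annotations = {
    "ex:r": {"ex:c1", "ex:c2"},
    "ex:q1": {"ex:c1"},
    "ex:q2": {"ex:c1", "ex:c2"},
    "ex:q3": {"ex:c1"},
}
# freq(c): number of relevant resources annotated with c; n: maximum frequency.
freq = Counter(c for concepts in annotations.values() for c in concepts)
n = max(freq.values())

for q in ("ex:q1", "ex:q2"):
    print(q, skos_similarity(annotations["ex:r"], annotations[q], freq, n))
```

In this toy data `ex:c1` is the most frequent concept, so sharing it contributes nothing, whereas the rarer `ex:c2` contributes log(2). This reflects the intent of the measure: rare shared concepts are more informative than common ones.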

After having obtained resource annotations as well as IC values, similarity scores between the input resource $r$ and each potentially relevant resource $q$ can be calculated. When a user profile contains more than a single item, similarity values are aggregated through summation to determine the final recommendation score (Definition 8).

Definition 8 (Recommendation score). The recommendation score of a potentially relevant resource $q$ is quantified by the sum of similarities with each resource $r$ that can be found in the profile ($Pr$).

$$score(Pr, q) = \sum_{r \in Pr} sim(r, q) \tag{8}$$
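Equation 8 amounts to a per-candidate sum over the profile. A minimal sketch, assuming the pairwise similarities are already available as a nested dictionary (the structure `sim[r][q]` and all identifiers are hypothetical):

```python
def recommendation_scores(profile, sim):
    """score(Pr, q): sum of sim(r, q) over all profile resources r (Eq. 8)."""
    scores = {}
    for r in profile:
        for q, value in sim.get(r, {}).items():
            scores[q] = scores.get(q, 0.0) + value
    return scores


# Hypothetical pairwise similarities sim[r][q] for a two-item profile.
sim = {
    "ex:r1": {"ex:q1": 0.5, "ex:q2": 1.0},
    "ex:r2": {"ex:q2": 0.25, "ex:q3": 2.0},
}
scores = recommendation_scores(["ex:r1", "ex:r2"], sim)
print(scores)
```

A candidate that is similar to several profile items (here `ex:q2`) accumulates score from each of them.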

Upon calculation of similarity values for each potentially relevant item $q$, the engine ranks the retrieved resources and returns the top $k$ of them, where the limit $k$ is specified by the user in the SKOSRec query.

With the presented methods of on-the-fly retrieval, recommendations can be generated from LOD repositories through an API without requiring local metadata or excessive preprocessing operations other than the mapping of profile items to LOD resources.⁴

⁴ Di Noia et al. describe how a mapping procedure can be conducted for a real-world implementation. For their LDRS, they matched movie titles with the corresponding LOD resources in DBpedia. The authors extracted IRIs, labels and publication dates for all movie resources. They applied the Levenshtein distance on these resources to identify matching movie items [13].

3.3. Preference Querying

In the previous subsection, it was assumed that users express their tastes as preference statements for a certain item, given as an IRI referring to a LOD resource. However, it may also be the case that likings are only vaguely known, for instance when users prefer products with specific features but are not able or willing to specify the actual items. The engine obtains such a profile by issuing a subsequent SPARQL query that contains the preference graph pattern ($P_{pref}$) in the WHERE condition. Suppose a user is interested in Quentin Tarantino movies (dbr:Quentin_Tarantino) but does not specify the precise items she likes. She could send a preference query to the SKOSRec engine. The graph pattern in the request would match all movies that are connected to dbr:Quentin_Tarantino with the properties dbo:director or dbo:producer (see Listing 4).

Listing 4: SKOSRec query to generate recommendations based on a preference query in the movie domain (Q2)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>

RECOMMEND ?movie TOP 5
PREF [ ?prefMovie
WHERE { { ?prefMovie dbo:director dbr:Quentin_Tarantino . } UNION
{ ?prefMovie dbo:producer dbr:Quentin_Tarantino . } } ]

Table 3 shows the corresponding solution set to the preference query, containing films from DBpedia that are similar to Quentin Tarantino movies.

Table 3
Result set for Q2

movie
dbr:The_Godfather_Part_II
dbr:The_Prestige_(film)
dbr:The_Dark_Knight
dbr:Planet_Terror
dbr:Clerks

The notion of a queried profile is specified in Definition 9.

Definition 9 (Queried profile). A queried profile is a user profile that has been compiled by retrieving all LOD resources $r$ that are part of an annotation graph $AD$ and are contained in the solution mappings resulting from the evaluation of a specified graph pattern ($P_{pref}$) over a dataset $D$ (Eq. 9).

$$Pr = \{\mu(r) \mid \mu \in [[P_{pref}]]_D,\ r \in dom(\mu),\ r \in AD\} \tag{9}$$
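Equation 9 can be read as a filter over the solution mappings of the preference pattern. The sketch below assumes that solution mappings are represented as dictionaries from variable names to IRIs and that the resources occurring in the annotation dataset are available as a set; both representations, the function name and the `ex:` identifiers are hypothetical.

```python
def queried_profile(solution_mappings, variable, annotated_resources):
    """Pr: bindings of `variable` that also occur in the annotation dataset (Eq. 9)."""
    return {
        mu[variable]
        for mu in solution_mappings
        if variable in mu and mu[variable] in annotated_resources
    }


# Hypothetical solution mappings of the preference pattern P_pref.
mappings = [
    {"prefMovie": "ex:movie1"},
    {"prefMovie": "ex:movie2"},
    {"prefMovie": "ex:movie_without_annotations"},
    {"other": "ex:director"},
]
annotated = {"ex:movie1", "ex:movie2", "ex:movie3"}
profile = queried_profile(mappings, "prefMovie", annotated)
print(sorted(profile))
```

Resources without SKOS annotations are dropped from the profile because no similarity could be computed for them later on.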

3.4. Prefiltering

To facilitate exploratory search in LOD repositories, users should also have the option to formulate conditions in their queries. One way to enable such filters is to apply them before similarity calculation. Thus, recommendation lists can be adjusted to individual needs.

Due to the expressiveness of RDF data, filter conditions can not only be applied to attributes that are directly connected to potentially relevant items, but can also be declared as a graph pattern. By this means, it is possible to retrieve LOD resources that are both similar to the user profile and fulfill an advanced condition.

The strengths of the approach will be illustrated with a travel recommendation request that was executed over DBpedia. Suppose a customer expressed a preference for the Lake Baikal region (e.g., as indicated by a positive rating on a travel community site). With simple on-the-fly retrieval, she would receive a recommendation list that primarily consists of travel destinations that are located in the proximity of this geographical area. A graph-based filter, on the other hand, facilitates advanced retrieval options. In the given example, it enables the user to obtain suggestions for Southeast Asian travel destinations that have features similar to those of the Lake Baikal region. The resource dbr:Lake_Baikal is not directly connected to any destination with the annotation dbc:Southeast_Asia; such destinations can only be identified by exploring the surrounding graph structure of the filter attribute.

The advanced constraint ensures that the generated result set is not empty. A simple attribute-level filter, which is commonly utilized in faceted search systems, would not generate any search hits. By these means, graph-based filter patterns can help users to better explore knowledge graphs, thereby overcoming data quality issues in LOD repositories. The corresponding SKOSRec query to this example is shown in Listing 5.

Listing 5: SKOSRec query with an expressive prefilter condition to retrieve Southeast Asian destinations based on a preference for the Lake Baikal region (Q3)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbc: <http://dbpedia.org/resource/Category:>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

RECOMMEND ?destination TOP 5
PREF dbr:Lake_Baikal
[ WHERE { ?destination dbo:country ?country .
  ?country dct:subject ?countrySubject .
  ?countrySubject skos:broader dbc:Southeast_Asia .
  ?destination rdf:type dbo:Place . }
]

Table 4 shows the solution to this query. It contains travel destinations and regions of interest that are both similar to the previously liked destination dbr:Lake_Baikal and located in Southeast Asia. Ecoregions around lakes, rivers or mountains from Southeast Asian countries, such as Cambodia, Vietnam or Malaysia, are presented to the user.

Table 4
Result set of Q3

destination
dbr:Tonle_Sap
dbr:Mekong
dbr:Mount_Kinabalu
dbr:Laguna_de_Bay
dbr:Lake_Toba

The technical details for this type of query are given in the following definitions. Before the extraction of relevant resources and annotations, the recommendation engine identifies the resources that satisfy the filter (Definition 10).

Definition 10 (Prefiltered resources). The mapping of filtered resources $\Omega_{prefilter}$ is obtained by matching a specified graph pattern ($P$) to the RDF graph of a dataset $D$ (Eq. 10).

$$\Omega_{prefilter} = \{\mu_{|q} \mid \mu \in [[P]]_D\} \tag{10}$$

After preselection, resources are brought together with user profile data to retrieve the final set of recommendations (Definition 11).

Definition 11 (Annotations and relevant resources (prefiltered retrieval)). The solution mappings of annotations and relevant resources in prefiltered retrieval mode are obtained by joining the set of prefiltered resources with the mappings $\Omega_r$ of all relevant resources and annotations as determined from the user profile.

$$\Omega_{pre} = \Omega_{prefilter} \bowtie \Omega_r \tag{11}$$

After retrieving annotations and relevant resources in prefiltering mode, the system generates constraint-based recommendations. In this context, the ranking procedure is adapted to the restricted set of relevant resources and annotations. Hence, IC values of matching annotations and respective SKOS similarities are quantified according to the user filter. By this means, recommendation scores reflect the similarity of items for the actual set of filtered LOD resources. Except for this restriction, the system carries out the calculations (i.e., determination of similarity values, the ranking of LOD resources) in the same way as in non-filtering mode.
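The join in Equation 11 and the subsequent recalculation of concept frequencies on the restricted set can be sketched as follows. The sketch assumes that $\Omega_r$ is given as a set of (q, c) pairs and $\Omega_{prefilter}$ as a set of resource IRIs; these representations and all names are hypothetical.

```python
from collections import Counter


def prefiltered_mappings(omega_prefilter, omega_r):
    """Omega_pre: join of the prefiltered resources with the (q, c) mappings on q (Eq. 11)."""
    return {(q, c) for q, c in omega_r if q in omega_prefilter}


# Hypothetical mappings of relevant resources and their shared concepts.
omega_r = {
    ("ex:q1", "ex:c1"),
    ("ex:q2", "ex:c1"),
    ("ex:q2", "ex:c2"),
    ("ex:q3", "ex:c2"),
}
# Hypothetical resources that satisfy the prefilter graph pattern.
omega_prefilter = {"ex:q2", "ex:q3", "ex:q9"}

omega_pre = prefiltered_mappings(omega_prefilter, omega_r)
# Concept frequencies are recomputed on the restricted set, so the IC values
# reflect the filtered resources only.
freq = Counter(c for _, c in omega_pre)
print(sorted(omega_pre), dict(freq))
```

Because `ex:q1` does not pass the filter, `ex:c1` becomes rarer in the restricted set than in the unfiltered one, which changes its IC contribution accordingly.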

3.5. Postfiltering & Combinations

The SKOSRec engine can not only combine similarity calculation and graph pattern matching to prefilter relevant LOD resources, it can also apply user constraints after similar items have already been determined. By this means, recommendations are filtered ex-post to the similarity detection process. This feature enables novel retrieval patterns in which suggestions are part of a subquery. As an example scenario for a postfilter recommendation task, consider the following SKOSRec query, which obtains suggestions for movies and directors based on a user's preference for the director dbr:Quentin_Tarantino (see Listing 6). For the given request, the engine would identify similar directors based on Quentin Tarantino's characteristic features (e.g., geographic origin or awards won) and would determine the movies these similar directors have shot. Thus, users may receive interesting movie recommendations.

Table 5 shows the solution the SKOSRec system returns when Q4 is issued against DBpedia. The ellipses in the displayed tables indicate records that have been excluded for brevity reasons.

However, this approach can still be improved upon. For instance, the engine only considers the metadata descriptions of the director, whereas related movies have to be retrieved anew upon execution of the postfilter section. Thus, they are not ranked according to similarity in the final output table.

Another weakness is the biased semantics of the query.

Listing 6: SKOSRec query to generate simple post-filtered recommendations in the movie domain (Q4)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX yago: <http://dbpedia.org/class/yago/>

SELECT ?director ?movie
WHERE { ?movie dbo:director ?director .
  ?director rdf:type yago:Director110014939 . } LIMIT 50
RECOMMEND ?director TOP 5
PREF dbr:Quentin_Tarantino

Table 5
Result set of Q4

director             movie
dbr:Kirk_Douglas     dbr:Scalawag_(film)
dbr:Kirk_Douglas     dbr:Posse_(1975_film)
dbr:Kevin_Costner    dbr:The_Postman_(film)
dbr:Kevin_Costner    dbr:Dances_with_Wolves
...
dbr:Gene_Kelly       dbr:Hello_Dolly_(film)
dbr:Gene_Kelly       dbr:Invitation_to_the_Dance_(film)
...
dbr:Alan_Alda        dbr:Goodbye_Farewell_and_Amen
dbr:Alan_Alda        dbr:Margaret's_Engagement
...
dbr:Steve_Buscemi    dbr:Interview_(2007_film)
dbr:Steve_Buscemi    dbr:Animal_Factory

Usually, when a person favors a movie director, this preference refers to the movies this director has shot and not to the personal features of the person. Two directors might be similar according to certain features, such as birth year, nationality or received awards, but the style and genre of the movies they have created can still be different. A third factor is that recommendation retrieval for movie directors should also take into account that a film artist is likely more relevant the more movies she has shot that are in line with the preferred movies of the user. Hence, the SKOSRec engine enables another postfiltering strategy, which is called aggregation-based retrieval. This strategy can be helpful whenever an entity type is linked to several LOD resources in a dataset, such that similarity scores of related entities (e.g., movies) can be aggregated to an overall recommendation score for a certain item (e.g., for a movie director).

Aggregation-based requests can be well combined with other retrieval patterns, such as preference queries. In the case of the movie recommendation scenario, a user could state that she enjoyed Quentin Tarantino movies in the past and would like to receive suggestions for directors that shot similar films. Usually, this kind of question can only be answered by another human being, e.g., in a personal interaction among friends or in an online forum. The fact that the language facilitates these kinds of queries is one of its key advantages.

Listing 7: SKOSRec query to generate post-filtered recommendations based on a preference query in the movie domain (Q5)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX yago: <http://dbpedia.org/class/yago/>

SELECT ?director ?movie
WHERE { ?movie dbo:director ?director .
  ?director rdf:type yago:Director110014939 . } LIMIT 50
AGG dbr:Quentin_Tarantino ?director SUM
RECOMMEND ?movie TOP 100
PREF [ ?prefMovie
WHERE { ?prefMovie dbo:director dbr:Quentin_Tarantino . } ]

Table 6 depicts the results of Q5 as retrieved from DBpedia. When comparing the solution sets for Q4 and Q5, it becomes clear that they differ completely. While Q5 generated a list of directors who have mostly created action and thriller movies with a twist (i.e., Tarantino's unique filmmaking style⁵), Q4 shows US American actor-directors whose movies range from musical films (dbr:Hello_Dolly_(film)) to western movies (dbr:Dances_with_Wolves) or drama films (dbr:Interview_(2007_film)). Thus, the directors in the solution table of Q4 (Table 5) seem rather arbitrary and not exceptionally helpful concerning the initial user request.

⁵ https://en.wikipedia.org/wiki/Quentin_Tarantino

Table 6
Result set of Q5

director                    movie
dbr:Steven_Spielberg        dbr:Indiana_Jones
dbr:Steven_Spielberg        dbr:Jurassic_Park_(film)
...
dbr:Francis_Ford_Coppola    dbr:The_Godfather_Part_II
dbr:Francis_Ford_Coppola    dbr:The_Godfather_Part_III
...
dbr:Christopher_Nolan       dbr:The_Prestige_(film)
dbr:Christopher_Nolan       dbr:The_Dark_Knight
...
dbr:Robert_Rodriguez        dbr:Planet_Terror
dbr:Robert_Rodriguez        dbr:Machete_(film)
...
dbr:Kevin_Smith             dbr:Clerks
dbr:Kevin_Smith             dbr:Clerks_II

We define this kind of aggregation-based request as a so-called roll-up query. This definition was chosen because the final suggestions are based on summarized (i.e., rolled up) preferences for sublevel entities. Another example for a roll-up request stems from the application scenario of travel search. This recommendation query derives suggestions for city trip destinations from the points of interest (POI) a user has visited and liked in another city (e.g., London). Listing 8 shows the corresponding SKOSRec query for this scenario.

Listing 8: SKOSRec query to generate post-filtered recommendations (aggregation-based) for a preference profile containing the city of London (Q6)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dul: <http://www.ontologydesignpatterns.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?city
WHERE { ?poi ?property ?city .
  ?city rdf:type dbo:City .
  ?property rdfs:subPropertyOf dul:hasLocation . } LIMIT 3
AGG dbr:London ?city SUM
RECOMMEND ?poi TOP 100
PREF dbr:Oxford_Street
dbr:Notting_Hill
dbr:South_Bank

The SKOSRec request contains subquery commands that refer to the user's preferred POIs. It specifies how similarity scores of the generated suggestions should be aggregated (i.e., sum-based). The retrieval is based on the 100 POIs that are most similar to the ones in the preference section. Additionally, the subquery part refers to the DBpedia resource that represents London (AGG dbr:London ?city), thus indicating that the list of recommendations should not contain any POIs in this city. Upon processing the depicted query over the DBpedia repository, the engine displays the three cities for which the highest aggregated scores have been detected (see Table 7).

Table 7
Result set of Q6

city
dbr:Oxford
dbr:Abingdon-on-Thames
dbr:Wallingford_Oxford

The motivation for formulating such a query is that, when searching for cities, preferences might be better represented through the sights a user has visited elsewhere rather than through a simple on-the-fly request solely containing a preference statement for a city in the user profile section.

Besides the above presented roll-up retrieval patterns, the SKOSRec grammar may facilitate the formulation of additional types of advanced recommendation queries. For instance, cross-domain queries are an interesting additional pattern to be explored for personalized retrieval. Listing 9 shows an example request that generates movie suggestions based on the preference for a music band (i.e., dbr:The_Jackson_5).

Listing 9: SKOSRec query to generate cross-domain recommendations (Q7)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?movie ?participated ?musicalAct
WHERE { ?movie rdf:type dbo:Film .
  VALUES ?participated { dbo:basedOn dbo:musicComposer
    dbo:genre dbo:starring }
  { ?musicalAct ?participated ?movie . } UNION
  { ?movie ?participated ?musicalAct . } } LIMIT 10
AGG dbr:The_Jackson_5 ?movie MAX
RECOMMEND ?musicalAct TOP 100
PREF dbr:The_Jackson_5
WHERE { VALUES ?actType { dbo:MusicalArtist dbo:Band }
  ?musicalAct rdf:type ?actType . }

The query contains a prefilter condition that triggers retrieval of music artists and bands. Afterwards, the scores of the top 100 suggestions are summarized with maximum-based aggregation. Thus, the highest similarity value among the set of music acts determines the ranking score of the film. Also note that, because of the aggregation-based retrieval procedure, connections between movies and the LOD resource dbr:The_Jackson_5 are excluded from similarity calculation. The scenario exemplifies how semantic relationships can help to generate suggestions for a target domain (i.e., movies) that are based on a profile containing items from a different source domain (i.e., music acts). While regular SPARQL queries perform exact matching of triple statements, the SKOSRec syntax facilitates the integration of similarity- and graph-based retrieval to enrich result lists with other potentially relevant items.

Table 8
Result set of Q7

movie                          musicalAct
dbr:Moonwalker                 dbr:Michael_Jackson
...
dbr:The_Wiz_(film)             dbr:Ashford_&_Simpson
dbr:Garfield_Gets_a_Life       dbr:The_Temptations
dbr:Pipe_Dreams_(1976_film)    dbr:Gladys_Knight_&_the_Pips
dbr:The_Jacksons               dbr:Jermaine_Jackson

As a point of reference, consider the SPARQL query for the given cross-domain scenario (Listing 10).

Listing 10: SPARQL query to generate cross-domain recommendations (Q8)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?movie ?musicalAct
WHERE { ?movie rdf:type dbo:Film .
  VALUES ?participated { dbo:basedOn dbo:musicComposer
    dbo:genre dbo:starring }
  { dbr:The_Jackson_5 ?participated ?movie . } UNION
  { ?movie ?participated dbr:The_Jackson_5 . } } LIMIT 5

The query omits the similarity calculation part. Thus, it only retrieves movies linked to the resource dbr:The_Jackson_5. By this means, the SPARQL results have a better chance of being related to the initial user profile. On the other hand, empty result lists can occur when the preferred LOD resource does not have any direct connections to recommendable items. This assumption is backed up by the result lists of the two queries (Tables 8 and 9). The SKOSRec query produces a more comprehensive solution than a regular SPARQL query.

Table 9
Result set of Q8

movie               musicalAct
dbr:The_Jacksons    dbr:The_Jackson_5

The following definitions formalize the postfiltered and aggregation-based retrieval forms. Definition 12 specifies the notion of SKOSRec recommendations that may potentially undergo postprocessing.

Definition 12 (SKOSRec recommendations) SKOS-Rec recommendations (R(Pr)) are the set of sugges-tions generated by processing the solution mappings(Ω) obtained from recommendation retrieval Theycan be a result of simple on-the-fly retrieval prefilter-ing or a result of a combination of these techniques Prrepresents the input profile that contains at least onepreference for an item The cardinality of R(Pr) is theintended number of recommendations (k) that is spec-ified by the user based on which the engine selects theresources with the highest recommendation scores thatare not contained in the profile (Eq 12)

$$R(P_r) = \{\, x \mid x \in \{\, q \mid score(P_r, q) > \operatorname{kmax}_{q \in \Omega}\, score(P_r, q) \,\},\ q \notin P_r \,\} \tag{12}$$
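The selection step of Definition 12 can be illustrated with a minimal Python sketch. It assumes that recommendation scores are already available as a dictionary; the function name `top_k_recommendations`, the resource names, and the alphabetical tie-breaking are hypothetical and not part of the SKOSRec engine.

```python
def top_k_recommendations(scores, profile, k):
    """Return the k highest-scoring resources that are not in the profile.

    scores  -- dict mapping a resource to its score(Pr, q)
    profile -- set of resources the user already prefers (Pr)
    k       -- intended number of recommendations
    """
    # Profile items must never be recommended back to the user.
    candidates = {res: s for res, s in scores.items() if res not in profile}
    # Sort by descending score; ties are broken alphabetically (an assumption).
    ranked = sorted(candidates, key=lambda res: (-candidates[res], res))
    return ranked[:k]


# Hypothetical scores for four resources; "a" is part of the profile.
example_scores = {"a": 0.9, "b": 0.7, "c": 0.8, "d": 0.1}
print(top_k_recommendations(example_scores, {"a"}, 2))
```

If fewer than k candidates remain after removing profile items, the sketch simply returns a shorter list.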

The approach of subquerying with recommendation results builds on the idea that SKOSRec suggestions R(Pr) can be joined with any graph pattern Ppost after the process of similarity calculation (Definition 13). It does not make a difference how the engine obtained these recommendations.

Definition 13 (Postfiltered recommendation mapping). The postfiltered recommendation mapping is obtained by joining solution mappings generated from SKOSRec recommendations with the results of matching a postfilter graph pattern (Ωpost) (Eqs. 13 and 14).

$$\Omega_{postfilter} = \{\, \mu \mid \mu \in [\![P_{post}]\!]_D,\ x \in dom(\mu) \,\} \tag{13}$$

$$\Omega_{post} = \{\, \mu_{|x} \mid x \in R(P_r) \,\} \bowtie \Omega_{postfilter} \tag{14}$$

As with the prefilter, Ppost can be comprised of triple statements that are eligible in the RecGroupGraphPattern part of the SKOSRec grammar. Aggregation-based queries have to be treated differently than a regular postfilter request. Apart from excluding sublevel items of the profile from the set of similar resources, the engine should also omit any connections between the preferred superordinate item (a) and similar sublevel entities. Additionally, the user has to specify the variable in his/her profile (e.g., y) that represents the items within the postfilter graph pattern Ppost. By this means, aggregation-based recommendations can be generated (Definition 14).

Definition 14 (Resources for aggregation-based retrieval). The set of resources for aggregation-based retrieval is obtained by retrieving LOD resources that have a connection to a previously generated suggestion, thereby excluding all resources that are linked to the specified superordinate item a in the profile (Eq. 15).

$$R_{agg}(a) = \{\, \mu(y) \mid \mu \in \Omega_{post},\ y \in dom(\mu),\ a \notin \mu(y) \,\} \tag{15}$$

For each potential aggregation-based suggestion (y), similarity scores of connected entities (x) have to be considered for the final ranking. In this context, different approaches to aggregation can be applied. In cases when both similar items and aggregation-based recommendations are entities on comparable levels of granularity, choosing the maximum similarity score might be the best way to represent the relatedness of the resource to the user profile. On the other hand, when postfiltered recommendations are based on similarity scores of their sublevel entities (e.g., as in the movie recommendation example), the scores of related suggestions should be aggregated by the sum or the average of individual scores.

Definition 15 (Aggregation-based ranking score). The ranking score of an LOD resource y that is connected to a set of recommended items is determined by aggregating scores of connected entities, either by the maximum (Eq. 16), the sum (Eq. 17), or the average (Eq. 18) of individual scores.

$$score_{aggMAX}(y) = \max_{x \in \mu(x, y)} score(P_r, x) \tag{16}$$

$$score_{aggSUM}(y) = \sum_{x \in \mu(x, y)} score(P_r, x) \tag{17}$$

$$score_{aggAVG}(y) = \frac{\sum_{x \in \mu(x, y)} score(P_r, x)}{|\{x \in \mu(x, y)\}|} \tag{18}$$
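The three aggregation variants of Definition 15 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the engine's implementation: the links between sublevel entities x and candidate resources y are assumed to be given as (x, y) pairs, and the function name `aggregate_scores` as well as the item names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def aggregate_scores(links, scores, mode="sum"):
    """Aggregate similarity scores of sublevel entities per connected resource.

    links  -- iterable of (x, y) pairs: entity x is connected to resource y
    scores -- dict mapping x to score(Pr, x)
    mode   -- "max" (Eq. 16), "sum" (Eq. 17) or "avg" (Eq. 18)
    """
    aggregators = {"max": max, "sum": sum, "avg": mean}
    if mode not in aggregators:
        raise ValueError("mode must be 'max', 'sum' or 'avg'")
    grouped = defaultdict(list)
    for x, y in links:
        if x in scores:  # only entities that were scored contribute
            grouped[y].append(scores[x])
    aggregate = aggregators[mode]
    return {y: aggregate(values) for y, values in grouped.items()}


# Hypothetical example: two songs link to film_1, one song links to film_2.
example_scores = {"song_a": 0.5, "song_b": 0.25, "song_c": 0.75}
example_links = [("song_a", "film_1"), ("song_b", "film_1"), ("song_c", "film_2")]
print(aggregate_scores(example_links, example_scores, mode="sum"))
```

Summation favors resources with many moderately similar sublevel entities, whereas the maximum favors resources with a single strong match, which mirrors the distinction drawn in the text above.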

4. Evaluation of the SKOS Recommender

4.1. Experimental Setup

The novel recommendation strategies of the SKOSRec engine were evaluated in the context of web-based experiments for the usage scenarios of travel destination search and multimedia RS. The online approach helped to reach out to more participants with diverse demographic backgrounds, which would not have been possible in a laboratory experiment. Another positive side effect of the web-based setting was that participants could test and rate recommendations without being directly observed by an experimenter, which might have caused biased results otherwise. Hence, the online approach softened common limitations of laboratory studies [19].
The experiments were carried out in the defined usage scenarios with metadata descriptions from DBpedia. The database containing the local mirror of DBpedia ran on a virtual server. Besides the triple store, a web application was hosted on the same machine. The website ran on an Apache Tomcat 7 application server and was implemented with the help of Java servlet technology (footnotes 6 and 7).

6 http://tomcat.apache.org/
7 http://www.oracle.com/technetwork/java/javaee/servlet/index.html
8 https://www.clickworker.de/

We conducted the online experiments between April 2016 and January 2017. Upon setting up the web application, a pretest was carried out to control for potential pitfalls, such as misleading navigation. Two test users ran step by step through the web interface and provided their feedback. Because of their insights, we slightly modified the interface (e.g., through rephrasing user instructions) to increase the chances of successful completion. In case the SKOSRec engine is applied in a live setting, the design of the user interface will have to be tested with more users. Our experimental series focused on algorithmic performance, which was tested with numerous participants.
Subjects were recruited through the clickworker.com platform (footnote 8). Participants received a compensation fee of 1.00 € or 1.50 € for a completed survey. 103 participants took part in the study on travel destination search. In the multimedia domains, 154 subjects in total evaluated the SKOSRec engine. For each part of the evaluation, it was ensured that participants had not just run through the evaluation screen. Whenever there appeared to be irregularities in the log files (e.g., the default settings of the screen were not changed), these assessments were excluded from further analysis.
The survey template started with a consent form. It informed participants about the context and the procedure of the study. After having filled in the consent form, participants were navigated to the first part of the experiment. It collected data on subjects' demographics and their consumption and search habits. After the preparation phase, participants worked on the first test case (TC1) in each usage scenario. In this test case, the performance of the SKOSRec engine was assessed in the simple execution mode of on-the-fly recommendations (see Subsect. 3.2). It was done to gather data on the usefulness of suggestions resulting from a simple content-based approach (i.e., baseline method), with which more advanced recommendation strategies were compared at a later stage. In cases where users did not state any items, the application showed an error page. While setting up a profile required some effort on the side of the user, it was a necessary step for the experiment. Since we did not have access to the log files of a live application, profile generation had to be simulated. The process was facilitated by a specifically tailored interface (see Fig. 2), where subjects could search for items through an autocomplete field that applied AJAX technology (footnote 9).

9 http://api.jquery.com/jquery.ajax/
10 http://lucene.apache.org/solr/

Fig. 2. User profile generation, TC1 (music domain)

It enabled users to type in keywords (e.g., the name of a music act or the author of a book) for which matches were instantaneously displayed without reloading the site. Hence, participants could quickly find and select their items of interest. Whenever a user entered a query through the AJAX interface, the system matched the query keywords against an Apache Solr/Lucene search index (footnote 10). The index was built with item metadata, which had been extracted from DBpedia prior to setting up the website. The index comprised the names and abstracts of items. Whenever available, information on thumbnail pictures was saved in the index to be ready for display in the AJAX interface. In case the Apache Solr/Lucene search engine found index keywords that matched the user query, metadata information for these items was immediately shown on the screen. This information was displayed to help participants make an informed decision on whether the AJAX result list contained suitable items for profile generation.
Aside from the AJAX feature, the appeal of the interface was increased by a progress bar at the top of the webpage (see Fig. 2). Progress bars are commonly utilized tools for web surveys. In conventional paper-based studies, subjects usually see how far ahead they are, while in a web survey they cannot check their progress immediately. This is especially true for studies that apply a screen-by-screen navigation, which was the chosen approach for the conducted web experiments. Progress bars prevent users from tiring of the questionnaire and withdrawing from the study altogether by showing how close participants are to completing the experiment [14].
Upon profile generation, the SKOSRec engine calculated on-the-fly recommendations for a simple query (similar to Q1, Sect. 3), which were shown to users in a separate screen. Suggestions were displayed with a sufficient amount of information, e.g., thumbnails, labels, or a short description, to help participants assess the quality of the recommendation. In addition to this information, users could also access the respective Wikipedia article by following the hyperlink on the label. Relevance sliders were used to assess the utility of each recommendation. They are often applied in evaluations of IR systems [51, 54]. Scores were handled as points on a 0 to 100 scale of the relevance slider (outer left side: lowest relevance; outer right side: highest relevance; see Fig. 3).

This approach facilitated the calculation of an accurate precision score and the application of tests with higher statistical power than an ordinal 5-star scale. The score, called mean relevance (mrs), was measured with Equation 19.

$$mrs = \frac{\sum_{i=1}^{k} score_i}{k} \tag{19}$$
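As a worked illustration of Equation 19, the following Python sketch averages the slider scores of one recommendation list. The function name `mean_relevance` and the example scores are hypothetical.

```python
def mean_relevance(slider_scores):
    """Mean relevance score (mrs) of one recommendation list (Eq. 19).

    slider_scores -- relevance slider values (0 to 100), one per recommended item
    """
    if not slider_scores:
        raise ValueError("mrs is undefined for an empty recommendation list")
    return sum(slider_scores) / len(slider_scores)


# Hypothetical ratings for a list of k = 3 recommendations.
print(mean_relevance([80, 40, 60]))
```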

Fig. 3. Recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain)

In addition to the precision score, the web application also determined the size of the recommendation list produced by the engine as an indicator for recall. However, recommendation list size can only be utilized for this purpose when mrs scores reach a reasonable level.
While accurate recommendations are useful because they build trust in a system [57], they only capture a narrow aspect of performance [36]. For instance, an online retailer can tremendously profit from an RS that provides both relevant and unobvious recommendations, since it enables users to explore the product catalog more thoroughly. On the other hand, in the case of an RS only suggesting popular products, it is likely that the user already knows them [3, 23]. A pure accuracy-based evaluation disregards these aspects, which is why other performance indicators, such as novelty, were measured as well. Thus, the evaluation interface also contained a section where subjects had to tick a radio button that indicated whether an item was new. In the backend of the web application, novelty was measured as the ratio of relevant items not familiar to the user (|nov|) among the total number of relevant items (|tp|) (Eq. 20).

$$nv = \frac{|nov|}{|tp|} \tag{20}$$
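Equation 20 can be sketched in Python as follows. The relevance threshold of 50 slider points follows the text of this section; the function name `novelty`, the input format of (score, is_new) pairs, and the choice to return None when no item is relevant are assumptions made for illustration.

```python
def novelty(ratings, threshold=50):
    """Novelty score nv = |nov| / |tp| of one recommendation list (Eq. 20).

    ratings   -- list of (relevance_score, is_new) pairs, one per recommended item
    threshold -- minimum slider score for an item to count as relevant
    """
    # Keep the "is new" flags of relevant items only (the set tp).
    relevant_flags = [is_new for score, is_new in ratings if score >= threshold]
    if not relevant_flags:
        return None  # nv is undefined without relevant items (assumption)
    # True counts as 1, so the sum equals |nov|.
    return sum(relevant_flags) / len(relevant_flags)


# Hypothetical list: three relevant items, two of which are new to the user.
print(novelty([(80, True), (60, False), (30, True), (50, True)]))
```

Note that an irrelevant item never contributes to nv, even if the user marked it as new.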

An item was considered relevant when it achieved a relevance score of 50 or higher. In addition to accuracy and novelty, the web application also tracked the diversity of recommendation lists, since a topically diversified result set has been found to increase overall user satisfaction [31, 66]. In the RS literature, the most widely explored diversification metrics are the ones that measure the similarity between each item pair in the recommendation list. The more the items are alike, the less diverse is the result set [19]. The study series of this paper applied the metric by Castells et al. [8]. It determines item-to-item similarity values with the cosine similarity measure. Scores are then converted to distance values (Eq. 21) and averaged for the recommendation list (Eq. 22).

$$d(i_l, i_m) = 1 - \cos(i_l, i_m) \tag{21}$$

$$div_C = \frac{2}{k(k - 1)} \sum_{l < m} d(i_l, i_m) \tag{22}$$
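Equations 21 and 22 can be illustrated with a short Python sketch. It assumes that each recommended item is represented as a numeric feature vector (e.g., binary indicators of SKOS annotations); this representation, the function names, and the treatment of all-zero vectors are assumptions for illustration.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equally long feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Treat an all-zero vector as sharing nothing with any item (assumption).
    return dot / norm if norm else 0.0


def content_diversity(vectors):
    """Average pairwise cosine distance of a recommendation list (Eqs. 21, 22)."""
    k = len(vectors)
    if k < 2:
        raise ValueError("divC needs at least two items")
    total = sum(1.0 - cosine(u, v) for u, v in combinations(vectors, 2))
    return 2.0 * total / (k * (k - 1))


# Hypothetical list: items 1 and 3 share their only annotation, item 2 differs.
print(content_diversity([[1, 0], [0, 1], [1, 0]]))
```

A value close to 1 indicates that items hardly share any features, while a value close to 0 indicates near-identical feature vectors.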

The content-based diversity scores (divC) were calculated in the backend of the web application and saved in a log file for each generated recommendation list of the SKOSRec engine during runtime execution. In addition to the measurement of divC scores, we asked participants about their opinions on recommendation list diversity directly. Following the RS evaluation framework by Pu et al. [44], two Likert items were posed for this purpose:

– The recommendations of list [] are diverse.
– The recommendations of list [] are similar to each other.

Besides the algorithmic performance metrics listed above, the authors gathered assessments from users regarding the overall usefulness of the suggestions. This approach is also in line with the empirically verified evaluation framework by Pu et al. [44], which proposes to measure the psychometric construct of Perceived Usefulness to get a comprehensive understanding of user opinions on recommendation results. Thus, participants were instructed to state their agreement with positive statements regarding the relevance of the suggestions. The statements were adapted to the specificities of each usage scenario. Fig. 4 shows an example collection of Likert questions for the domain of music RS.

After participants had provided the required information, their answers were saved in a log file and they were navigated to test case 2 (TC2), which evaluated the SKOSRec engine's performance for constraint-based queries (similar to Q3, Sect. 3). We tried to find out if the engine was able to improve user satisfaction through filter conditions. Even though IR systems already successfully apply facet filters on result sets, the experiments still had to prove whether LOD-enabled constraints have the same effect on the Perceived Usefulness of recommendations. For the comparisons of non-filtered and filtered suggestions, we applied a within-subjects design in which participants had to rate two recommendation lists resulting from regular (non-filtered) and filtered requests in TC2. We favored this setup over a between-subjects design, where subjects would have been assigned to only one result list. While the between-subjects design more closely resembles a real-world setting, since it does not require participants to interact with more than one approach at a time, differences among subjects (e.g., regarding demographics or topic expertise) can potentially have a considerable impact on quality judgments [19]. Hence, when applying a between-subjects design, one does not know whether differences in performance are caused by the approaches or by the variability among participants. Therefore, researchers need to conduct between-subjects experiments with a sufficient number of study subjects to level out these differences. Within-subjects studies, on the other hand, require fewer participants while still achieving the same level of statistical power, since between-subject variability is not an issue [30, 31]. Beel et al. have shown that even small variations in an RS test setting could cause substantial differences in performance outcomes [6]. To keep the extent of variations even lower, comparisons between non-filtered and constraint-based recommendations were based on the same profile for each user in TC2. Therefore, the user profile from TC1 served as a starting point for filtered retrieval. The web application showed the recently generated profiles to users and asked them to provide an additional filter condition. Fig. 5 depicts an example web form for a constraint-based recommendation request in the music domain. It comprises the user profile and an additional text field where users stated their constraint.

Fig. 4. Evaluation screen for recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain)

Afterwards, the engine applied this filter condition on the set of potential recommendation results before similarity calculation started. For each usage scenario, different types of constraints were available. Regarding suitable filter options, the authors sought to achieve a balance between assisting users in finding filter conditions that would adequately represent their information needs and keeping the variability among users as low as possible. Therefore, the specificities of each scenario and the availability of the respective data sources were determined. For instance, for the multimedia domains (movie, music, and books), it was assumed that users would like to filter recommendations according to the specific genre of the item. Fortunately, the DBpedia dataset contains this kind of information for the music domain. In this domain, the genre can be retrieved through the property dbo:genre. However, in the movie and book domains, DBpedia does not provide sufficient information. Many LOD resources which represent books were not assigned a genre. In the movie domain, a genre property is missing entirely, and genre-specific information can often only be found in the subject categories describing the movie. Hence, in contrast to the experiment on music recommendations, participants in the domains of books and films did not filter their recommendation lists by genre. On the other hand, the property dc:subject was applied as recommendation filter in each multimedia domain. Since the subject property often links to many informative features of multimedia items (e.g., release period, geographic information, or content information), the authors decided that this feature represents a powerful filter dimension.
In the experiment on travel destination search, the web interface offered two filters: a subject filter and a location-specific filter. The subject filter enabled users to state their preferences regarding the characteristics of a location (e.g., being a nature reserve). The location-specific filter, on the other hand, gave participants the option to specify where the desired destination should be located (footnote 11).

11 http://dbpedia.org

Fig. 5. Web form for a constraint-based recommendation request (music domain)

The second fundamental research issue of TC2 concerned the question whether expressive constraints can boost recommendation quality even further. Hence, apart from executing a constraint-based workflow with a simple filter (i.e., a direct attribute of an item), the SKOSRec engine generated suggestions with the help of an expressive filter as well. This second procedure, while still applying the same user constraint as the simple filter, utilized an expressive graph pattern to include additional LOD resources in the result set. For instance, in case a subject filter was selected, result set expansion was facilitated by exploring the wider semantic space of the DBpedia category graph through a property path declaration in the graph pattern (e.g., skos:broader[2]). In the travel experiment, an additional expressive filter retrieved place-related information (dul:hasLocation) for location-specific subproperty relations that are not materialized in DBpedia. Hence, the expressive graph pattern expanded the filter such that additional properties were matched (e.g., dbo:country, dbo:city, or dbo:state), while still applying the specified user constraint on the set of LOD resources. Upon execution of constraint-based recommendation requests, users received two recommendation lists (resulting from simple and expressive filtering, respectively) in a separate evaluation screen. Participants evaluated result sets in the same manner as in TC1. For each result set, the screen contained an additional Likert item asking subjects to give their agreement to the statement "The filter has improved the recommendation results of list". The sequence of suggestions resulting from the different methods was randomly assigned in each user session to avoid order effects.
After study subjects had completed TC2, they were navigated to test case 3 (TC3). In the TC3 section of the travel experiment, participants could choose between three options for rollup query patterns (similar to Q6, Sect. 3). Users were able to obtain recommendations for travel destinations based on POIs they had visited during the stay in another destination. They could either select a city (option 1), a region (option 2), or a country (option 3) as destination type. Upon entity type selection, the SKOSRec engine generated appropriate suggestions. Fig. 6 shows the web form from the travel experiment that facilitated the formulation of advanced rollup requests in TC3.

Fig. 6. Travel - User profile generation, TC3

When users had selected an entity type and stated their favorite travel destination and three associated POIs, the application generated a SKOSRec query from these parameters. A simple on-the-fly request was sent to the engine as well to retrieve baseline recommendations, with which the suggestions resulting from the advanced request were compared at a later stage of the experiment. The simple query only contained the user's favorite travel destination and the selected entity type option, thus omitting information on the POIs. During the travel experiment, the engine processed both the simple and the advanced query and produced two recommendation lists. As in the previous parts of the experiment, participants assessed the quality of the result sets, which appeared in random order on the screen.
The multimedia experiments also contained a section on rollup retrieval (TC3). Like the advanced travel queries, the multimedia requests aggregated similarity scores of sublevel entities through summation, which the engine joined with a postfilter. In addition to these features, the pattern contained a preference query part. This section identified a set of preferred items through a single user statement. Participants either specified their favorite director/actor (movie domain), music act (music domain), or author (book domain) and received recommendations based on the features of the works created by the artist (similar to Q5, Sect. 3). The SKOSRec engine generated recommendations based on the creative works of the favored artist. Participants also received suggestions from a simple on-the-fly query, which computed results according to the artist's characteristics. As in the travel experiment, recommendations from the advanced rollup request were compared with the on-the-fly query without letting participants know which approach they were assessing and through random assignment of result set positions.
After participants of the multimedia experiments had evaluated the results of TC3, they were guided to test case 4 (TC4), which evaluated cross-domain queries. The experiment on travel RS ended with TC3 because no suitable graph patterns could be identified to facilitate these kinds of suggestions for the travel usage scenario. In the multimedia domain, however, the data from DBpedia was sufficient for this retrieval task. The entity type of the study (i.e., movie, music act, or book) defined the source domain, based on which the engine generated suggestions for items from another multimedia target domain (similar to Q7, Sect. 3). Fig. 7 depicts the cross-domain web form of the music experiment.

The web form assisted subjects in formulating a request, which then obtained movie suggestions based on the favorite music act of the user. Parameters from users were entered in the placeholders of the query pattern in the backend of the application. We formulated the query templates in such a way that the query engine matched any properties or graph patterns that could potentially connect two items from the specified target and source domains. For instance, such matches can occur when a user states his/her favorite movie that is written by a particular author. In case the same author has also written some books, they might be of interest to the user as well. Another example stems from the movie domain. Suppose a consumer has declared a preference in his/her profile for a movie that links to the corresponding soundtrack. It might well be the case that the user likes the soundtrack and the music acts that were involved in creating it just as much as s/he likes the movie. The practice of distinguishing homonymous LOD resources through the property dbo:wikiPageDisambiguates is another interesting linking pattern from DBpedia that was exploited for cross-domain requests in TC4, since it connects similar entities (i.e., the book edition of a certain movie).
Cross-domain SPARQL queries (similar to Q8, Sect. 3) were posed as baseline requests in this test case. While the SPARQL query only matched cross-domain relations for a single LOD resource, the SKOSRec request expanded the search space by considering links from more than one item. As in the previous test cases, TC4 ended with an evaluation screen where users assessed the SPARQL-based as well as the SKOSRec suggestions in random order without knowing which recommendation list belonged to which method. Table 10 gives an overview of the test cases and the methods that were applied in the experiments in each usage scenario.

4.2. Results

Fig. 7. Music - User profile generation, TC4 (page 9)

Table 10. Test cases in the web experiments (X = test case applied, - = not applied)

Test Case   Evaluation Purpose                          Query Type   Travel   Movie   Music   Book
TC1         On-the-fly Recommendations (Baseline)       Q1           X        X       X       X
TC2         Constraint-based Recommendations            Q3           X        X       X       X
TC3         Advanced Queries (Rollup)                   Q5, Q6       X        X       X       X
TC4         Advanced Queries (Cross-Domain)             Q7           -        X       X       X

In total, 257 people took part in the study. The gathered samples from the experiments contained answers from both genders with a slight overrepresentation of male participants (53.7%). The prevailing number of study subjects was between 20 and 40 years of age (65.8%). The evaluation of the demographics section also revealed that the majority of participants already perceives the search in their domain of interest as either "manageable" (38.5%) or even as "easy" or "very easy" (44.0%).
Prior to analyzing the performance results of the novel retrieval approaches of the SKOSRec engine, it was checked whether the respective Likert items for Perceived Usefulness and user-based diversity (divU) reached acceptable reliability levels. For this purpose, Cronbach's α values were determined. Table 11 shows the outcome of this analysis for the Perceived Usefulness construct. Given the acceptable values throughout the domains (Cronbach's α > 0.7), it could be treated as an interval-scaled variable in the statistical tests.

Table 11. Cronbach's α values for the concept of Perceived Usefulness

Domain   Cronbach's α
Travel   0.7002
Movie    0.8579
Music    0.8515
Book     0.8949
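For readers unfamiliar with the reliability check, the following Python sketch shows the standard formula for Cronbach's α, computed from the item variances and the variance of the summed scale. The source does not state which software was used for this analysis; the function name `cronbach_alpha` and the example responses are hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a multi-item Likert scale.

    responses -- list of rows, one per participant; each row holds that
                 participant's ratings for the k items of the construct
    """
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two participants")
    items = list(zip(*responses))  # transpose: one tuple per Likert item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    # Note: raises ZeroDivisionError if all participants have the same total.
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical answers of three participants to two Likert items.
print(cronbach_alpha([[1, 2], [2, 1], [3, 3]]))
```

Values above 0.7, as reported in Table 11, are conventionally read as acceptable internal consistency.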

Besides usefulness, user-based diversity scores were also measured by multiple Likert items. However, Cronbach's α values for this concept did not reach acceptable levels. In order to avoid that valuable user assessments remained unused, agreements with the statement "The recommendations of list [] are diverse" were taken into account as an ordinal variable (divU).
Based on these results, subsequent tests could be carried out. The first part of the analysis concerned the performance of the on-the-fly retrieval method (baseline) of the SKOSRec engine in TC1. The outcome of this analysis confirms that the on-the-fly approach represents a viable recommendation strategy. Table 12 shows the mean (M) (footnote 12), standard deviation (SD), and number of participants (N) for each performance dimension and metric. It can be seen that users assessed the engine's performance in almost every quality dimension to be better than mediocre. Aside from the positive user evaluations with regard to Perceived Usefulness, related quality aspects such as Accuracy, Novelty, and Diversity received good assessments as well. For instance, in each domain the mean relevance score (mrs) was higher than 50 score points. It means that participants were often convinced of the relevance of the results. Given these positive user assessments with regard to item relevance, it seems to be justified to consider the mean of result list sizes as an indicator of recall. The engine was able to generate the required number of recommendations in almost every test case (see Table 12).
The on-the-fly retrieval approach was also tested with regard to the Novelty dimension. Table 12 shows the mean novelty scores (nv) for the baseline recommendation approach. Novelty scores indicate that the majority of relevant items in a result list were deemed to be new in the travel and book RS, whereas in the experiments on movie and music RS only about one third of them were unfamiliar to users. In contrast to that, content-based Diversity scores (divC) were rather high in each domain. Hence, it can be concluded that items in a result set usually have a low rate of matching SKOS annotations. In addition to content-based scores, user assessments of recommendation list Diversity (divU) were also taken into account. The median level of agreement with the Diversity-related Likert item stating that recommendations are diverse lay at 3 ("neutral") in the multimedia domains and at 4 ("agree") in the travel domain.
In summary, the baseline approach achieved fairly good performance results in TC1. The findings demonstrate that LOD-enabled recommendations can potentially enhance a user's search experience. They also allow for subsequent tests of advanced retrieval techniques, which can be compared to this standard method.

12 In case of the user-based diversity score (divU), M denotes the median because of the ordinal scale of the variable.

In the analysis of performance results from TC2 (constraint-based recommendations), particular attention was paid to participants' agreement with the statement "The filter has improved the recommendation results of the list". Fig. 8 shows the distribution of domain-wise agreement ratios. On average, they were fairly high. Throughout the domains, the vast majority of participants selected 5 ("strongly agree"), 4 ("agree"), or 3 ("neutral"). Hence, a filter often had a positive impact on results.

Table 12. Performance results in TC1 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]   Domain   M       SD      N
Accuracy (size) [10]             Travel   10.00   0.00    103
                                 Movie    10.00   0.00    50
                                 Music    10.00   0.00    53
                                 Book     9.80    1.40    51
Accuracy (mrs) [1]               Travel   56.49   0.00    103
                                 Movie    62.25   19.15   50
                                 Music    55.70   0.10    53
                                 Book     57.47   15.55   50
Novelty (nv) [1]                 Travel   0.60    0.34    101
                                 Movie    0.36    0.31    50
                                 Music    0.36    0.39    52
                                 Book     0.61    0.29    47
Diversity (divU) [5]             Travel   4       -       102
                                 Movie    3       -       50
                                 Music    3       -       53
                                 Book     3       -       50
Diversity (divC) [1]             Travel   0.80    0.16    103
                                 Movie    0.87    0.12    50
                                 Music    0.86    0.10    50
                                 Book     0.92    0.08    50
Usefulness [5]                   Travel   3.34    0.85    103
                                 Movie    3.55    0.73    50
                                 Music    3.18    0.85    53
                                 Book     3.43    0.81    50

The authors gathered responses to the question on improvement for recommendation lists resulting from both the simple and the expressive filtering approaches. Hence, they give clues about the general performance of constraint-based retrieval. However, since this section of the study revealed the experimental condition to participants (i.e., application of a filter), answers to the Likert statement have to be interpreted cautiously. Results may be slightly biased, as participants were able to guess the underlying agenda. However, these limitations do not exist for comparisons between simple and expressive filters. Since the web application randomly assigned the list order, participants did not know which approach they were assessing. This enables us to take a closer look at the differences between the two filtering approaches. It was hypothesized that advanced filters improve recommendation quality, as they may identify more relevant resources, thereby overcoming problems of data quality or data sparsity in LOD repositories. Table 13 lists response rates of the two methods. The rate measures the ratio of non-zero result sets among all result sets. Throughout the domains, recommendation lists resulting from expressive filtering achieved higher response rates.

Table 13. Response Rates (TC2)

Domain   Constr.-based (Regular)   Constr.-based (Expanded)   N
Travel   23                        57                         103
Movie    86                        88                         50
Music    72                        75                         53
Book     73                        76                         51

Fig. 8. Participants' agreement with the statement that the filter has improved the result list (domain-wise) (TC2). [Figure not reproduced. Panel title: "The filter has improved the recommendations of the list"; x-axis: Domain (Digital Library, Travel, Movie, Music, Book); y-axis: Percentage of Agreement (0.00 to 1.00); legend: Degree of Agreement (1 to 5).]

In addition to response rates, user assessments and scores in the defined quality dimensions were compared. Table 14 depicts the outcome of this analysis. Significantly higher scores are marked in bold figures. In case statistical tests identified no meaningful variations, the results of the better performing approach are underlined. α-levels were adjusted with the false discovery rate (FDR) in order to decrease the probability of making a type I error due to multiple testing.
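To illustrate what an FDR adjustment for multiple testing can look like, the following sketch implements the Benjamini-Hochberg step-up procedure. This is an assumption: the text only states that an FDR adjustment was used and does not name the exact procedure. The function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down and enforce monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values from four pairwise comparisons.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
significant = [p <= 0.05 for p in adjusted]
```

A comparison counts as significant if its adjusted p-value does not exceed the chosen level, which is equivalent to adjusting the α-level per test.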

According to mean scores, expressive requests generated more comprehensive recommendation lists (size). This finding is in line with the increased response rates shown in Table 13. However, subsequently conducted t-tests confirmed significant differences only for the travel experiment (t(102) = −7.89, p < 0.001). In contrast, precision scores were higher in most of the domains (travel, movie, music) when the SKOSRec engine processed a simple constraint-based query. However, statistical tests did not confirm any systematic differences. The authors applied Wilcoxon tests for mrs scores and the remaining performance metrics because of the small sample sizes resulting from the low response rates of the simple constraint-based approach.
Regarding Novelty, none of the approaches was superior. In the Diversity dimension, expressive filters generated suggestions that were at least as topically diversified as the results from simple constraint-based retrieval. Even though no significant differences were identified between the two approaches, increased mean scores (divU in the travel domain and divC in each domain) indicate a slight superiority of expanded filter requests in this quality dimension. The same applies to usefulness scores (mean values were higher in 3 out of 4 domains when expressive filtering was applied). However, these statements are speculative at best, because statistical tests were not significant.
In summary, it can be concluded that expressive filters improve recall, potentially leading to a slight loss in precision scores (see Table 14). It may also be the case that expanded user constraints diversify as well as increase the usefulness of recommendation lists. However, these claims are unverified. In contrast, it is safe to say that expressive query patterns generate results of at least the same quality as simple patterns while increasing the number of relevant suggestions. The few differences may be explained by the high Jaccard scores (JI) (see Table 15) of recommendation lists. Throughout the domains, result sets were much alike. Given these similarities and the increases in list sizes for expressive filters, it is supposed that an expanded constraint produces an enhanced version of the recommendation list resulting from simple filtering. Therefore, expanded filters should be the default setting in a constraint-based retrieval context.
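The Jaccard index used to compare two recommendation lists is the size of their intersection divided by the size of their union. The sketch below shows the computation on two hypothetical lists. Returning 0.0 when both lists are empty is an assumption made for this example, since the text does not say how such cases were treated.

```python
def jaccard_index(list_a, list_b):
    """Return |A ∩ B| / |A ∪ B| for two recommendation lists."""
    set_a, set_b = set(list_a), set(list_b)
    union = set_a | set_b
    if not union:
        return 0.0  # assumption: two empty lists are treated as dissimilar
    return len(set_a & set_b) / len(union)


# Hypothetical lists: the expressive filter returns the simple list plus one item.
simple = ["item:1", "item:2", "item:3"]
expressive = ["item:1", "item:2", "item:3", "item:4"]
print(jaccard_index(simple, expressive))  # prints 0.75
```

In this example the second list is a superset of the first, which corresponds to the interpretation of an expanded constraint as an enhanced version of the simple result list.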

In TC3 (i.e., evaluation of roll-up query patterns), the SKOSRec engine almost always generated non-empty recommendation lists (see Table 16). In the travel and the book experiment, the response rate was higher for aggregation-based queries, whereas in the movie domain, the engine more often provided recommendations for regular requests. However, these differences are only marginal and do not indicate a clear superiority of one method.

Table 14. Performance results in TC2 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]   Domain    Simple            Expressive        N
                                           M       SD        M       SD
Accuracy (size) [10]             Travel    1.17    2.70      4.52    4.62      103
                                 Movie     6.72    4.08      7.20    3.92      50
                                 Music     5.96    4.64      6.36    4.51      53
                                 Book      4.59    4.50      5.00    4.51      51
Accuracy (mrs) [1]               Travel    66.73   20.51     65.34   22.36     23
                                 Movie     60.95   22.98     60.29   22.79     43
                                 Music     63.77   16.05     62.96   16.44     34
                                 Book      56.17   18.18     56.81   16.81     37
Novelty (nv) [1]                 Travel    0.48    0.41      0.61    0.49      21
                                 Movie     0.44    0.36      0.55    0.80      42
                                 Music     0.55    0.39      0.52    0.37      38
                                 Book      0.77    0.34      0.76    0.33      33
Diversity (divU) [5]             Travel    3       -         3.5     -         24
                                 Movie     4       -         4       -         43
                                 Music     4       -         4       -         38
                                 Book      3       -         3       -         37
Diversity (divC) [1]             Travel    0.68    0.18      0.72    0.21      19
                                 Movie     0.70    0.24      0.73    0.23      41
                                 Music     0.86    0.13      0.87    0.12      34
                                 Book      0.84    0.24      0.95    0.63      29
Usefulness [5]                   Travel    3.33    0.93      3.50    0.77      24
                                 Movie     3.32    0.85      3.31    0.86      43
                                 Music     3.55    0.81      3.64    0.78      38
                                 Book      3.39    0.83      3.62    1.21      37

Table 15. Jaccard Indices (TC2)

Domain    Mean Jaccard Index (JI)   SD      N
Travel    0.57                      0.47    25
Movie     0.71                      0.38    43
Music     0.82                      0.35    37
Book      0.92                      0.26    35

Fig. 9 depicts participants' general agreement with Likert statements concerning the Perceived Usefulness of the approaches in TC3. The diagram shows similar results for regular as well as advanced requests. On average, satisfaction scores reached levels slightly above neutral agreement. A subsequent t-test confirmed no significant differences between the two methods. In fact, the approach with the highest average score of Perceived Usefulness was different in each experiment. While mean values were higher for regular requests in the studies on travel and music RS, advanced queries produced higher scores in the movie and book experiments.

Table 16. Response Rates (TC3)

Domain    Regular (%)   Aggr.-based (%)   N
Travel    99            100               103
Movie     96            92                50
Music     100           100               53
Book      96            100               51

[Figure: "Perceived Usefulness (Regular vs. Roll-up Queries)"; x-axis: Domain (Travel, Movie, Music, Book); y-axis: Agreement; legend: Method (Regular, Roll-up)]

Fig. 9. Participants' overall satisfaction with regular and roll-up recommendation retrieval (TC3)

Content-based Diversity (divC) was the only metric with systematic differences. Here, scores went up for advanced retrieval patterns in the travel (t(98) = −5.50, p < 0.001), the music (t(52) = −4.51, p < 0.001) and the book domain (t(47) = −4.90, p < 0.001) (see Table 17). This finding is not surprising, given that the aggregation-based approach explores the wider semantic network of each item in the profile.
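The reported statistics have the form t(df), where the degrees of freedom equal the number of paired observations minus one (e.g., 99 participants yield t(98)). A minimal sketch of such a paired (within-subjects) t statistic follows. The function name and the scores are hypothetical, and the sketch assumes that each participant rated both approaches. It omits the p-value, which would require the t distribution.

```python
import math
import statistics


def paired_t(scores_a, scores_b):
    """Return (t, df) for a paired t-test on two equally long score lists."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long lists with at least two pairs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    sd_diff = statistics.stdev(diffs)  # sample standard deviation of differences
    if sd_diff == 0:
        raise ValueError("t is undefined when all differences are identical")
    n = len(diffs)
    t = statistics.mean(diffs) / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical scores of four participants for regular and advanced requests.
regular = [1, 2, 3, 4]
advanced = [2, 4, 5, 4]
t, df = paired_t(regular, advanced)
```

A negative t value, as in the results above, indicates that the second approach received the higher scores on average.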

Hence, it can be assumed that both approaches produce recommendation lists of similar quality and can provide benefits to potential consumers. Thus, it cannot be clearly stated which approach is superior, because neither of them outperformed the other in each domain. While the regular retrieval strategy was better in terms of usefulness in the travel and the music domains, the aggregation-based approach achieved higher scores in the domains of movie and book recommendations.
This conclusion is especially impressive given the low mean Jaccard scores for recommendation lists resulting from regular and aggregation-based retrieval (see Table 18). The mean score never exceeded 0.1 throughout the domains. It indicates a low concordance between result sets. It is remarkable that participants perceived the recommendations as equally good given the high dissimilarity of the lists. Thus, advanced queries have an added value, as they provide additional interesting results. They can help to explore LOD repositories in different ways than a regular retrieval approach. Therefore, it is worthwhile to offer aggregation-based query patterns as an alternative retrieval strategy when the regular approach has not produced any helpful recommendations.

In addition to TC3, user assessments for TC4 (cross-domain recommendations) were also evaluated. However, since the authors conducted experiments for TC4 only in the multimedia domains, the samples were considerably smaller. Nevertheless, it is assumed that the number of completed user sessions is just sufficient to conduct further statistical analyses. Table 19 shows the response rates for the two retrieval methods. Cross-domain patterns had a higher success rate of generating non-zero result sets throughout the domains. Participants almost always received a recommendation for this query type, whereas they often did not receive a single suggestion when a regular SPARQL query was executed.

We applied non-parametric tests for the performance metrics mrs, nv, divU, divC and Perceived Usefulness due to the low response rates of the SPARQL-based approach, which led to a low number of comparable data points. Overall, the FDR-adjusted pairwise Wilcoxon tests detected no significant differences except for user-based Diversity scores (divU) in the movie (Z = −2.69, p < 0.05) and the book domain (Z = −2.56, p < 0.05). The results for the content-based Diversity (divC) metric seem to point in the same direction because of the high mean values for cross-domain queries in each domain. But the significance of these results was not verified. The mrs scores indicate a better performance of regular SPARQL queries, but the within-subjects test did not confirm this assumption statistically (see Table 20).

Table 17. Performance results in TC3 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]   Domain    Regular           Aggr.-based       N
                                           M       SD        M       SD
Accuracy (size) [10]             Travel    9.90    0.99      9.42    2.07      103
                                 Movie     2.88    0.59      2.76    0.82      50
                                 Music     10.00   0.00      10.00   0.00      53
                                 Book      2.88    0.59      3.00    0.00      51
Accuracy (mrs) [1]               Travel    48.21   24.98     44.93   25.59     102
                                 Movie     54.58   22.88     56.69   25.42     44
                                 Music     53.95   18.31     58.24   16.19     52
                                 Book      51.53   23.90     49.25   22.30     49
Novelty (nv) [1]                 Travel    0.43    0.39      0.38    0.40      98
                                 Movie     0.35    0.38      0.28    0.41      41
                                 Music     0.49    0.36      0.46    0.39      50
                                 Book      0.61    0.42      0.62    0.44      47
Diversity (divU) [5]             Travel    3.5     -         3       -         100
                                 Movie     4       -         3       -         43
                                 Music     4       -         3       -         50
                                 Book      3       -         3       -         48
Diversity (divC) [1]             Travel    0.79    0.17      0.89    0.13      99
                                 Movie     0.75    0.20      0.66    0.27      44
                                 Music     0.83    0.17      0.93    0.07      53
                                 Book      0.74    0.27      0.93    0.06      48
Usefulness [5]                   Travel    3.36    0.77      3.22    0.74      101
                                 Movie     3.20    0.92      3.26    0.94      44
                                 Music     3.45    0.84      3.44    0.85      51
                                 Book      2.95    0.87      3.03    0.96      48

Table 18. Jaccard Indices (TC3)

Domain    Mean Jaccard Index (JI)   SD      N
Travel    0.07                      0.11    101
Movie     0.03                      0.07    44
Music     0.02                      0.05    53
Book      0.02                      0.06    50

Table 19. Response rates (TC4)

Domain    Regular (SPARQL) (%)   SKOSRec (%)   N
Movie     32                     96            50
Music     62                     98            53
Book      53                     100           51

However, the most important finding of the analysis was that SKOSRec cross-domain queries generated significantly more suggestions throughout the domains (movie: t(49) = −17.92, p < 0.001; music: t(52) = −25.73, p < 0.001; book: t(50) = −12.75, p < 0.001) (see Table 20). Fig. 10 illustrates this graphically.

Whenever the SPARQL request provided recommendations, which was the case in less than half of the user sessions (see Table 19), the SKOSRec suggestions were assessed to be of almost equal quality as the SPARQL recommendations (see Table 20, Usefulness). Hence, in the more than half of the test cases in which SPARQL querying did not produce any results, the SKOSRec approach may still have provided useful suggestions.

Therefore, it is reasonable to assume that the capability of the SKOSRec engine to flexibly switch between processing steps of similarity calculation and pattern matching can facilitate an improved exploration of LOD repositories. Table 21 also shows that the items in the two sets often did not match, as is indicated by low Jaccard indices. This finding further demonstrates that the execution of an additional step of similarity calculation can help to retrieve other relevant items.

Table 20. Performance results in TC4 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]   Domain    Regular (SPARQL)    Cross-Domain        N
                                           M        SD         M        SD
Accuracy (size) [10]             Movie     0.82     2.09       8.66     2.73       50
                                 Music     0.89     1.59       9.30     2.03       53
                                 Book      2.90     3.98       10.00    0.00       51
Accuracy (mrs) [1]               Movie     67.41    27.62      53.65    14.71      16
                                 Music     62.36    31.76      48.32    22.49      20
                                 Book      66.42    21.28      55.59    16.84      27
Novelty (nv) [1]                 Movie     0.59     0.49       0.41     0.34       16
                                 Music     0.72     0.44       0.59     0.40       19
                                 Book      0.54     0.41       0.63     0.30       27
Diversity (divU) [5]             Movie     3        -          4        -          16
                                 Music     3        -          3        -          20
                                 Book      3        -          4        -          27
Diversity (divC) [1]             Movie     (0.64)   (0.35)     (0.99)   (0.02)     (6)
                                 Music     (0.90)   (0.13)     (0.98)   (0.03)     (10)
                                 Book      (0.84)   (0.12)     (0.87)   (0.1)      (8)
Usefulness [5]                   Movie     3.57     0.83       3.35     1.57       15
                                 Music     3.53     1.17       3.11     0.98       20
                                 Book      3.41     1.04       3.41     0.94       27

[Figure: "Recommendation List Size"; x-axis: Domain (Movie, Music, Book); y-axis: No. of items; legend: Method (Regular, Cross-Domain)]

Fig. 10. Mean recommendation list sizes for SPARQL and SKOSRec cross-domain requests (TC4)

Table 21. Jaccard Indices (TC4)

Domain    Mean Jaccard Index (JI)   SD      N
Movie     0.03                      0.07    44
Music     0.01                      0.02    21
Book      0.16                      0.27    33

5. Future Work & Conclusion

This paper has presented the query-based SKOSRec engine that facilitates new types of recommendation requests for LOD repositories. The evaluations have shown that the SKOSRec system can often generate relevant and useful suggestions when certain query templates are utilized. In many cases, the application of expressive or advanced query patterns (i.e., in the context of constraint-based retrieval or cross-domain requests) helped to significantly improve recall values while still providing the same level of quality in the remaining performance dimensions. Additionally, advanced recommendation requests can be used to generate diversified recommendation lists in case users are not satisfied with the results of regular recommendation requests (i.e., roll-up or cross-domain retrieval vs. regular requests). While not being superior in each domain and test case, it is at least safe to say that the novel recommendation approaches of the SKOSRec engine considerably extend common retrieval methods and provide an alternative search strategy when conventional methods fail to provide useful results. Additionally, the increase in diversity and recall is in line with findings from previous research on LOD-enabled RS [45].
It is a strength of the developed approaches that they facilitate combinations of graph-based and similarity-based retrieval at different stages of the recommendation workflow. This feature extends general search capabilities for semantic networks. Future research needs to investigate whether, in addition to SKOS, further RDF vocabularies can be used for ad-hoc item-to-item similarity computation to increase the range of potential usage scenarios, since not every LOD dataset contains SKOS annotations [53]. The application of additional properties could also boost the recommendation quality of some query patterns (e.g., the query types used in TC3). In this context, it also needs to be determined whether the representation of the user profile in the SKOSRec query language can be substituted with free-text expressions to enable search-like functionalities. LOD researchers have already developed engines that can process natural language queries over RDF data [59], [34], [67]. It will have to be investigated how the SKOSRec engine profitably fits into this landscape of existing retrieval tools.
Another open research question concerns the aspect of interfaces to other applications. In its standard configuration, the system is not an end-user retrieval engine, but a backend application with which an administrator who has domain knowledge (e.g., of RDF vocabularies and data models of a particular LOD repository and usage scenario) creates the respective query patterns. Although the websites of the online experiments represent possible implementations of suitable end-user applications for the SKOSRec engine, there are still many other possibilities and extensions to design such a user interface (UI). For instance, the selection of LOD items for the profile was made possible by previously storing the data in an index that could then be accessed and searched before issuing a recommendation query. However, this runs against the idea of ad-hoc retrieval. In this context, future research will have to determine whether it is feasible to generate an on-the-fly mapping from items to the respective LOD resources. In the case of a commercial application, the engine needs to match items with the corresponding entries in the product catalog. Even if it is not yet clear which UI components can make the best possible use of the SKOSRec engine's features, the results from the user experiments can give at least a few suggestions. The following retrieval options should be the default setting in an interface that contains a LOD-enabled RS component to assist users in finding interesting items. Since the regular approach of on-the-fly retrieval achieved good results throughout the domains, the display of simple recommendations generated from previously stated user preferences represents a suitable starting point for retrieval. On-the-fly queries could be refined by providing filter options. Since the evaluations have shown that expressive constraints often lead to increased recall values, the application of such a filter might be the best option for assisted retrieval. As in the preparations for the experiments, appropriate graph-based query patterns for the usage scenario in question would also have to be determined by a domain expert before setting up the application. The same applies to advanced retrieval patterns, which should be available to the end user to refine the query when both the simple as well as the constraint-based approach have failed to provide relevant items. This suggestion is made because roll-up queries were competitive with regular recommendations in the web-based experiments. Therefore, they represent a viable alternative retrieval strategy. For some domains (e.g., for multimedia retrieval), it is also possible to use an optional selection field to enable the formulation of cross-domain requests. Evaluations for this query type suggest that users might receive interesting results (i.e., diversified and comprehensive recommendation lists) from such a request.
In summary, the contributions of the paper are as follows:

– Development and implementation of novel retrieval strategies in the SKOSRec system, which facilitate personalized requests and leverage the potential of knowledge graphs in order to address the research gap in state-of-the-art recommendation methods with regard to customized queries.

– Realization of web-based user experiments in four usage scenarios with the following outcomes:

  • Successful demonstration that knowledge graph data is applicable in the context of live recommendation scenarios.

  • Creation of suitable query patterns that can be applied to represent individual information needs in different application contexts. The set of query patterns can be extended with the help of the SKOSRec query language to meet the requirements of additional usage scenarios.

  • Proof that the novel retrieval strategies can be successfully applied across various application contexts and recommendation tasks and that they outperform conventional content-based retrieval approaches, especially in the performance dimensions of diversity and recall.

The prototypical implementation and the evaluation of the SKOSRec engine in different usage scenarios have demonstrated that the freely available knowledge sources on the LOD cloud can be successfully applied to advance state-of-the-art methods in IR and recommender systems. The proposed personalized and customized retrieval strategies can leverage the potential of knowledge graphs by better matching information needs of users with the help of semantic search options.

Appendix A. The SKOSRecommender Query Syntax

SKOSRecQuery: A SKOSRec query can comprise up to six elements, of which the parts SimProjection and ItemPart are obligatory. Hence, users need to state at least their preferences and how many recommendations they would like to receive. Advanced retrieval patterns can be formulated as well. For instance, similar resource retrieval can be combined with prefiltering (RecWhereClause) and/or postfiltering (SelectPart) of LOD resources. As in SPARQL, the query can start with a prologue that abbreviates IRIs with namespace declarations [21].
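The rule that SimProjection and ItemPart are obligatory while the remaining parts are optional can be sketched as a small data structure. All class, field and method names below are hypothetical illustrations. They do not reproduce the actual SKOSRec grammar or compiler, and the optional parts are kept as plain strings for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SimProjection:
    variable: str                      # recommendation variable, e.g. "?rec"
    limit: int                         # number of recommendations requested
    service_iri: Optional[str] = None  # target endpoint for cross-repository requests


@dataclass
class ItemPart:
    preference: str                        # IRI or variable bound to a preference query
    sim_threshold: Optional[float] = None  # optional concept-to-concept threshold


@dataclass
class SKOSRecQuery:
    sim_projection: Optional[SimProjection] = None
    item_parts: List[ItemPart] = field(default_factory=list)
    prologue: Optional[str] = None          # namespace declarations
    select_part: Optional[str] = None       # postfilter
    rec_where_clause: Optional[str] = None  # prefilter

    def validate(self) -> List[str]:
        """Return a list of violated obligations (empty if the query is valid)."""
        errors = []
        if self.sim_projection is None:
            errors.append("SimProjection is obligatory")
        if not self.item_parts:
            errors.append("at least one ItemPart is obligatory")
        return errors


# Hypothetical minimal query: ten recommendations for one liked resource.
query = SKOSRecQuery(
    sim_projection=SimProjection(variable="?rec", limit=10),
    item_parts=[ItemPart(preference="http://dbpedia.org/resource/Sofia_Coppola")],
)
print(query.validate())  # prints []
```

An empty list from `validate` corresponds to the minimal case described above: the user has stated a preference and the number of desired recommendations.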

SelectPart: The SelectPart enables subquerying with recommendation results. The surrounding query is a SELECT query with a slightly simplified WHERE clause (i.e., RecWhereClause) with which postfilter conditions can be formulated. In this section, users have the option to formulate aggregation-based retrieval requests (Aggregation).

Aggregation: This query part handles aggregation-based postfiltering. Users specify the IRI (IRIref) resource for which similarity scores of sublevel entities are summarized. Additionally, it is stated how the data should be aggregated (i.e., SUM, MAX or AVG) as well as which variable (Var) represents the aggregation-based recommendations in the postfilter section. The compiler makes sure that this variable is actually contained in the RecWhereClause of the SelectPart.
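The aggregation step can be pictured as grouping item-level similarity scores by a superordinate resource and combining them with SUM, MAX or AVG. The following sketch illustrates this idea only. The function `roll_up`, the mapping `parent_of` and all identifiers and scores are hypothetical and not part of the SKOSRec engine.

```python
from collections import defaultdict
from statistics import mean

# The three aggregation modes named in the query syntax.
AGGREGATORS = {"SUM": sum, "MAX": max, "AVG": mean}


def roll_up(scores, parent_of, how="SUM"):
    """Aggregate item-level similarity scores per parent resource.

    scores:    mapping item -> similarity score
    parent_of: mapping item -> parent resource (e.g., movie -> director)
    Returns (parent, aggregated score) pairs, best first.
    """
    grouped = defaultdict(list)
    for item, score in scores.items():
        parent = parent_of.get(item)
        if parent is not None:  # items without a parent are skipped
            grouped[parent].append(score)
    aggregate = AGGREGATORS[how]
    ranked = {parent: aggregate(values) for parent, values in grouped.items()}
    return sorted(ranked.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical movie scores rolled up to the level of directors.
scores = {"movie:A": 0.5, "movie:B": 0.25, "movie:C": 0.5}
parent_of = {"movie:A": "director:X", "movie:B": "director:X", "movie:C": "director:Y"}
by_sum = roll_up(scores, parent_of, "SUM")
by_avg = roll_up(scores, parent_of, "AVG")
```

In this example, SUM ranks director:X first (0.75 vs. 0.5), whereas AVG ranks director:Y first (0.5 vs. 0.375), which shows why the aggregation mode has to be stated in the query.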

SimProjection: This part of a SKOSRec query refers to the list of generated suggestions. It defines the recommendation variable to which all similar LOD resources are mapped. In case pre- and postfilter conditions are set in the other sections, the compiler checks whether the variable (Var) appears in these sections as well. Otherwise, the required join operations cannot be executed. It is specified how many recommendations should be displayed (Integer) and whether the request will be issued against a default repository or as a cross-repository request (ServiceIntegration).

ServiceIntegration: This section indicates that a user intends to receive recommendations from a different SPARQL endpoint than the default endpoint that is stated in the standard configuration of the SKOSRec engine. It specifies the target SPARQL endpoint (IRIref) from which a user wants to receive recommendations. Upon extraction of SKOS annotations from the default source endpoint, a subsequent request is sent to the specified target endpoint.

ItemPart: This section of a SKOSRec query represents a single preference in the user profile. However, recommendation queries can contain several preference statements. A preference is either expressed as a LOD resource (IRIref) or as a variable which is bound to a preference query (VarPart). In the latter case, the resources are obtained by matching the graph pattern in the VarPart with the triple statements in the repository. The ItemPart statement can be additionally supplemented with a concept-to-concept similarity threshold (Sim) that triggers concept expansion with the approach of flexible similarity detection, which is extensively described in one of our previous works [64].

VarPart: The construct initiates preference querying. Individual likings are expressed as a graph pattern in a RecWhereClause. The variable (Var) is a placeholder for the LOD resources that will be used to set up the user profile. The compiler has to make sure that the specified variable occurs in the RecWhereClause as well. Otherwise, the extraction of LOD resources cannot be carried out correctly.

Sim: The Sim part of a SKOSRec query denotes the threshold of concept-to-concept scores for flexible similarity detection. Even though knowledge-based similarity values usually range between 0 and 1, the specification allows a broader range of digits in the Decimal datatype to enable application of other similarity metrics in case they are needed.

Relation: This section specifies how concept-to-concept similarity scores should be determined. Users can state whether scores should be "larger", "larger than" or "equal to" the threshold value.

RecWhereClause: This clause can occur in three different sections of a SKOSRec query, i.e., either in the prefilter (ItemPart), the postfilter (SelectPart) or the preference filter section (VarPart). The RecWhereClause closely resembles the "WhereClause" of the SPARQL 1.1 specification with all its subsequent parts [21]. Hence, it enables different combinations of graph pattern matching. However, the RecWhereClause of the SKOSRec language allows fewer pattern matching expressions than the "WhereClause" of the regular SPARQL syntax specification [21]. For instance, it is neither permitted to formulate subqueries nor to direct filter requests to more than one repository (see RecGraphPatternNotTriples for further explanations). This modification has been made to simplify similarity calculation. After parsing this section, the compiler makes sure that recommendation, preference or postfilter variables occur at least once in the RecWhereClause to guarantee that subsequent steps of the recommendation workflow can be applied on existing LOD resources. In case the RecWhereClause comprises a RecMinusGraphPattern, it also has to be checked that the variable is not only contained in the MINUS part of the group.
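The variable check described for the RecWhereClause can be sketched as follows. The example assumes a strongly simplified representation in which each triple pattern is a tuple of three strings. The function `check_variable` and its return values are hypothetical and do not mirror the actual compiler.

```python
def check_variable(variable, patterns, minus_patterns=()):
    """Classify how a variable occurs in a simplified where clause.

    patterns:       triple patterns outside of MINUS, each a 3-tuple of strings
    minus_patterns: triple patterns inside the MINUS part
    Returns "ok", "minus-only" or "missing".
    """
    in_positive = any(variable in triple for triple in patterns)
    in_minus = any(variable in triple for triple in minus_patterns)
    if in_positive:
        return "ok"
    return "minus-only" if in_minus else "missing"


# Hypothetical filter: recommended films, excluding those of one director.
positive = [("?rec", "rdf:type", "dbo:Film")]
minus = [("?rec", "dbo:director", "dbr:Some_Director")]

print(check_variable("?rec", positive, minus))  # prints "ok"
print(check_variable("?rec", [], minus))        # prints "minus-only"
```

Only the first outcome would allow later workflow steps to operate on bound resources. The other two correspond to the cases the compiler is said to reject.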

RecGroupGraphPattern: A RecGroupGraphPattern can contain one to many basic graph patterns, which can be combined in different ways according to RecGroupGraphPatternSub.

RecGroupGraphPatternSub: This section defines the graph patterns that are applied to identify suitable LOD resources. User filters can be expressed either as basic graph patterns (TriplesBlock) or as combinations of them (RecGraphPatternNotTriples). The sequence of different kinds of patterns can be flexibly specified.

RecGraphPatternNotTriples: This query part closely resembles the similarly named "GraphPatternNotTriples" of the SPARQL 1.1 specification [21]. The only difference to SPARQL is that the RecGraphPatternNotTriples section is a little more restricted in terms of admissible graph patterns than the "GraphPatternNotTriples" section of the regular SPARQL specification. For instance, it does not allow the retrieval of LOD resources from any graph other than the default graph that is specified in the configuration of the SKOSRec engine, since filter patterns have to be applied to datasets that contain SKOS annotations. Hence, they need to be specified before runtime execution to ensure that similarity calculation can be carried out correctly. If a user were able to formulate filter conditions that referred to different RDF graphs and endpoints, this would not be possible.

RecGroupOrUnionGraphPattern: This query section marks an alternative graph pattern in the style of the "GroupOrUnionGraphPattern" of the SPARQL syntax specification [21]. However, it allows neither SPARQL-like subqueries nor federated queries for filtered resources, because this would complicate similarity calculation.

RecOptionalGraphPattern: In this part of a SKOSRecQuery, users can specify additional graph patterns which might extend the query solution but do not necessarily have to match the data.

RecMinusGraphPattern: This section handles exclusion of LOD resources for cases when users want to omit certain triple statements from the solution.

References

[1] G. Adomavicius, A. Tuzhilin, Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions, In: IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.

[2] G. Adomavicius, A. Tuzhilin, R. Zheng, REQUEST: A query language for customizing recommendations, In: Information Systems Research, vol. 22, no. 1, pp. 99–117, 2011.

[3] C. Aggarwal, Recommender systems, Springer, 2016.

[4] M. Arenas, B. Cuenca Grau, E. Kharlamov, S. Marciuska, D. Zheleznyakov, Faceted search over ontology-enhanced RDF data, In: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 939–948, 2014.

[5] V. Ayala, M. Przyjaciel-Zablocki, T. Hornung, A. Schätzle, G. Lausen, Extending SPARQL for recommendations, In: Semantic Web Information Management, 2004.

[6] J. Beel, C. Breitinger, S. Langer, A. Lommatzsch, B. Gipp, Towards reproducibility in recommender systems research, In: User Modeling and User-adapted Interaction, vol. 26, no. 1, pp. 69–101, 2016.

[7] C. Boutilier, R. Brafman, C. Domshlak, H. Hoos, D. Poole, CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements, In: Journal of Artificial Intelligence Research (JAIR), vol. 21, pp. 135–191, 2004.

[8] P. Castells, S. Vargas, J. Wang, Novelty and diversity metrics for recommender systems: choice, discovery and relevance, 2011.

[9] G. Cheng, W. Ge, Y. Qu, Falcons: searching and browsing entities on the Semantic Web, In: Proceedings of the 17th ACM International Conference on World Wide Web, pp. 1101–1102, 2008.

[10] J. Davies, R. Weeks, QuizRDF: Search technology for the Semantic Web, In: Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004.

[11] DBpedia, URL: https://wiki.dbpedia.org, 2017.

[12] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, Exploiting the web of data in model-based recommender systems, In: Proceedings of the 6th ACM Conference on Recommender Systems (RecSys), pp. 253–256, 2012.

[13] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, M. Zanker, Linked Open Data to support content-based recommender systems, In: Proceedings of the 8th International Conference on Semantic Systems, 2012.

[14] D. Dillman, D. Robert, D. Bowker, Principles for constructing web surveys, In: Joint Meetings of the American Statistical Association, 1998.

[15] DCMI metadata terms, URL: http://dublincore.org/documents/dcmi-terms, 2004.

[16] I. Fernandez-Tobias, I. Cantador, M. Kaminskas, F. Ricci, A generic semantic-based framework for cross-domain recommendation, In: Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pp. 25–32, 2011.

[17] A. Freitas, J. Oliveira, S. O'Riain, E. Curry, J. Da Silva, C. Pereira, Querying Linked Data using semantic relatedness: a vocabulary independent approach, In: Natural Language Processing and Information Systems, pp. 40–51, 2011.

[18] T. Grainger, T. Potter, Y. Seeley, Solr in action, Cherry Hill: Manning, 2014.

[19] A. Gunawardana, G. Shani, Evaluating recommender systems, In: Recommender Systems Handbook, Springer, pp. 265–308, 2015.

[20] R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Bürgle, H. Düwiger, U. Scheel, Faceted Wikipedia search, In: International Conference on Business Information Systems, pp. 1–11, 2010.

[21] S. Harris, A. Seaborne, E. Prud'hommeaux, SPARQL 1.1 query language, In: W3C recommendation, 2013.

[22] B. Heitmann, C. Hayes, Using Linked Data to build open, collaborative recommender systems, In: AAAI Spring Symposium: Linked Data Meets Artificial Intelligence, pp. 76–81, 2010.

[23] J. Herlocker, J. Konstan, L. Terveen, J. Riedl, Evaluating collaborative filtering recommender systems, In: ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.

[24] Y. Hu, Z. Wang, W. Wu, J. Guo, M. Zhang, Recommendation for movies and stars using YAGO and IMDB, In: 12th International Asia-Pacific Web Conference (APWEB), pp. 123–129, 2010.

[25] D. Huynh, D. Karger, Parallax and companion: Set-based browsing for the data web, In: Proceedings of the 18th ACM International Conference on World Wide Web, 2009.

[26] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich, Recommender systems: an introduction, Cambridge University Press, 2010.

[27] Y. Kabutoya, R. Sumi, T. Iwata, T. Uchiyama, T. Uchiyama, In: IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 625–630, 2012.

[28] M. Kaminskas, I. Fernandez-Tobias, F. Ricci, I. Cantador, Knowledge-based music retrieval for places of interest, In: Proceedings of the 2nd International ACM Workshop on Music Information Retrieval with User-centered and Multimodal Strategies, pp. 19–24, 2012.

[29] H. Khrouf, R. Troncy, Hybrid event recommendation using Linked Data and user diversity, In: Proceedings of the 7th ACM Conference on Recommender Systems, pp. 185–192, 2010.

[30] P. Kirk, Experimental design, John Wiley & Sons, 1982.

[31] B. Knijnenburg, M. Willemsen, Z. Gantner, H. Soncu, C. Newell, Explaining the user experience of recommender systems, In: User Modeling and User-Adapted Interaction, vol. 22, no. 4-5, pp. 441–504, 2012.

[32] G. Koutrika, B. Bercovitz, H. Garcia-Molina, FlexRecs: expressing and combining flexible recommendations, In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 2009.


[33] M. Lee, W. Kim. Semantic association search and rank method based on spreading activation for the Semantic Web. In IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1523–1527, 2009.

[34] J. Lehmann, L. Bühmann. AutoSPARQL: Let users query your knowledge base. In 8th Extended Semantic Web Conference (ESWC), 2011.

[35] N. Marie, F. Gandon, M. Ribiere, F. Rodio. Discovery hub: on-the-fly Linked Data exploratory search. In Proceedings of the 9th International Conference on Semantic Systems, pp. 17–24, 2013.

[36] S. McNee, J. Riedl, J. Konstan. Being accurate is not enough: how accuracy metrics have hurt recommender systems. In CHI '06 Extended Abstracts on Human Factors in Computing Systems, 2006.

[37] R. Meymandpour, J. Davis. Recommendations using Linked Data. In Proceedings of the 5th PhD Workshop on Information and Knowledge, 2012.

[38] A. Miles, S. Bechhofer. SKOS Simple Knowledge Organization System Reference. In W3C Recommendation, 2009.

[39] B. Mobasher, X. Jin, Y. Zhou. Semantically enhanced collaborative filtering on the web. In Web Mining: From Web to Semantic Web, pp. 57–76, 2004.

[40] A. Musetti, A. Nuzzolese, F. Draicchio, V. Presutti, E. Blomqvist, A. Gangemi, P. Ciancarini. Aemoo: Exploratory search based on knowledge patterns over the Semantic Web. In Semantic Web Challenge, 2012.

[41] T. Neumann, G. Weikum. The RDF-3X engine for scalable management of RDF data. In The VLDB Journal, vol. 19, no. 1, pp. 91–113, 2010.

[42] J. Pérez, M. Arenas, C. Gutierrez. Semantics and Complexity of SPARQL. In 5th International Semantic Web Conference, pp. 30–43, 2006.

[43] S. Policarpio, S. Brunk, G. Tummarello. Implementation of a SPARQL integrated recommendation engine for Linked Data with hybrid capabilities. In AImWD, 2012.

[44] P. Pu, L. Chen, R. Hu. A user-centric evaluation framework for recommender systems. In Proceedings of the 5th ACM Conference on Recommender Systems, 2011.

[45] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, E. Di Sciascio. Sound and music recommendation with knowledge graphs. ACM Transactions on Intelligent Systems and Technology (TIST), 8(2), 21 (2017).

[46] E. Oren, R. Delbru, S. Decker. Extending faceted navigation for RDF data. In 5th International Semantic Web Conference, pp. 559–572, 2006.

[47] G. Pirro. Explaining and suggesting relatedness in knowledge graphs. In International Semantic Web Conference, pp. 622–639, 2015.

[48] G. Rizzo, R. Troncy. NERD: A framework for evaluating named entity recognition tools in the Web of data. 10th International Semantic Web Conference (ISWC'11), Demo Session, Bonn, Germany, 2011.

[49] J. Rosati, T. Di Noia, T. Lukasiewicz, R. De Leone, A. Maurino. Preference queries with Ceteris Paribus semantics for Linked Data. In OTM Confederated International Conferences, pp. 423–442, 2015.

[50] T. Ruotsalo, K. Haav, A. Stoyanov, S. Roche, E. Fani, R. Deliai, E. Mäkelä, T. Kauppinen, E. Hyvönen. SMARTMUSEUM: A mobile recommender system for the Web of Data. In Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 50–67, 2013.

[51] I. Ruthven, M. Lalmas, K. van Rijsbergen. Incorporating user search behavior into relevance feedback. In Journal of the Association for Information Science and Technology, vol. 54, no. 6, pp. 529–549, 2003.

[52] B. Sarwar, G. Karypis, J. Konstan, J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, 2001.

[53] M. Schmachtenberg, C. Bizer, H. Paulheim. Adoption of the Linked Data best practices in different topical domains. In International Semantic Web Conference (ISWC), pp. 245–260, 2014.

[54] H. Schütze, C. Manning, P. Raghavan. Introduction to information retrieval. Cambridge University Press, 2008.

[55] SparqlImplementations. URL: https://www.w3.org/wiki/SparqlImplementations, 2016.

[56] M. Stankovic, W. Breitfuss, P. Laublet. Discovering relevant topics using DBPedia. In Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 219–222, 2011.

[57] K. Swearingen, R. Sinha. Beyond algorithms: An HCI perspective on recommender systems. In ACM SIGIR 2001 Workshop on Recommender Systems, vol. 13, no. 5-6, 2001.

[58] D. Tunkelang. Faceted search. In Synthesis Lectures on Information Concepts, Retrieval, and Services, pp. 1–80, 2009.

[59] C. Unger, L. Bühmann, J. Lehmann, A. Ngonga Ngomo, D. Gerber, P. Cimiano. Template-based question answering over RDF data. In Proceedings of the 21st ACM International Conference on World Wide Web, 2012.

[60] C. Varnhagen, M. Gushta, J. Daniels, T. Peters, N. Parmar, D. Law, T. Johnson. How informed is online informed consent? In Ethics & Behavior, vol. 15, no. 1, pp. 37–48, 2005.

[61] J. Waitelonis, H. Sack. Towards exploratory video search using Linked Data. In 11th IEEE International Symposium on Multimedia, pp. 540–545, 2009.

[62] L. Wenige, J. Ruhland. Flexible on-the-fly recommendations from Linked Open Data repositories. In International Conference on Business Information Systems, pp. 43–54, 2016.

[63] L. Wenige. Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations.

[64] L. Wenige, J. Ruhland. Retrieval by recommendation: using LOD technologies to improve digital library search. In International Journal on Digital Libraries, 2017.

[65] F. Zarrinkalam, M. Kahani. A multi-criteria hybrid citation recommendation system based on Linked Data. In 2nd International EConference on Computer and Knowledge Engineering (ICCKE), pp. 283–288, 2012.

[66] C. Ziegler, S. McNee, J. Konstan, G. Lausen. Improving recommendation lists through topic diversification. In Proceedings of the 14th International Conference on World Wide Web, pp. 22–32, 2005.

[67] L. Zou, R. Huang, H. Wang, J. Yu, W. He, D. Zhao. Natural language question answering over RDF: a graph data driven approach. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 313–324, 2014.

  • Introduction
  • Related Work
  • The SKOS Recommender
    • System Overview
    • On-the-Fly Recommendations
    • Preference Querying
    • Prefiltering
    • Postfiltering & Combinations
  • Evaluation of the SKOS Recommender
    • Experimental Setup
    • Results
  • Future Work & Conclusion
  • Appendix A: The SKOSRecommender Query Syntax
  • References

The IC of a set of SKOS concepts is measured by the sum of the negative logarithms of each concept's frequency relative to the maximum frequency of occurrences among relevant resources (Definition 7).

Definition 7 (SKOS Information Content). The SKOS IC is defined as an aggregation of the individual IC values of each concept that is contained in the set of shared features ($c \in C_{shared}$), where $freq(c)$ is the frequency of occurrences of c among all relevant resources and n is the maximum frequency of occurrences among these resources. The final similarity score of two resources r and q is determined by summing the IC values of each concept of the shared feature set.

$$IC(C_{shared}) = -\sum_{c \in C_{shared}} \log\left(\frac{freq(c)}{n}\right) \tag{7}$$

After having obtained resource annotations as well as IC values, similarity scores between the input resource r and each potentially relevant resource q can be calculated. When a user profile contains more than a single item, similarity values are aggregated through summation to determine the final recommendation score (Definition 8).

Definition 8 (Recommendation score). The recommendation score of a potentially relevant resource q is quantified by the sum of similarities with each resource r that can be found in the profile ($P_r$).

$$score(P_r, q) = \sum_{r \in P_r} sim(r, q) \tag{8}$$
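As an illustration of Definitions 7 and 8, the following Python sketch computes the SKOS IC of a shared concept set and sums the resulting similarities over a profile. The function names, the dictionary-based inputs, the toy concept identifiers, and the use of the natural logarithm are assumptions made for this example; the text does not state the base of the logarithm or any implementation details.

```python
import math


def skos_ic(shared_concepts, freq, n):
    """Eq. 7: negative sum of log(freq(c) / n) over the shared concepts.

    freq maps a concept to its number of occurrences among relevant
    resources; n is the maximum such frequency (natural log is assumed).
    """
    return -sum(math.log(freq[c] / n) for c in shared_concepts)


def recommendation_score(profile, candidate_concepts, freq, n):
    """Eq. 8: sum of sim(r, q) over all profile resources r.

    profile maps each profile resource to its set of SKOS concepts;
    sim(r, q) is taken to be the IC of the concepts shared by r and q.
    """
    return sum(
        skos_ic(concepts & candidate_concepts, freq, n)
        for concepts in profile.values()
    )


# Hypothetical toy data: three concepts with frequencies 1, 2 and 4.
freq = {"c:Lakes": 1, "c:Rivers": 2, "c:Places": 4}
n = max(freq.values())
profile = {"r1": {"c:Lakes", "c:Places"}, "r2": {"c:Rivers"}}
print(recommendation_score(profile, {"c:Lakes", "c:Rivers"}, freq, n))
```

In this reading, rare concepts contribute more to the score, while a concept that occurs as often as the most frequent one (freq(c) = n) contributes nothing.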

Upon calculation of similarity values for each potentially relevant item q, the engine ranks the retrieved resources and returns the top-ranked ones up to a given limit (k) that is specified by the user in the SKOSRec query.
With the presented methods of on-the-fly retrieval, recommendations can be generated from LOD repositories through an API without requiring local metadata or excessive preprocessing operations other than the mapping of profile items to LOD resources.⁴

⁴ Di Noia et al. describe how a mapping procedure can be conducted for a real-world implementation. For their LDRS, they matched movie titles with the corresponding LOD resources in DBpedia. The authors extracted IRIs, labels, and publication dates for all movie resources. They applied the Levenshtein distance on these resources to identify matching movie items [13].
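The label matching mentioned in footnote 4 could be approximated as follows. This is a minimal sketch of Levenshtein-based title matching, not the procedure of Di Noia et al.; the function names, the lower-casing step, and the distance threshold `max_distance` are hypothetical choices for illustration.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (dynamic programming, two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def best_match(title, lod_labels, max_distance=2):
    """Return the IRI whose label is closest to title, or None if too far.

    lod_labels maps an IRI to its label; max_distance is an assumed cutoff.
    """
    iri, label = min(
        lod_labels.items(),
        key=lambda item: levenshtein(title.lower(), item[1].lower()),
    )
    if levenshtein(title.lower(), label.lower()) <= max_distance:
        return iri
    return None


# Hypothetical labels extracted from a LOD repository.
labels = {"dbr:Pulp_Fiction": "Pulp Fiction", "dbr:Clerks": "Clerks"}
print(best_match("Pulp Fictoin", labels))
```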

3.3. Preference Querying

In the previous subsection, it was assumed that users express their tastes as preference statements for a certain item, given as an IRI referring to a LOD resource. However, likings may also be only vaguely known, for instance when users prefer products with specific features but are not able or willing to specify the actual items.
The engine obtains such a profile by issuing a SPARQL query that contains the preference graph pattern ($P_{pref}$) in the WHERE condition. Suppose a user is interested in Quentin Tarantino movies (dbr:Quentin_Tarantino) but does not specify the precise items she likes. She could send a preference query to the SKOSRec engine. The graph pattern in the request would match all movies that are connected to dbr:Quentin_Tarantino with the properties dbo:director or dbo:producer (see Listing 4).

Listing 4. SKOSRec query to generate recommendations based on a preference query in the movie domain (Q2)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>

    RECOMMEND ?movie TOP 5
    PREF [ ?prefMovie
    WHERE { { ?prefMovie dbo:director dbr:Quentin_Tarantino . } UNION
            { ?prefMovie dbo:producer dbr:Quentin_Tarantino . } } ]

Table 3 shows the corresponding solution set to the preference query, containing films from DBpedia that are similar to Quentin Tarantino movies.

Table 3. Result set for Q2

    movie
    dbr:The_Godfather_Part_II
    dbr:The_Prestige_(film)
    dbr:The_Dark_Knight
    dbr:Planet_Terror
    dbr:Clerks

The notion of a queried profile is specified in Definition 9.

Definition 9 (Queried profile). A queried profile is a user profile that has been compiled by retrieving all LOD resources r that are part of an annotation graph $A_D$ and are contained in the solution mappings resulting from the evaluation of a specified graph pattern ($P_{pref}$) over a dataset D (Eq. 9).

$$P_r = \{\mu(r) \mid \mu \in [[P_{pref}]]_D,\ r \in dom(\mu),\ r \in A_D\} \tag{9}$$
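A small Python sketch may help to read Eq. 9. It assumes that solution mappings are represented as dictionaries from variable names to IRIs and that the annotation graph is reduced to the set of resources that carry SKOS annotations; both representations, the function name, and the sample IRIs are hypothetical.

```python
def queried_profile(solution_mappings, variable, annotated_resources):
    """Eq. 9: collect mu(r) for every mapping mu that binds the variable,
    keeping only resources that occur in the annotation graph A_D."""
    return {
        mu[variable]
        for mu in solution_mappings
        if variable in mu and mu[variable] in annotated_resources
    }


# Hypothetical solution mappings for a preference pattern such as the one in Q2.
mappings = [
    {"prefMovie": "dbr:Pulp_Fiction"},
    {"prefMovie": "dbr:Jackie_Brown"},
    {"prefMovie": "dbr:Unannotated_Film"},
]
annotated = {"dbr:Pulp_Fiction", "dbr:Jackie_Brown", "dbr:Clerks"}
profile = queried_profile(mappings, "prefMovie", annotated)
print(sorted(profile))
```

The resulting set plays the role of the profile $P_r$ in the score computation of Definition 8.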

3.4. Prefiltering

To facilitate exploratory search in LOD repositories, users should also have the option to formulate conditions in their queries. One way to support such filters is to apply them before similarity calculation. Thus, recommendation lists can be adjusted to individual needs.

Due to the expressiveness of RDF data, filter conditions can not only be applied to attributes that are directly connected to potentially relevant items, but can also be declared as a graph pattern. By this means, it is possible to retrieve LOD resources that are both similar to the user profile and fulfill an advanced condition.

The strengths of the approach will be illustrated with a travel recommendation request that was executed over DBpedia. Suppose a customer expressed a preference for the Lake Baikal region (e.g., as indicated by a positive rating on a travel community site). With simple on-the-fly retrieval, she would receive a recommendation list that primarily consists of travel destinations located in the proximity of this geographical area. A graph-based filter, on the other hand, facilitates advanced retrieval options. In the given example, it enables the user to obtain suggestions for Southeast Asian travel destinations that have similar features as the Lake Baikal region, even though the resource dbr:Lake_Baikal is not directly connected to any destination with the annotation dbc:Southeast_Asia; such destinations can only be identified by exploring the surrounding graph structure of the filter attribute.

The advanced constraint ensures that the generated result set is not empty. A simple attribute-level filter, which is commonly utilized in faceted search systems, would not generate any search hits. By these means, graph-based filter patterns can help users to better explore knowledge graphs, thereby overcoming data quality issues in LOD repositories. The corresponding SKOSRec query for this example is shown in Listing 5.

Listing 5. SKOSRec query with an expressive prefilter condition to retrieve Southeast Asian destinations based on a preference for the Lake Baikal region (Q3)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbc: <http://dbpedia.org/resource/Category:>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX dct: <http://purl.org/dc/terms/>
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

    RECOMMEND ?destination TOP 5
    PREF dbr:Lake_Baikal
    [ WHERE { ?destination dbo:country ?country .
              ?country dct:subject ?countrySubject .
              ?countrySubject skos:broader dbc:Southeast_Asia .
              ?destination rdf:type dbo:Place . }
    ]

Table 4 shows the solution to this query. It contains travel destinations and regions of interest that are both similar to the previously liked destination dbr:Lake_Baikal and located in Southeast Asia. Ecoregions around lakes, rivers, or mountains from Southeast Asian countries, such as Cambodia, Vietnam, or Malaysia, are presented to the user.

Table 4. Result set of Q3

    destination
    dbr:Tonle_Sap
    dbr:Mekong
    dbr:Mount_Kinabalu
    dbr:Laguna_de_Bay
    dbr:Lake_Toba

The technical details for this type of query are given in the following definitions. Before the extraction of relevant resources and annotations, the recommendation engine identifies the resources that satisfy the filter (Definition 10).

Definition 10 (Prefiltered resources). The mapping of filtered resources $\Omega_{prefilter}$ is obtained by matching a specified graph pattern (P) to the RDF graph of a dataset D (Eq. 10).

$$\Omega_{prefilter} = \{\mu|_q \mid \mu \in [[P]]_D\} \tag{10}$$

After preselection, resources are brought together with user profile data to retrieve the final set of recommendations (Definition 11).

Definition 11 (Annotations and relevant resources (prefiltered retrieval)). The solution mappings of annotations and relevant resources in prefiltered retrieval mode are obtained by joining the set of prefiltered resources with the mappings $\Omega_r$ of all relevant resources and annotations as determined from the user profile.

$$\Omega_{pre} = \Omega_{prefilter} \bowtie \Omega_r \tag{11}$$
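The join in Definition 11 can be read as the usual join of SPARQL solution mappings: two mappings are merged when they agree on every variable they share. The sketch below illustrates this under the assumption that mappings are plain dictionaries; the helper names and the sample bindings are hypothetical and do not reflect the engine's actual data structures.

```python
def compatible(mu1, mu2):
    """Two mappings are compatible if they agree on all shared variables."""
    return all(mu2[var] == value for var, value in mu1.items() if var in mu2)


def join(omega1, omega2):
    """Join two sets of solution mappings by merging compatible pairs."""
    return [
        {**mu1, **mu2}
        for mu1 in omega1
        for mu2 in omega2
        if compatible(mu1, mu2)
    ]


# Hypothetical prefiltered resources (variable q) and annotated candidates.
omega_prefilter = [{"q": "dbr:Tonle_Sap"}, {"q": "dbr:Mekong"}]
omega_r = [
    {"q": "dbr:Mekong", "c": "dbc:Rivers"},
    {"q": "dbr:Volga", "c": "dbc:Rivers"},
]
omega_pre = join(omega_prefilter, omega_r)
print(omega_pre)
```

Only candidates that also satisfy the filter survive the join, which is why similarity scores are subsequently computed on the restricted set.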

After retrieving annotations and relevant resources in prefiltering mode, the system generates constraint-based recommendations. In this context, the ranking procedure is adapted to the restricted set of relevant resources and annotations. Hence, IC values of matching annotations and the respective SKOS similarities are quantified according to the user filter. By this means, recommendation scores reflect the similarity of items for the actual set of filtered LOD resources. Except for this restriction, the system carries out the calculations (i.e., determination of similarity values and ranking of LOD resources) in the same way as in non-filtering mode.

3.5. Postfiltering & Combinations

The SKOSRec engine can not only combine similarity calculation and graph pattern matching to prefilter relevant LOD resources; it can also apply user constraints after similar items have been determined. By this means, recommendations are filtered ex post, after the similarity detection process. This feature enables novel retrieval patterns in which suggestions are part of a subquery. As an example scenario for a postfilter recommendation task, consider the following SKOSRec query, which obtains suggestions for movies and directors based on a user's preference for the director dbr:Quentin_Tarantino (see Listing 6). For the given request, the engine would identify similar directors based on Quentin Tarantino's characteristic features (e.g., geographic origin or awards won) and would determine the movies these similar directors have shot. Thus, users may receive interesting movie recommendations.

Table 5 shows the solution the SKOSRec system returns when Q4 is issued against DBpedia. The ellipses in the displayed tables indicate records that have been excluded for brevity.

However, this approach can still be improved upon. For instance, the engine only considers the metadata descriptions of the director, whereas related movies have to be retrieved anew upon execution of the postfilter section. Thus, they are not ranked according to similarity in the final output table.
Another weakness is the biased semantics of the query.

Listing 6. SKOSRec query to generate simple postfiltered recommendations in the movie domain (Q4)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX yago: <http://dbpedia.org/class/yago/>

    SELECT ?director ?movie
    WHERE { ?movie dbo:director ?director .
            ?director rdf:type yago:Director110014939 . } LIMIT 50
    RECOMMEND ?director TOP 5
    PREF dbr:Quentin_Tarantino

Table 5. Result set of Q4

    director             movie
    dbr:Kirk_Douglas     dbr:Scalawag_(film)
    dbr:Kirk_Douglas     dbr:Posse_(1975_film)
    dbr:Kevin_Costner    dbr:The_Postman_(film)
    dbr:Kevin_Costner    dbr:Dances_with_Wolves
    ...                  ...
    dbr:Gene_Kelly       dbr:Hello,_Dolly!_(film)
    dbr:Gene_Kelly       dbr:Invitation_to_the_Dance_(film)
    ...                  ...
    dbr:Alan_Alda        dbr:Goodbye,_Farewell_and_Amen
    dbr:Alan_Alda        dbr:Margaret's_Engagement
    ...                  ...
    dbr:Steve_Buscemi    dbr:Interview_(2007_film)
    dbr:Steve_Buscemi    dbr:Animal_Factory

Usually, when a person favors a movie director, this preference refers to the movies this director has shot and not to the personal features of the person. Two directors might be similar according to certain features, such as birth year, nationality, or received awards, but the style and genre of the movies they have created can still be different. A third factor is that recommendation retrieval for movie directors should also take into account that a film artist is likely more relevant the more movies she has shot that are in line with the preferred movies of the user. Hence, the SKOSRec engine enables another postfiltering strategy, which is called aggregation-based retrieval. This strategy can be helpful whenever an entity type is linked to several LOD resources in a dataset, such that similarity scores of related entities (e.g., movies) can be aggregated to an overall recommendation score for a certain item (e.g., for a movie director).
Aggregation-based requests can be combined well with other retrieval patterns, such as preference queries.

In the case of the movie recommendation scenario, a user could state that she enjoyed Quentin Tarantino movies in the past and would like to receive suggestions for directors that shot similar films. This kind of question can typically only be answered by another human being, e.g., in a personal interaction among friends or in an online forum. The fact that the language facilitates these kinds of queries is one of its key advantages.

Listing 7. SKOSRec query to generate postfiltered recommendations based on a preference query in the movie domain (Q5)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX yago: <http://dbpedia.org/class/yago/>

    SELECT ?director ?movie
    WHERE { ?movie dbo:director ?director .
            ?director rdf:type yago:Director110014939 . } LIMIT 50
    AGG dbr:Quentin_Tarantino ?director SUM
    RECOMMEND ?movie TOP 100
    PREF [ ?prefMovie
    WHERE { ?prefMovie dbo:director dbr:Quentin_Tarantino . } ]

Table 6 depicts the results of Q5 as retrieved from DBpedia. When comparing the solution sets for Q4 and Q5, it becomes clear that they differ substantially. While Q5 generated a list of directors who have mostly created action and thriller movies with a twist (i.e., Tarantino's unique filmmaking style⁵), Q4 shows US-American actor-directors whose movies range from musical films (dbr:Hello,_Dolly!_(film)) to western movies (dbr:Dances_with_Wolves) or drama films (dbr:Interview_(2007_film)). Thus, the directors in the solution table of Q4 (Table 5) seem rather arbitrary and not particularly helpful concerning the initial user request.

We define this kind of aggregation-based request as a so-called roll-up query. This definition was chosen because the final suggestions are based on summarized (i.e., rolled up) preferences for sublevel entities. Another example for a roll-up request stems from the application scenario of travel search. This recommendation query derives suggestions for city trip destinations from the points of interest (POI) a user has visited and liked in another city (e.g., London). Listing 8 shows the corresponding SKOSRec query for this scenario.

Table 6. Result set of Q5

    director                     movie
    dbr:Steven_Spielberg         dbr:Indiana_Jones
    dbr:Steven_Spielberg         dbr:Jurrasic_Park_(film)
    ...                          ...
    dbr:Francis_Ford_Coppola     dbr:The_Godfather_Part_II
    dbr:Francis_Ford_Coppola     dbr:The_Godfather_Part_III
    ...                          ...
    dbr:Christopher_Nolan        dbr:The_Prestige_(film)
    dbr:Christopher_Nolan        dbr:The_Dark_Knight
    ...                          ...
    dbr:Robert_Rodriguez         dbr:Planet_Terror
    dbr:Robert_Rodriguez         dbr:Machete_(film)
    ...                          ...
    dbr:Kevin_Smith              dbr:Clerks
    dbr:Kevin_Smith              dbr:Clerks_II

⁵ https://en.wikipedia.org/wiki/Quentin_Tarantino

Listing 8. SKOSRec query to generate postfiltered recommendations (aggregation-based) for a preference profile containing the city of London (Q6)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX dul: <http://www.ontologydesignpatterns.org/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    SELECT ?city
    WHERE { ?poi ?property ?city .
            ?city rdf:type dbo:City .
            ?property rdfs:subPropertyOf dul:hasLocation . } LIMIT 3
    AGG dbr:London ?city SUM
    RECOMMEND ?poi TOP 100
    PREF dbr:Oxford_Street, dbr:Notting_Hill, dbr:South_Bank

The SKOSRec request contains subquery commands that refer to the user's preferred POIs. It specifies how similarity scores of the generated suggestions should be aggregated (i.e., sum-based). The retrieval is based on the 100 POIs that are most similar to the ones in the preference section. Additionally, the subquery part refers to the DBpedia resource that represents London (AGG dbr:London ?city), thus indicating that the list of recommendations should not contain any POIs in this city. Upon processing the depicted query over the DBpedia repository, the engine displays the three cities for which the highest aggregated scores have been detected (see Table 7).

Table 7. Result set of Q6

    city
    dbr:Oxford
    dbr:Abingdon-on-Thames
    dbr:Wallingford_Oxford

The motivation for formulating such a query is that, when searching for cities, preferences might be better represented through the sights a user has visited elsewhere rather than through a simple on-the-fly request solely containing a preference statement for a city in the user profile section.
Besides the roll-up retrieval patterns presented above, the SKOSRec grammar may facilitate the formulation of additional types of advanced recommendation queries. For instance, cross-domain queries are an interesting additional pattern to be explored for personalized retrieval. Listing 9 shows an example request that generates movie suggestions based on the preference for a music band (i.e., dbr:The_Jackson_5).

Listing 9. SKOSRec query to generate cross-domain recommendations (Q7)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

    SELECT ?movie ?participated ?musicalAct
    WHERE { ?movie rdf:type dbo:Film .
            VALUES ?participated { dbo:basedOn dbo:musicComposer
                                   dbo:genre dbo:starring }
            { ?musicalAct ?participated ?movie . } UNION
            { ?movie ?participated ?musicalAct . } } LIMIT 10
    AGG dbr:The_Jackson_5 ?movie MAX
    RECOMMEND ?musicalAct TOP 100
    PREF dbr:The_Jackson_5
    WHERE { VALUES ?actType { dbo:MusicalArtist dbo:Band }
            ?musicalAct rdf:type ?actType . }

The query contains a prefilter condition that triggers retrieval of music artists and bands. Afterwards, the scores of the top 100 suggestions are summarized with maximum-based aggregation. Thus, the highest similarity value among the set of music acts determines the ranking score of the film. Also note that, because of the aggregation-based retrieval procedure, connections between movies and the LOD resource dbr:The_Jackson_5 are excluded from similarity calculation. The scenario exemplifies how semantic relationships can help to generate suggestions for a target domain (i.e., movies) that are based on a profile containing items from a different source domain (i.e., music acts). While regular SPARQL queries perform exact matching of triple statements, the SKOSRec syntax facilitates the integration of similarity- and graph-based retrieval to enrich result lists with other potentially relevant items.
As a point of reference, consider the SPARQL query for the given cross-domain scenario (Listing 10).

Table 8. Result set of Q7

    movie                           musicalAct
    dbr:Moonwalker                  dbr:Michael_Jackson
    ...                             ...
    dbr:The_Wiz_(film)              dbr:Ashford_&_Simpson
    dbr:Garfield_Gets_a_Life        dbr:The_Temptations
    dbr:Pipe_Dreams_(1976_films)    dbr:Gladys_Knight_&_the_Pips
    dbr:The_Jacksons                dbr:Jermanine_Jackson

Listing 10. SPARQL query to generate cross-domain recommendations (Q8)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

    SELECT ?movie ?musicalAct
    WHERE { ?movie rdf:type dbo:Film .
            VALUES ?participated { dbo:basedOn dbo:musicComposer
                                   dbo:genre dbo:starring }
            { dbr:The_Jackson_5 ?participated ?movie . } UNION
            { ?movie ?participated dbr:The_Jackson_5 . } } LIMIT 5

The query omits the similarity calculation part. Thus, it only retrieves movies linked to the resource dbr:The_Jackson_5. By this means, the SPARQL results have a better chance of being related to the initial user profile. On the other hand, empty result lists can occur when the preferred LOD resource does not have any direct connections to recommendable items. This assumption is backed up by the result lists of the two queries (Tables 8 and 9). The SKOSRec query produces a more comprehensive solution than a regular SPARQL query.

Table 9. Result set of Q8

    movie               musicalAct
    dbr:The_Jacksons    dbr:The_Jackson_5

The following definitions formalize the postfiltered and aggregation-based retrieval forms. Definition 12 specifies the notion of SKOSRec recommendations that may potentially undergo postprocessing.

Definition 12 (SKOSRec recommendations). SKOSRec recommendations ($R(P_r)$) are the set of suggestions generated by processing the solution mappings ($\Omega$) obtained from recommendation retrieval. They can be a result of simple on-the-fly retrieval, prefiltering, or a combination of these techniques. $P_r$ represents the input profile that contains at least one preference for an item. The cardinality of $R(P_r)$ is the intended number of recommendations (k) that is specified by the user, based on which the engine selects the resources with the highest recommendation scores that are not contained in the profile (Eq. 12).

$$R(P_r) = \{x \mid x \in \{q \mid score(P_r, q) > k\max_{q \in \Omega} score(P_r, q),\ q \notin P_r\}\} \tag{12}$$
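The selection step of Definition 12 amounts to a top-k cut over the scored candidates after removing profile items. The following sketch shows one way to express it; the function name, the dictionary of precomputed scores, and the alphabetical tie-breaking rule are assumptions for illustration and are not stated in the text.

```python
def skosrec_recommendations(scores, profile, k):
    """Return the k highest-scoring resources that are not in the profile.

    scores maps each candidate resource q to score(P_r, q). Ties are broken
    alphabetically here, which is an assumption of this sketch.
    """
    candidates = [(q, s) for q, s in scores.items() if q not in profile]
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return [q for q, _ in candidates[:k]]


# Hypothetical recommendation scores; the profile item itself is excluded.
scores = {
    "dbr:Mekong": 3.0,
    "dbr:Tonle_Sap": 5.0,
    "dbr:Lake_Toba": 1.0,
    "dbr:Lake_Baikal": 9.0,
}
top = skosrec_recommendations(scores, {"dbr:Lake_Baikal"}, k=2)
print(top)
```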

The approach of subquerying with recommendation results builds on the idea that SKOSRec suggestions $R(P_r)$ can be joined with any graph pattern $P_{post}$ after the process of similarity calculation (Definition 13). It does not make a difference how the engine obtained these recommendations.

Definition 13 (Postfiltered recommendation mapping). The postfiltered recommendation mapping is obtained by joining solution mappings generated from SKOSRec recommendations with the results of matching a postfilter graph pattern ($\Omega_{postfilter}$) (Eqs. 13 and 14).

$$\Omega_{postfilter} = \{\mu \mid \mu \in [[P_{post}]]_D,\ x \in dom(\mu)\} \tag{13}$$

$$\Omega_{post} = \{\mu|_x \mid x \in R(P_r)\} \bowtie \Omega_{postfilter} \tag{14}$$

As with the prefilter, $P_{post}$ can comprise triple statements that are eligible in the RecGroupGraphPattern part of the SKOSRec grammar. Aggregation-based queries have to be treated differently than a regular postfilter request. Apart from excluding sublevel items of the profile from the set of similar resources, the engine should also omit any connections between the preferred superordinate item (a) and similar sublevel entities. Additionally, the user has to specify the variable in the profile (e.g., ?y) that represents the items within the postfilter graph pattern $P_{post}$. By this means, aggregation-based recommendations can be generated (Definition 14).

Definition 14 (Resources for aggregation-based retrieval). The set of resources for aggregation-based retrieval is obtained by retrieving LOD resources that have a connection to a previously generated suggestion, thereby excluding all resources that are linked to the specified superordinate item a in the profile (Eq. 15).

$$R_{agg}(a) = \{\mu(y) \mid \mu \in \Omega_{post},\ y \in dom(\mu),\ a \notin \mu(y)\} \tag{15}$$
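One possible reading of Definition 14 is sketched below: every binding of the aggregation variable in the postfiltered mappings becomes a candidate, except the superordinate item a itself (for example, the director or city named in the AGG clause). This interpretation of the exclusion condition, the function name, and the sample data are assumptions of the sketch.

```python
def aggregation_candidates(omega_post, variable, superordinate):
    """Collect bindings of the aggregation variable, dropping the
    superordinate profile item (assumed reading of Eq. 15)."""
    return {
        mu[variable]
        for mu in omega_post
        if variable in mu and mu[variable] != superordinate
    }


# Hypothetical postfiltered mappings linking recommended movies to directors.
omega_post = [
    {"movie": "dbr:Planet_Terror", "director": "dbr:Robert_Rodriguez"},
    {"movie": "dbr:Clerks", "director": "dbr:Kevin_Smith"},
    {"movie": "dbr:Some_Film", "director": "dbr:Quentin_Tarantino"},
]
directors = aggregation_candidates(omega_post, "director", "dbr:Quentin_Tarantino")
print(sorted(directors))
```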

For each potential aggregation-based suggestion (y), similarity scores of connected entities (x) have to be considered for the final ranking. In this context, different approaches to aggregation can be applied. In cases when both similar items and aggregation-based recommendations are entities on comparable levels of granularity, choosing the maximum similarity score might be the best way to represent the relatedness of the resource to the user profile. On the other hand, when postfiltered recommendations are based on similarity scores of their sublevel entities (e.g., as in the movie recommendation example), the scores of related suggestions should be aggregated by the sum or the average of individual scores.

Definition 15 (Aggregation-based ranking score). The ranking score of an LOD resource y that is connected to a set of recommended items is determined by aggregating scores of connected entities, either by the maximum (Eq. 16), the sum (Eq. 17), or the average (Eq. 18) of individual scores.

$$score_{aggMAX}(y) = \max_{x \in \mu(x, y)} score(P_r, x) \tag{16}$$

$$score_{aggSUM}(y) = \sum_{x \in \mu(x, y)} score(P_r, x) \tag{17}$$

$$score_{aggAVG}(y) = \frac{\sum_{x \in \mu(x, y)} score(P_r, x)}{|\{x \in \mu(x, y)\}|} \tag{18}$$
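The three aggregation variants of Definition 15 can be sketched in a few lines of Python. The representation of connections as (x, y) pairs, the function name, and the sample scores are hypothetical; the sketch only mirrors Eqs. 16 to 18 and makes no claim about how the SKOSRec engine implements them.

```python
from collections import defaultdict


def aggregate_scores(links, item_scores, mode="SUM"):
    """Aggregate scores of recommended items x per connected resource y.

    links is an iterable of (x, y) pairs; item_scores maps x to score(P_r, x).
    mode selects Eq. 16 (MAX), Eq. 17 (SUM) or Eq. 18 (AVG).
    """
    grouped = defaultdict(list)
    for x, y in links:
        if x in item_scores:
            grouped[y].append(item_scores[x])
    if mode == "MAX":
        return {y: max(values) for y, values in grouped.items()}
    if mode == "SUM":
        return {y: sum(values) for y, values in grouped.items()}
    if mode == "AVG":
        return {y: sum(values) / len(values) for y, values in grouped.items()}
    raise ValueError(f"unknown aggregation mode: {mode}")


# Hypothetical movie scores rolled up to their directors.
links = [("m1", "d1"), ("m2", "d1"), ("m3", "d2")]
item_scores = {"m1": 2.0, "m2": 4.0, "m3": 5.0}
print(aggregate_scores(links, item_scores, "SUM"))
print(aggregate_scores(links, item_scores, "MAX"))
```

With these toy values, sum-based aggregation favors the director with two moderately similar movies, whereas maximum-based aggregation favors the director with the single most similar movie, which matches the trade-off discussed above.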

4. Evaluation of the SKOS Recommender

4.1. Experimental Setup

The novel recommendation strategies of the SKOSRec engine were evaluated in web-based experiments for the usage scenarios of travel destination search and multimedia RS. The online approach helped to reach more participants with diverse demographic backgrounds, which would not have been possible in a laboratory experiment. Another positive side effect of the web-based setting was that participants could test and rate recommendations without being directly observed by an experimenter, which might otherwise have biased the results. Hence, the online approach softened common limitations of laboratory studies [19].
The experiments were carried out in the defined usage scenarios with metadata descriptions from DBpedia. The database containing the local mirror of DBpedia ran on a virtual server. Besides the triple store, a web application was hosted on the same machine. The website ran on an Apache Tomcat 7 application server and was implemented with the help of Java servlet technology.⁶,⁷

We conducted the online experiments between April 2016 and January 2017. Upon setting up the web application, a pretest was carried out to control for potential pitfalls, such as misleading navigation. Two test users went step by step through the web interface and provided their feedback. Because of their insights, we slightly modified the interface (e.g., by rephrasing user instructions) to increase the chances of successful completion. In case the SKOSRec engine is applied in a live setting, the design of the user interface will have to be tested with more users. Our experimental series focused on algorithmic performance, which was tested with numerous participants.
Subjects were recruited through the clickworker.com platform.⁸ Participants received a compensation fee of 1.00 € or 1.50 € for a completed survey. 103 participants took part in the study on travel destination search. In the multimedia domains, a total of 154 subjects evaluated the SKOSRec engine. For each part of the evaluation, it was ensured that participants had not just clicked through the evaluation screen. Whenever there appeared to be irregularities in the log files (e.g., the default settings of the screen were not changed), these assessments were excluded from further analysis.

The survey template started with a consent form. It informed participants about the context and the procedure of the study. After having filled in the consent form, participants were navigated to the first part of the experiment, which collected data on subjects' demographics and their consumption and search habits. After the preparation phase, participants worked on the first test case (TC1) in each usage scenario. In this test case, the performance of the SKOSRec engine was assessed in the simple execution mode of on-the-fly recommendations (see Subsect. 3.2). This was done to gather data on the usefulness of suggestions resulting from a simple content-based approach (i.e., the baseline method), with which more advanced recommendation strategies were compared at a later stage. In cases where users did not state any items, the application showed an error page.

While setting up a profile required some effort on the side of the user, it was a necessary step for the experiment. Since we did not have access to the log files of a live application, profile generation had to be simulated. The process was facilitated by a specifically tailored interface (see Fig. 2), where subjects could search for items through an autocomplete field that applied AJAX technology.⁹

⁶ http://tomcat.apache.org
⁷ http://www.oracle.com/technetwork/java/javaee/servlet/index.html
⁸ https://www.clickworker.de

It enabled users to type in keywords (e.g., the name of a music act or the author of a book), for which matches were instantaneously displayed without reloading the site. Hence, participants could quickly find and select their items of interest. Whenever a user entered a query through the AJAX interface, the system matched the query keywords against an Apache Solr/Lucene search index.¹⁰ The index was built with item metadata, which had been extracted from DBpedia prior to setting up the website. The index comprised the names and abstracts of items. Whenever available, information on thumbnail pictures was saved in the index to be ready for display in the AJAX interface. In case the Apache Solr/Lucene

⁹ http://api.jquery.com/jquery.ajax
¹⁰ http://lucene.apache.org/solr


Fig. 2. User profile generation, TC1 (music domain)

search engine found index keywords that matched the user query, metadata information for these items was immediately shown on the screen. This information was displayed to help participants make an informed decision on whether the AJAX result list contained suitable items for profile generation.
Aside from the AJAX feature, the appeal of the interface was increased by a progress bar at the top of the webpage (see Fig. 2). Progress bars are commonly utilized tools for web surveys. In conventional paper-based studies, subjects usually see how far along they are, while in a web survey they cannot check their progress immediately. This is especially true for studies that apply a screen-by-screen navigation, which was the chosen approach for the conducted web experiments. By showing how close participants are to completing the experiment, progress bars prevent users from tiring of the questionnaire and withdrawing from the study altogether [14].
Upon profile generation, the SKOSRec engine calculated on-the-fly recommendations for a simple query (similar to Q1, Sect. 3), which were shown to users in a separate screen. Suggestions were displayed with a sufficient amount of information, e.g., thumbnails, labels or a short description, to help participants assess the quality of the recommendation. In addition to this information, users could also access the respective Wikipedia article by following the hyperlink on the label. Relevance sliders were used to assess the utility of each recommendation. They are often applied in evaluations of IR systems [51, 54]. Scores were handled as points on a 0 to 100 scale of the relevance slider (outer left side: lowest relevance; outer right side: highest relevance; see Fig. 3).

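This approach facilitated the calculation of an accurate precision score and the application of tests with higher statistical power than an ordinal 5-star scale would allow. The score, called mean relevance (mrs), was measured with Equation 19, where score_i is the slider score of the i-th item and k is the number of items in the recommendation list.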

\[ \mathrm{mrs} = \frac{\sum_{i=1}^{k} \mathrm{score}_i}{k} \tag{19} \]
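The averaging in Eq. 19 can be sketched in a few lines of Python. This is only an illustration, not the code of the web application: the function name `mean_relevance_score` and the example slider values are hypothetical.

```python
from typing import Sequence


def mean_relevance_score(scores: Sequence[float]) -> float:
    """Mean relevance (mrs) of one recommendation list, following Eq. 19.

    `scores` holds the slider values (0 to 100) that one participant
    assigned to the k items of a recommendation list.
    """
    if not scores:
        raise ValueError("a recommendation list needs at least one rated item")
    return sum(scores) / len(scores)


# Hypothetical slider values for a list with k = 4 items.
slider_scores = [80, 65, 40, 55]
print(mean_relevance_score(slider_scores))  # 60.0
```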

In addition to the precision score, the web application also determined the size of the recommendation list produced by the engine as an indicator of recall. However, recommendation list size can only be utilized for this purpose when mrs scores reach a reasonable level.
While accurate recommendations are useful because they build trust in a system [57], they only capture a narrow aspect of performance [36]. For instance, an online retailer can profit tremendously from an RS that provides both relevant and unobvious recommendations, since it enables users to explore the product catalog more thoroughly. On the other hand, if an RS only suggests popular products, it is likely that the user already knows them [3, 23]. A pure accuracy-based evaluation disregards these aspects, which is why other performance indicators, such as novelty, were measured as well. Thus, the evaluation interface also contained a section where subjects had to tick a radio button that indicated whether an item was new. In the backend of the web application, novelty was measured as the ratio of relevant items not


Fig. 3. Recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain)

familiar to the user (|nov|) among the total number of relevant items (|tp|) (Eq. 20).

\[ nv = \frac{|nov|}{|tp|} \tag{20} \]
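A minimal sketch of Eq. 20 follows. It assumes that relevance is decided per item with the threshold of 50 score points stated in the text and that the radio button yields one boolean per item; the function name `novelty` and the example values are hypothetical.

```python
from typing import Optional, Sequence


def novelty(scores: Sequence[float], is_new: Sequence[bool],
            threshold: float = 50.0) -> Optional[float]:
    """Share of relevant items that were new to the user (Eq. 20).

    Returns None when no item reaches the relevance threshold, because
    the ratio |nov| / |tp| is undefined for |tp| = 0.
    """
    if len(scores) != len(is_new):
        raise ValueError("scores and is_new must have the same length")
    # Keep the novelty flag of every relevant item (|tp| entries).
    relevant_flags = [new for score, new in zip(scores, is_new) if score >= threshold]
    if not relevant_flags:
        return None
    # True counts as 1, so the sum equals |nov|.
    return sum(relevant_flags) / len(relevant_flags)


# Hypothetical ratings: three relevant items, two of them unknown to the user.
print(novelty([80, 65, 40, 55], [True, False, True, True]))
```

Returning `None` for lists without relevant items is a design choice of this sketch; it would be one possible explanation for N values in the novelty rows that are smaller than in the accuracy rows.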

An item was considered relevant when it achieved an mrs score of 50 or higher. In addition to accuracy and novelty, the web application also tracked the diversity of recommendation lists, since a topically diversified result set has been found to increase overall user satisfaction [31, 66]. In the RS literature, the most widely explored diversification metrics are the ones that measure the similarity between each item pair in the recommendation list. The more the items are alike, the less diverse is the result set [19]. The study series of this paper applied the metric by Castells et al. [8]. It determines item-to-item similarity values with the cosine similarity measure. Scores are then converted to distance values (Eq. 21) and averaged for the recommendation list (Eq. 22).

\[ d(i_l, i_m) = 1 - \cos(i_l, i_m) \tag{21} \]


\[ div_C = \frac{2}{k(k-1)} \sum_{l<m} d(i_l, i_m) \tag{22} \]
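Eqs. 21 and 22 can be illustrated with the following sketch. It assumes that each item is represented as a numeric feature vector (for instance, indicators for SKOS annotations); this representation and the names `cosine` and `div_c` are assumptions made for illustration only.

```python
import math
from itertools import combinations
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors (0.0 if one vector is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def div_c(items: Sequence[Sequence[float]]) -> float:
    """Average pairwise cosine distance of a recommendation list (Eqs. 21 and 22)."""
    k = len(items)
    if k < 2:
        raise ValueError("diversity needs at least two items")
    # Sum d(i_l, i_m) = 1 - cos(i_l, i_m) over all unordered pairs l < m.
    total = sum(1 - cosine(a, b) for a, b in combinations(items, 2))
    return 2 * total / (k * (k - 1))


# Hypothetical annotation vectors: items 0 and 2 share the same concept.
print(div_c([[1, 0], [0, 1], [1, 0]]))
```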

The content-based diversity scores (divC) were calculated in the backend of the web application and saved in a log file for each recommendation list that the SKOSRec engine generated at runtime. In addition to the measurement of divC scores, we asked participants directly about their opinions on recommendation list diversity. Following the RS evaluation framework by Pu et al. [44], two Likert items were posed for this purpose:

– The recommendations of list [] are diverse.
– The recommendations of list [] are similar to each other.

Besides the algorithmic performance metrics listed above, the authors gathered assessments from users regarding the overall usefulness of the suggestions. This approach is also in line with the empirically verified evaluation framework by Pu et al. [44], which proposes to measure the psychometric construct of Perceived Usefulness to get a comprehensive understanding of user opinions on recommendation results. Thus, participants were instructed to state their agreement with positive statements regarding the relevance of the suggestions. The statements were adapted to the specifics of each usage scenario. Fig. 4 shows an example collection of Likert questions for the domain of music RS.

After participants had provided the required information, their answers were saved in a log file and they were navigated to test case 2 (TC2), which evaluated the SKOSRec engine's performance for constraint-based queries (similar to Q3, Sect. 3). We tried to find out whether the engine was able to improve user satisfaction through filter conditions. Even though IR systems already successfully apply facet filters on result sets, the experiments still had to show whether LOD-enabled constraints have the same effect on the Perceived Usefulness of recommendations. For the comparisons of non-filtered and filtered suggestions, we applied a within-subjects design in which participants had to rate two recommendation lists resulting from regular (non-filtered) and filtered requests in TC2. We favored this setup over a between-subjects design, where subjects would have been assigned to only one result list. While the between-subjects design more closely resembles a real-world setting, since it does not require

participants to interact with more than one approach at a time, differences among subjects (e.g., regarding demographics or topic expertise) can potentially have a considerable impact on quality judgments [19]. Hence, when applying a between-subjects design, one does not know whether differences in performance are caused by the approaches or by the variability among participants. Therefore, researchers need to conduct between-subjects experiments with a sufficient number of study subjects to level out these differences. Within-subjects studies, on the other hand, require fewer participants while still achieving the same level of statistical power, since between-subject variability is not an issue [30, 31]. Beel et al. have shown that even small variations in an RS test setting can cause substantial differences in performance outcomes [6]. To keep the extent of variations even lower, comparisons between non-filtered and constraint-based recommendations were based on the same profile for each user in TC2. Therefore, the user profile from TC1 served as a starting point for filtered retrieval. The web application showed the recently generated profiles to users and asked them to provide an additional filter condition. Fig. 5 depicts an example web form for a constraint-based recommendation request in the music domain. It comprises the user profile and an additional text field where users stated their constraint.

Afterwards, the engine applied this filter condition on the set of potential recommendation results before similarity calculation started. For each usage scenario, different types of constraints were available. Regarding suitable filter options, the authors sought to achieve a balance between assisting users in finding filter conditions that would adequately represent their information needs and keeping the variability among users as low as possible. Therefore, the specifics of each scenario and the availability of the respective data sources were determined. For instance, for the multimedia domains (movie, music and books), it was assumed that users would like to filter recommendations according to the specific genre of the item. Fortunately, the DBpedia dataset contains this kind of information for the music domain. In this domain, the genre can be retrieved through the property dbo:genre. However, in the movie and book domains, DBpedia does not provide sufficient information. Many LOD resources which represent books were not assigned a genre. In the movie domain, a genre property is missing entirely, and genre-specific information can often only be found in the subject categories describing the movie. Hence, in contrast to the experiment on


Fig. 4. Evaluation screen for recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain)

music recommendations, participants in the domains of books and films did not filter their recommendation lists by genre. On the other hand, the property dc:subject was applied as a recommendation filter in each multimedia domain. Since the subject property often links to many informative features of multimedia items (e.g., release period, geographic information or content information), the authors decided that this feature represents a powerful filter dimension.
In the experiment on travel destination search, the web interface offered two filters: a subject filter and a location-specific filter. The subject filter enabled users to state their preferences regarding the characteristics of a location (e.g., being a nature reserve). The location-specific filter, on the other hand, gave participants the option to specify where the desired destination should be located.¹¹

The second fundamental research issue of TC2 concerned the question of whether expressive constraints can boost recommendation quality even further. Hence,

¹¹ http://dbpedia.org

apart from executing a constraint-based workflow with a simple filter (i.e., a direct attribute of an item), the SKOSRec engine generated suggestions with the help of an expressive filter as well. This second procedure, while still applying the same user constraint as the simple filter, utilized an expressive graph pattern to include additional LOD resources in the result set. For instance, in case a subject filter was selected, result set expansion was facilitated by exploring the wider semantic space of the DBpedia category graph through a property path declaration in the graph pattern (e.g., skos:broader[2]). In the travel experiment, an additional expressive filter retrieved place-related information (dul:hasLocation) for location-specific subproperty relations that are not materialized in DBpedia. Hence, the expressive graph pattern expanded the filter such that additional properties were matched (e.g., dbo:country, dbo:city or dbo:state), while still applying the specified user constraint on the set of LOD resources. Upon execution of constraint-based recommendation requests, users received two recommendation lists (resulting from simple and expressive filtering, respectively) in a


Fig. 5. Web form for a constraint-based recommendation request (music domain)

separate evaluation screen. Participants evaluated result sets in the same manner as in TC1. For each result set, the screen contained an additional Likert item asking subjects to state their agreement with the statement “The filter has improved the recommendation results of list”. The sequence of suggestions resulting from the different methods was randomly assigned in each user session to avoid order effects.
After study subjects had completed TC2, they were navigated to test case 3 (TC3). In the TC3 section of the travel experiment, participants could choose between three options for rollup query patterns (similar to Q6, Sect. 3). Users were able to obtain recommendations for travel destinations based on POIs they had visited during a stay in another destination. They could select either a city (option 1), a region (option 2) or a country (option 3) as destination type. Upon entity type selection, the SKOSRec engine generated appropriate suggestions. Fig. 6 shows the web form from the travel experiment that facilitated the formulation of advanced rollup requests in TC3.

When users had selected an entity type and stated their favorite travel destination and three associated POIs, the application generated a SKOSRec query from these parameters. A simple on-the-fly request was sent to the engine as well to retrieve baseline recommendations, with which the suggestions resulting from the advanced request were compared at a later stage of the experiment. The simple query only contained the user's favorite travel destination and the selected entity type option, thus omitting information on the POIs. During the travel experiment, the engine processed both the simple and the advanced query and produced two recommendation lists. As in the previous parts of the experiment, participants assessed the quality of the result sets, which appeared in random order on the screen.
The multimedia experiments also contained a section on rollup retrieval (TC3). Like the advanced travel queries, the multimedia requests aggregated similarity scores of sublevel entities through summation, which the engine joined with a postfilter. In addition to these features, the pattern contained a preference query part. This section identified a set of preferred items through a single user statement. Participants either specified their favorite director/actor (movie domain), music act


Fig. 6. Travel - User profile generation, TC3

(music domain) or author (book domain) and received recommendations based on the features of the works created by the favored artist (similar to Q5, Sect. 3). Participants also received suggestions from a simple on-the-fly query, which computed results according to the artist's characteristics. As in the travel experiment, recommendations from the advanced rollup request were compared with those of the on-the-fly query. Participants did not know which approach they were assessing, and result set positions were randomly assigned.
After participants of the multimedia experiments had evaluated the results of TC3, they were guided to test case 4 (TC4), which evaluated cross-domain queries. The experiment on travel RS ended with TC3 because no suitable graph patterns could be identified to facilitate these kinds of suggestions for the travel usage scenario. In the multimedia domains, however, the data from DBpedia was sufficient for this retrieval task. The entity type of the study (i.e., movie, music act or book) defined the source domain, based on which the engine generated suggestions for items from another multimedia target domain (similar to Q7, Sect. 3). Fig. 7 depicts the cross-domain web form of the music experiment.

The web form assisted subjects in formulating a request, which then obtained movie suggestions based on the favorite music act of the user. Parameters from users were entered in the placeholders of the query pattern in the backend of the application. We formulated the query templates in such a way that the query engine matched any properties or graph patterns that could potentially connect two items from the specified

target and source domains. For instance, such matches can occur when a user states a favorite movie that was written by a particular author. In case the same author has also written some books, they might be of interest to the user as well. Another example stems from the movie domain. Suppose a consumer has declared a preference in his/her profile for a movie that links to the corresponding soundtrack. It might well be the case that the user likes the soundtrack and the music acts that were involved in creating it just as much as s/he likes the movie. The practice of distinguishing homonymous LOD resources through the property dbo:wikiPageDisambiguates is another interesting linking pattern from DBpedia that was exploited for cross-domain requests in TC4, since it connects similar entities (e.g., the book edition of a certain movie).
Cross-domain SPARQL queries (similar to Q8, Sect. 3) were posed as baseline requests in this test case. While the SPARQL query only matched cross-domain relations for a single LOD resource, the SKOSRec request expanded the search space by considering links from more than one item. As in the previous test cases, TC4 ended with an evaluation screen where users assessed the SPARQL-based as well as the SKOSRec suggestions in random order without knowing which recommendation list belonged to which method. Table 10 gives an overview of the test cases and the methods that were applied in the experiments in each usage scenario.

4.2. Results

In total, 257 people took part in the study. The gathered samples from the experiments contained answers


Fig. 7. Music - User profile generation, TC4 (page 9)

Table 10. Test cases in the web experiments

Test Case   Evaluation Purpose                        Query Type   Travel   Movie   Music   Book
TC1         On-the-fly Recommendations (Baseline)     Q1           yes      yes     yes     yes
TC2         Constraint-based Recommendations          Q3           yes      yes     yes     yes
TC3         Advanced Queries (Rollup)                 Q5, Q6       yes      yes     yes     yes
TC4         Advanced Queries (Cross-Domain)           Q7           no       yes     yes     yes

from both genders with a slight overrepresentation of male participants (53.7%). Most study subjects were between 20 and 40 years of age (65.8%). The evaluation of the demographics section also revealed that the majority of participants already perceive the search in their domain of interest as either “manageable” (38.5%) or even as “easy” or “very easy” (44.0%).
Prior to analyzing the performance results of the novel retrieval approaches of the SKOSRec engine, it was checked whether the respective Likert items for Perceived Usefulness and user-based diversity (divU) reached acceptable reliability levels. For this purpose, Cronbach's α values were determined. Table 11 shows the outcome of this analysis for the Perceived Usefulness construct. Given the acceptable values throughout the domains (Cronbach's α > 0.7), it could be treated as an interval-scaled variable in the statistical tests.

Table 11. Cronbach's α values for the concept of Perceived Usefulness

Domain   Cronbach's α
Travel   0.7002
Movie    0.8579
Music    0.8515
Book     0.8949
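The reliability check can be sketched with the standard formula for Cronbach's α, which relates the sum of the item variances to the variance of the participants' total scores. The sketch below is not the analysis script of the study; the function name `cronbach_alpha` and the example responses are hypothetical.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a set of Likert items.

    `responses` contains one row per participant and one column per item.
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    if len(responses) < 2:
        raise ValueError("at least two participants are required")
    k = len(responses[0])
    if k < 2:
        raise ValueError("at least two items are required")
    item_variances = [variance(column) for column in zip(*responses)]
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers of four participants to three usefulness items (1 to 5).
answers = [[4, 5, 4], [2, 2, 3], [5, 4, 5], [3, 3, 3]]
print(round(cronbach_alpha(answers), 4))
```

A construct would pass the threshold used in the text when the returned value exceeds 0.7.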

Besides usefulness, user-based diversity scores were also measured by multiple Likert items. However, Cronbach's α values for this concept did not reach acceptable levels. To avoid leaving valuable user assessments unused, agreement with the statement “The recommendations of list [] are diverse” was taken into account as an ordinal variable (divU).
Based on these results, subsequent tests could be carried out. The first part of the analysis concerned the performance of the on-the-fly retrieval method (baseline) of the SKOSRec engine in TC1. The outcome of this analysis confirms that the on-the-fly approach represents a viable recommendation strategy. Table 12 shows the mean (M),¹² standard deviation (SD) and number of participants (N) for each performance dimension and metric. It can be seen that users assessed the engine's performance to be above a mediocre level in almost every quality dimension. Aside from the positive user evaluations with regard to Perceived Usefulness, related quality aspects such as Accuracy, Novelty and Diversity received good assessments as well. For instance, in each domain the mean relevance score (mrs) was higher than 50 score points. This means that participants were often convinced of the relevance of the results. Given these positive user assessments with regard to item relevance, it seems justified to consider the mean of result list sizes as an indicator of recall. The engine was able to generate the required number of recommendations in almost every test case (see Table

¹² In case of the user-based diversity score (divU), M denotes the median because of the ordinal scale of the variable.


12). The on-the-fly retrieval approach was also tested with regard to the Novelty dimension. Table 12 shows the mean novelty scores (nv) for the baseline recommendation approach. Novelty scores indicate that the majority of items in a result list were deemed to be new in the travel and book RS, whereas in the experiments on movie and music RS only about one third of the recommendation list was unfamiliar to users. In contrast, content-based Diversity scores (divC) were rather high in each domain. Hence, it can be concluded that items in a result set usually have a low rate of matching SKOS annotations. In addition to content-based scores, user assessments of recommendation list Diversity (divU) were also taken into account. The median level of agreement with the Diversity-related Likert item stating that recommendations are diverse was 3 (“neutral”) in the multimedia domains and 4 (“agree”) in the travel domain.
In summary, the baseline approach achieved fairly good performance results in TC1. The findings demonstrate that LOD-enabled recommendations can potentially enhance a user's search experience. They also allow for subsequent tests of advanced retrieval techniques, which can be compared to this standard method.

In the analysis of performance results from TC2 (constraint-based recommendations), particular attention was paid to participants' agreement with the statement “The filter has improved the recommendation results of the list”. Fig. 8 shows the distribution of domain-wise agreement ratios. On average, they were fairly high. Throughout the domains, the vast majority of participants selected either 5 (“strongly agree”), 4 (“agree”) or 3 (“neutral”). Hence, a filter often had a positive impact on results.

The authors gathered responses to the question on improvement for recommendation lists resulting from both the simple and the expressive filtering approaches. Hence, they give clues about the general performance of constraint-based retrieval. However, since this section of the study revealed the experimental condition to participants (i.e., the application of a filter), answers to the Likert statement have to be interpreted cautiously. Results may be slightly biased, as participants were able to guess the underlying agenda. However, these limitations do not exist for comparisons between simple and expressive filters. Since the web application randomly assigned the list order, participants did not know which approach they were assessing. This enables us to take a closer look at the differences between the two filtering approaches. It was

Table 12. Performance results in TC1 (M = mean, SD = standard deviation, N = number of participants)

Dimension (Metric) [Max Score]
  Domain    M       SD      N

Accuracy (size) [10]
  Travel    10.00   0.00    103
  Movie     10.00   0.00    50
  Music     10.00   0.00    53
  Book      9.80    1.40    51

Accuracy (mrs) [1]
  Travel    56.49   0.00    103
  Movie     62.25   19.15   50
  Music     55.70   0.10    53
  Book      57.47   15.55   50

Novelty (nv) [1]
  Travel    0.60    0.34    101
  Movie     0.36    0.31    50
  Music     0.36    0.39    52
  Book      0.61    0.29    47

Diversity (divU) [5]
  Travel    4       -       102
  Movie     3       -       50
  Music     3       -       53
  Book      3       -       50

Diversity (divC) [1]
  Travel    0.80    0.16    103
  Movie     0.87    0.12    50
  Music     0.86    0.10    50
  Book      0.92    0.08    50

Usefulness [5]
  Travel    3.34    0.85    103
  Movie     3.55    0.73    50
  Music     3.18    0.85    53
  Book      3.43    0.81    50

hypothesized that advanced filters improve recommendation quality, as they may identify more relevant resources, thereby overcoming problems of data quality or data sparsity in LOD repositories. Table 13 lists the response rates of the two methods. The rate measures the ratio of non-empty result sets among all result sets. Throughout the domains, recommendation lists resulting from expressive filtering achieved higher response rates.

Table 13. Response rates (TC2)

Domain   Constr.-based (Regular)   Constr.-based (Expanded)   N
Travel   23%                       57%                        103
Movie    86%                       88%                        50
Music    72%                       75%                        53
Book     73%                       76%                        51
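The response rate defined in the text (share of non-empty result sets among all result sets) can be computed as in the following sketch. The function name `response_rate` and the example lists are hypothetical and only serve as an illustration.

```python
from typing import Sequence


def response_rate(result_sets: Sequence[Sequence[str]]) -> float:
    """Ratio of non-empty recommendation lists among all generated lists."""
    if not result_sets:
        raise ValueError("at least one result set is required")
    non_empty = sum(1 for result in result_sets if len(result) > 0)
    return non_empty / len(result_sets)


# Hypothetical result sets of four user sessions; two requests returned nothing.
sessions = [["Item A"], [], ["Item B", "Item C"], []]
print(response_rate(sessions))  # 0.5
```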


Fig. 8. Participants' agreement with the statement that the filter has improved the result list (domain-wise) (TC2). [Chart "The filter has improved the recommendations of the list": x-axis Domain (Digital Library, Travel, Movie, Music, Book); y-axis Percentage of Agreement (0.00 to 1.00); legend Degree of Agreement (1 to 5).]

In addition to response rates, user assessments and scores in the defined quality dimensions were compared. Table 14 depicts the outcome of this analysis. Significantly higher scores are marked in bold. Where statistical tests identified no meaningful variations, the results of the better performing approach are underlined. α-levels were adjusted with the false discovery rate (FDR) in order to decrease the probability of making a type I error due to multiple testing.
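The FDR correction can be illustrated with a short sketch. The text does not name the concrete procedure, so the example assumes the widely used Benjamini-Hochberg step-up adjustment; the function name `fdr_adjust` and the p-values are hypothetical.

```python
from typing import List, Sequence


def fdr_adjust(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (assumed FDR procedure).

    Each p-value is multiplied by m / rank, where rank is its position in
    ascending order, and monotonicity is enforced from the largest
    p-value downwards.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw p-values from four pairwise comparisons.
raw = [0.01, 0.04, 0.03, 0.005]
print([round(p, 4) for p in fdr_adjust(raw)])  # [0.02, 0.04, 0.04, 0.02]
```

A comparison would then count as significant when its adjusted p-value stays below the chosen α-level.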

According to mean scores, expressive requests generated more comprehensive recommendation lists (size). This finding is in line with the increased response rates shown in Table 13. However, subsequently conducted t-tests confirmed significant differences only for the travel experiment (t(102) = −7.89, p < 0.001). In contrast, precision scores were higher in most of the domains (travel, movie, music) when the SKOSRec engine processed a simple constraint-based query. However, statistical tests did not confirm any systematic differences. The authors applied Wilcoxon tests for mrs scores and the remaining performance metrics because of the small sample sizes resulting from the low response rates of the simple constraint-based approach.
Regarding Novelty, none of the approaches was superior. In the Diversity dimension, expressive filters generated suggestions that were at least as topically diversified as the results from simple constraint-based retrieval. Even though no significant differences were identified between the two approaches, increased mean scores (divU in the travel domain and divC in each domain) indicate a slight superiority of expanded filter requests in this quality dimension. The same applies to usefulness scores (mean values were higher in 3 out of 4 domains when expressive filtering was applied). However, these statements are speculative at best because statistical tests were not significant.
In summary, it can be concluded that expressive filters improve recall, potentially leading to a slight loss in precision scores (see Table 14). It may also be the case that expanded user constraints diversify recommendation lists and increase their usefulness. However, these claims are unverified. In contrast, the results suggest that expressive query patterns generate results of at least the same quality as simple patterns while increasing the number of relevant suggestions. The few differences may be explained by the high Jaccard scores (JI) (see Table 15) of recommendation lists. Throughout the domains, result sets were much alike. Given these similarities and the increases in list sizes for expressive filters, it is supposed that an expanded constraint produces an enhanced version of the recommendation list resulting from simple filtering. Therefore, expanded filters should be the default setting in a constraint-based retrieval context.
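The overlap between two recommendation lists can be quantified with the Jaccard index, i.e., the size of the intersection divided by the size of the union. The following sketch is an illustration only; the function name `jaccard_index`, the handling of two empty lists and the example items are assumptions.

```python
from typing import Iterable, Optional


def jaccard_index(list_a: Iterable[str], list_b: Iterable[str]) -> Optional[float]:
    """Jaccard index |A ∩ B| / |A ∪ B| of two recommendation lists.

    Returns None when both lists are empty, since the index is undefined
    in that case (an assumption of this sketch).
    """
    set_a, set_b = set(list_a), set(list_b)
    union = set_a | set_b
    if not union:
        return None
    return len(set_a & set_b) / len(union)


# Hypothetical lists: the expressive filter adds one item to the simple result.
simple = ["Item A", "Item B", "Item C"]
expressive = ["Item A", "Item B", "Item C", "Item D"]
print(jaccard_index(simple, expressive))  # 0.75
```

A value close to 1, as in this example, corresponds to the situation described above, where the expanded list is largely a superset of the list from simple filtering.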

In TC3 (i.e., the evaluation of roll-up query patterns), the SKOSRec engine almost always generated non-empty recommendation lists (see Table 16). In the


Table 14. Performance results in TC2 (M = mean, SD = standard deviation, N = number of participants)

Dimension (Metric) [Max Score]
  Domain    Simple           Expressive       N
            M       SD       M       SD

Accuracy (size) [10]
  Travel    1.17    2.70     4.52    4.62     103
  Movie     6.72    4.08     7.20    3.92     50
  Music     5.96    4.64     6.36    4.51     53
  Book      4.59    4.50     5.00    4.51     51

Accuracy (mrs) [1]
  Travel    66.73   20.51    65.34   22.36    23
  Movie     60.95   22.98    60.29   22.79    43
  Music     63.77   16.05    62.96   16.44    34
  Book      56.17   18.18    56.81   16.81    37

Novelty (nv) [1]
  Travel    0.48    0.41     0.61    0.49     21
  Movie     0.44    0.36     0.55    0.80     42
  Music     0.55    0.39     0.52    0.37     38
  Book      0.77    0.34     0.76    0.33     33

Diversity (divU) [5]
  Travel    3       -        3.5     -        24
  Movie     4       -        4       -        43
  Music     4       -        4       -        38
  Book      3       -        3       -        37

Diversity (divC) [1]
  Travel    0.68    0.18     0.72    0.21     19
  Movie     0.70    0.24     0.73    0.23     41
  Music     0.86    0.13     0.87    0.12     34
  Book      0.84    0.24     0.95    0.63     29

Usefulness [5]
  Travel    3.33    0.93     3.50    0.77     24
  Movie     3.32    0.85     3.31    0.86     43
  Music     3.55    0.81     3.64    0.78     38
  Book      3.39    0.83     3.62    1.21     37

Table 15. Jaccard indices (TC2)

Domain   Mean Jaccard Index (JI)   SD     N
Travel   0.57                      0.47   25
Movie    0.71                      0.38   43
Music    0.82                      0.35   37
Book     0.92                      0.26   35

travel and the book experiment, the response rate was higher for aggregation-based queries, whereas in the movie domain the engine more often provided recommendations for regular requests. However, these differences are only marginal and do not indicate a clear superiority of one method.

Fig. 9 depicts participants' general agreement with Likert statements concerning the Perceived Usefulness of the approaches in TC3. The diagram shows similar results for regular as well as advanced requests. On average, satisfaction scores reached levels slightly above neutral agreement. A subsequent t-test

Table 16. Response rates (TC3)

Domain   Regular   Aggr.-based   N
Travel   99%       100%          103
Movie    96%       92%           50
Music    100%      100%          53
Book     96%       100%          51

confirmed no significant differences between the two methods. In fact, the approach with the highest average score of Perceived Usefulness differed between experiments. While mean values were higher for regular requests in the studies on travel and music RS, advanced queries produced higher scores in the movie and book experiments.
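Because each participant rated both result lists, a paired (dependent-samples) t-test is the natural choice for such within-subjects comparisons; the reported degrees of freedom (N − 1) are consistent with this reading, although the text does not state the test variant explicitly. The sketch below computes the paired t statistic under that assumption; the function name `paired_t` and the ratings are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t(x: Sequence[float], y: Sequence[float]) -> Tuple[float, int]:
    """Paired t statistic and degrees of freedom for two ratings per subject.

    t = mean(d) / (sd(d) / sqrt(n)), with d = x - y per participant.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long samples with at least two subjects")
    differences = [a - b for a, b in zip(x, y)]
    sd = stdev(differences)
    if sd == 0:
        raise ValueError("differences do not vary; t is undefined")
    n = len(differences)
    return mean(differences) / (sd / math.sqrt(n)), n - 1


# Hypothetical usefulness ratings of four participants for two methods.
regular = [3, 4, 2, 5]
advanced = [4, 6, 3, 5]
t_value, df = paired_t(regular, advanced)
print(round(t_value, 3), df)
```

The p-value would then be looked up in the t distribution with the returned degrees of freedom, e.g., with a statistics package.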

Content-based Diversity (divC) was the only metric with systematic differences. Here, scores went up for advanced retrieval patterns in the travel (t(98) = −5.50, p < 0.001), the music (t(52) = −4.51, p < 0.001) and the book domain (t(47) = −4.90, p <


Fig. 9. Participants' overall satisfaction with regular and roll-up recommendation retrieval (TC3). [Bar chart "Perceived Usefulness (Regular vs. Roll-up Queries)": x-axis Domain (Travel, Movie, Music, Book); y-axis Agreement (0 to 3); legend Method (Regular, Roll-up).]

0.001) (see Table 17). This finding is not surprising given that the aggregation-based approach explores the wider semantic network of each item in the profile.

Hence, it can be assumed that both approaches produce recommendation lists of similar quality and can provide benefits to potential consumers. It cannot be clearly stated which approach is superior because neither of them outperformed the other in each domain. While the regular retrieval strategy was better in terms of usefulness in the travel and the music domains, the aggregation-based approach achieved higher scores in the domains of movie and book recommendations.
This conclusion is especially noteworthy given the low mean Jaccard scores for recommendation lists resulting from regular and aggregation-based retrieval (see Table 18). The mean score never exceeded 0.1 throughout the domains, which indicates a low concordance between result sets. It is remarkable that participants perceived the recommendations as equally good given the high dissimilarity of the lists. Thus, advanced queries add value, as they provide additional interesting results. They can help to explore LOD repositories in different ways than a regular retrieval approach. Therefore, it is worthwhile to offer aggregation-based query patterns as an alternative retrieval strategy when the regular approach has not produced any helpful recommendations.

In addition to TC3, user assessments for TC4 (cross-domain recommendations) were also evaluated. However, since the authors conducted experiments for TC4 only in the multimedia domains, the samples were considerably smaller. Nevertheless, it is assumed that the number of completed user sessions is just sufficient to conduct further statistical analyses. Table 19 shows the response rates for the two retrieval methods. Cross-domain patterns had a higher success rate of generating non-zero result sets throughout the domains. Participants almost always received a recommendation for this query type, whereas they often did not receive a single suggestion when a regular SPARQL query was executed.

We applied non-parametric tests for the performance metrics mrs, nv, divU, divC and Perceived Usefulness due to the low response rates of the SPARQL-based approach, which led to a correspondingly low number of comparable data points. Overall, the FDR-adjusted pairwise Wilcoxon tests detected no significant differences except for user-based Diversity scores (divU) in the movie (Z = −2.69, p < 0.05) and the book domain (Z = −2.56, p < 0.05). The results for the content-based Diversity (divC) metric seem to point in the same direction because of the high mean values for cross-domain queries in each domain, but the significance of these results was not verified. The mrs scores indicate a better performance of regular SPARQL queries, but the within-subjects test did not confirm this assumption statistically (see Table 20).

Table 17. Performance results in TC3 (M = Mean, SD = Standard Deviation, N = No. of participants)

| Dimension (Metric) [Max. Score] | Domain | Regular M | Regular SD | Aggr.-based M | Aggr.-based SD | N |
|---|---|---|---|---|---|---|
| Accuracy (size) [10] | Travel | 9.90 | 0.99 | 9.42 | 2.07 | 103 |
| | Movie | 2.88 | 0.59 | 2.76 | 0.82 | 50 |
| | Music | 10.00 | 0.00 | 10.00 | 0.00 | 53 |
| | Book | 2.88 | 0.59 | 3.00 | 0.00 | 51 |
| Accuracy (mrs) [1] | Travel | 48.21 | 24.98 | 44.93 | 25.59 | 102 |
| | Movie | 54.58 | 22.88 | 56.69 | 25.42 | 44 |
| | Music | 53.95 | 18.31 | 58.24 | 16.19 | 52 |
| | Book | 51.53 | 23.90 | 49.25 | 22.30 | 49 |
| Novelty (nv) [1] | Travel | 0.43 | 0.39 | 0.38 | 0.40 | 98 |
| | Movie | 0.35 | 0.38 | 0.28 | 0.41 | 41 |
| | Music | 0.49 | 0.36 | 0.46 | 0.39 | 50 |
| | Book | 0.61 | 0.42 | 0.62 | 0.44 | 47 |
| Diversity (divU) [5] | Travel | 3.5 | - | 3 | - | 100 |
| | Movie | 4 | - | 3 | - | 43 |
| | Music | 4 | - | 3 | - | 50 |
| | Book | 3 | - | 3 | - | 48 |
| Diversity (divC) [1] | Travel | 0.79 | 0.17 | 0.89 | 0.13 | 99 |
| | Movie | 0.75 | 0.20 | 0.66 | 0.27 | 44 |
| | Music | 0.83 | 0.17 | 0.93 | 0.07 | 53 |
| | Book | 0.74 | 0.27 | 0.93 | 0.06 | 48 |
| Usefulness [5] | Travel | 3.36 | 0.77 | 3.22 | 0.74 | 101 |
| | Movie | 3.20 | 0.92 | 3.26 | 0.94 | 44 |
| | Music | 3.45 | 0.84 | 3.44 | 0.85 | 51 |
| | Book | 2.95 | 0.87 | 3.03 | 0.96 | 48 |

Table 18. Jaccard Indices (TC3)

| Domain | Mean Jaccard Index (JI) | SD | N |
|---|---|---|---|
| Travel | 0.07 | 0.11 | 101 |
| Movie | 0.03 | 0.07 | 44 |
| Music | 0.02 | 0.05 | 53 |
| Book | 0.02 | 0.06 | 50 |

Table 19. Response rates (TC4)

| Domain | Regular (SPARQL) | SKOSRec | N |
|---|---|---|---|
| Movie | 32 | 96 | 50 |
| Music | 62 | 98 | 53 |
| Book | 53 | 100 | 51 |
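The FDR adjustment applied to the pairwise Wilcoxon tests in TC4 can be sketched as follows. The text does not name the adjustment procedure, so the widely used Benjamini-Hochberg method is assumed here; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return FDR-adjusted p-values (Benjamini-Hochberg), in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value to enforce monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values from four pairwise comparisons.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

Each adjusted value is then compared with the chosen significance level (0.05 in the analysis above) instead of the raw p-value.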

However, the most important finding of the analysis was that SKOSRec cross-domain queries generated significantly more suggestions throughout the domains (movie: t(49) = −17.92, p < 0.001; music: t(52) = −25.73, p < 0.001; book: t(50) = −12.75, p < 0.001) (see Table 20). Fig. 10 illustrates this graphically.
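The within-subjects comparison of list sizes can be illustrated with a paired t statistic. This is a minimal sketch using only the Python standard library; the function name and the three example sessions are hypothetical and do not reproduce the study data.

```python
import math
import statistics


def paired_t(sample_a, sample_b):
    """Return the paired t statistic and its degrees of freedom."""
    if len(sample_a) != len(sample_b) or len(sample_a) < 2:
        raise ValueError("need two samples of equal length with at least two pairs")
    diffs = [a - b for a, b in zip(sample_a, sample_b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation of the differences
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical list sizes per session: regular SPARQL vs. cross-domain request.
regular_sizes = [1, 1, 0]
cross_domain_sizes = [9, 10, 10]
t_value, df = paired_t(regular_sizes, cross_domain_sizes)
print(round(t_value, 2), df)
```

A negative t value, as in the reported results, indicates that the first method (regular SPARQL) returned shorter lists than the second.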

Whenever the SPARQL request provided recommendations, which was the case in less than half of the user sessions (see Table 19), the SKOSRec suggestions were assessed to be of almost equal quality as the SPARQL recommendations (see Table 20, Usefulness). Hence, in the more than half of the test cases in which SPARQL querying did not produce any results, the SKOSRec approach may still have provided useful suggestions.

Therefore, it is reasonable to assume that the capability of the SKOSRec engine to flexibly switch between processing steps of similarity calculation and pattern matching can facilitate an improved exploration of LOD repositories. Table 21 also shows that the items in the two sets often did not match, as is indicated by low Jaccard indices. This finding further demonstrates that the execution of an additional step of similarity calculation can help to retrieve other relevant items.

Table 20. Performance results in TC4 (M = Mean, SD = Standard Deviation, N = No. of participants)

| Dimension (Metric) [Max. Score] | Domain | Regular (SPARQL) M | Regular (SPARQL) SD | Cross-Domain M | Cross-Domain SD | N |
|---|---|---|---|---|---|---|
| Accuracy (size) [10] | Movie | 0.82 | 2.09 | 8.66 | 2.73 | 50 |
| | Music | 0.89 | 1.59 | 9.30 | 2.03 | 53 |
| | Book | 2.90 | 3.98 | 10.00 | 0.00 | 51 |
| Accuracy (mrs) [1] | Movie | 67.41 | 27.62 | 53.65 | 14.71 | 16 |
| | Music | 62.36 | 31.76 | 48.32 | 22.49 | 20 |
| | Book | 66.42 | 21.28 | 55.59 | 16.84 | 27 |
| Novelty (nv) [1] | Movie | 0.59 | 0.49 | 0.41 | 0.34 | 16 |
| | Music | 0.72 | 0.44 | 0.59 | 0.40 | 19 |
| | Book | 0.54 | 0.41 | 0.63 | 0.30 | 27 |
| Diversity (divU) [5] | Movie | 3 | - | 4 | - | 16 |
| | Music | 3 | - | 3 | - | 20 |
| | Book | 3 | - | 4 | - | 27 |
| Diversity (divC) [1] | Movie | (0.64) | (0.35) | (0.99) | (0.02) | (6) |
| | Music | (0.90) | (0.13) | (0.98) | (0.03) | (10) |
| | Book | (0.84) | (0.12) | (0.87) | (0.1) | (8) |
| Usefulness [5] | Movie | 3.57 | 0.83 | 3.35 | 1.57 | 15 |
| | Music | 3.53 | 1.17 | 3.11 | 0.98 | 20 |
| | Book | 3.41 | 1.04 | 3.41 | 0.94 | 27 |

[Figure 10: "Recommendation List Size". x-axis: Domain (Movie, Music, Book); y-axis: No. of items; legend: Method (Regular, Cross-Domain).]

Fig. 10. Mean recommendation list sizes for SPARQL and SKOSRec cross-domain requests (TC4).

Table 21. Jaccard Indices (TC4)

| Domain | Mean Jaccard Index (JI) | SD | N |
|---|---|---|---|
| Movie | 0.03 | 0.07 | 44 |
| Music | 0.01 | 0.02 | 21 |
| Book | 0.16 | 0.27 | 33 |

5. Future Work & Conclusion

This paper has presented the query-based SKOSRec engine, which facilitates new types of recommendation requests for LOD repositories. The evaluations have shown that the SKOSRec system can often generate relevant and useful suggestions when certain query templates are utilized. In many cases, the application of expressive or advanced query patterns (i.e., in the context of constraint-based retrieval or cross-domain requests) helped to significantly improve recall values while still providing the same level of quality in the remaining performance dimensions. Additionally, advanced recommendation requests can be used to generate diversified recommendation lists in case users are not satisfied with the results of regular recommendation requests (i.e., roll-up or cross-domain retrieval vs. regular requests). While not being superior in each domain and test case, it is at least safe to say that the novel recommendation approaches of the SKOSRec engine considerably extend common retrieval methods and provide an alternative search strategy when conventional methods fail to provide useful results. Additionally, the increase in diversity and recall is in line with findings from previous research on LOD-enabled RS [45].

It is a strength of the developed approaches that they facilitate combinations of graph-based and similarity-based retrieval at different stages of the recommendation workflow. This feature extends general search capabilities for semantic networks. Future research is required to investigate whether, in addition to SKOS, further RDF vocabularies can be used for ad-hoc item-to-item similarity computation to increase the range of potential usage scenarios, since not every LOD dataset contains SKOS annotations [53]. The application of additional properties could also boost the recommendation quality of some query patterns (e.g., the query types used in TC3). In this context, it also needs to be determined whether the representation of the user profile in the SKOSRec query language can be substituted with free-text expressions to enable search-like functionalities. LOD researchers have already developed engines that can process natural language queries over RDF data [59], [34], [67]. It will have to be investigated how the SKOSRec engine profitably fits into this landscape of existing retrieval tools.

Another open research question concerns the aspect of interfaces to other applications. In its standard configuration, the system is not an end-user retrieval engine, but a backend application with which an administrator who has domain knowledge (e.g., of RDF vocabularies and data models of a particular LOD repository and usage scenario) creates the respective query patterns. Although the websites of the online experiments represent possible implementations of suitable end-user applications for the SKOSRec engine, there are still many other possibilities and extensions to design such a user interface (UI). For instance, the selection of LOD items for the profile was made possible by previously storing the data in an index that could then be accessed and searched before issuing a recommendation query. However, this runs against the idea of ad-hoc retrieval. In this context, future research will have to determine whether it is feasible to generate an on-the-fly mapping from items to the respective LOD resources. In the case of a commercial application, the engine needs to match items with the corresponding entries in the product catalog. Even if it is not yet clear which UI components can make the best possible use of the SKOSRec engine's features, the results from the user experiments can give at least a few suggestions.

The following retrieval options should be the default setting in an interface that contains a LOD-enabled RS component to assist users in finding interesting items. Since the regular approach of on-the-fly retrieval achieved good results throughout the domains, the display of simple recommendations generated from previously stated user preferences represents a suitable starting point for retrieval. On-the-fly queries could be refined by providing filter options. Since the evaluations have shown that expressive constraints often lead to increased recall values, the application of such a filter might be the best option for assisted retrieval. As in the preparations for the experiments, appropriate graph-based query patterns for the usage scenario in question would also have to be determined by a domain expert before setting up the application. The same applies to advanced retrieval patterns, which should be available to the end user to refine the query when both the simple and the constraint-based approach have failed to provide relevant items. This suggestion is made because roll-up queries were competitive with regular recommendations in the web-based experiments. Therefore, they represent a viable alternative retrieval strategy. For some domains (e.g., for multimedia retrieval), it is also possible to use an optional selection field to enable the formulation of cross-domain requests. Evaluations for this query type suggest that users might receive interesting results (i.e., diversified and comprehensive recommendation lists) from such a request.

In summary, the contributions of the paper are as follows:

– Development and implementation of novel retrieval strategies in the SKOSRec system, which facilitate personalized requests and leverage the potential of knowledge graphs in order to address the research gap in state-of-the-art recommendation methods with regard to customized queries.

– Realization of web-based user experiments in four usage scenarios with the following outcomes:

  • Successful demonstration that knowledge graph data is applicable in the context of live recommendation scenarios.

  • Creation of suitable query patterns that can be applied to represent individual information needs in different application contexts. The set of query patterns can be extended with the help of the SKOSRec query language to meet the requirements of additional usage scenarios.

  • Proof that the novel retrieval strategies can be successfully applied across various application contexts and recommendation tasks and that they outperform conventional content-based retrieval approaches, especially in the performance dimensions of diversity and recall.

The prototypical implementation and the evaluation of the SKOSRec engine in different usage scenarios have demonstrated that the freely available knowledge sources on the LOD cloud can be successfully applied to advance state-of-the-art methods in IR and recommender systems. The proposed personalized and customized retrieval strategies can leverage the potential of knowledge graphs by better matching information needs of users with the help of semantic search options.

Appendix A. The SKOSRecommender Query Syntax

SKOSRecQuery: A SKOSRec query can comprise up to six elements, of which the parts SimProjection and ItemPart are obligatory. Hence, users need to state at least their preferences and how many recommendations they would like to receive. Advanced retrieval patterns can be formulated as well. For instance, similar resource retrieval can be combined with prefiltering (RecWhereClause) and/or postfiltering (SelectPart) of LOD resources. As in SPARQL, the query can start with a prologue that abbreviates IRIs with namespace declarations [21].

SelectPart: The SelectPart enables subquerying with recommendation results. The surrounding query is a SELECT query with a slightly simplified WHERE clause (i.e., RecWhereClause) with which postfilter conditions can be formulated. In this section, users have the option to formulate aggregation-based retrieval requests (Aggregation).

Aggregation: This query part handles aggregation-based postfiltering. Users specify the IRI (IRIref) resource for which similarity scores of sublevel entities are summarized. Additionally, it is stated how the data should be aggregated (i.e., SUM, MAX or AVG) as well as which variable (Var) represents the aggregation-based recommendations in the postfilter section. The compiler makes sure that this variable is actually contained in the RecWhereClause of the SelectPart.

SimProjection: This part of a SKOSRec query refers to the list of generated suggestions. It defines the recommendation variable to which all similar LOD resources are mapped. In case pre- and postfilter conditions are set in the other sections, the compiler checks whether the variable (Var) appears in these sections as well. Otherwise, the required join operations cannot be executed. It is specified how many recommendations should be displayed (Integer) and whether the request will be issued against a default repository or as a cross-repository request (ServiceIntegration).

ServiceIntegration: This section indicates that a user intends to receive recommendations from a different SPARQL endpoint than the default endpoint that is stated in the standard configuration of the SKOSRec engine. It specifies the target SPARQL endpoint (IRIref) from which a user wants to receive recommendations. Upon extraction of SKOS annotations from the default source endpoint, a subsequent request is sent to the specified target endpoint.

ItemPart: This section of a SKOSRec query represents a single preference in the user profile. However, recommendation queries can contain several preference statements. A preference is either expressed as a LOD resource (IRIref) or as a variable which is bound to a preference query (VarPart). In the latter case, the resources are obtained by matching the graph pattern in the VarPart with the triple statements in the repository. The ItemPart statement can be additionally supplemented with a concept-to-concept similarity threshold (Sim) that triggers concept expansion with the approach of flexible similarity detection, which is extensively described in one of our previous works [64].

VarPart: The construct initiates preference querying. Individual likings are expressed as a graph pattern in a RecWhereClause. The variable (Var) is a placeholder for the LOD resources that will be used to set up the user profile. The compiler has to make sure that the specified variable occurs in the RecWhereClause as well. Otherwise, the extraction of LOD resources cannot be carried out correctly.

Sim: The Sim part of a SKOSRec query denotes the threshold of concept-to-concept scores for flexible similarity detection. Even though knowledge-based similarity values usually range between 0 and 1, the specification allows a broader range of digits in the Decimal datatype to enable the application of other similarity metrics in case they are needed.

Relation: This section specifies how concept-to-concept similarity scores should be determined. Users can state whether scores should be "larger", "larger than" or "equal to" the threshold value.

RecWhereClause: This clause can occur in three different sections of a SKOSRec query, i.e., either in the prefilter (ItemPart), the postfilter (SelectPart) or the preference filter section (VarPart). The RecWhereClause closely resembles the "WhereClause" of the SPARQL 1.1 specification with all its subsequent parts [21]. Hence, it enables different combinations of graph pattern matching. However, the RecWhereClause of the SKOSRec language allows fewer pattern matching expressions than the "WhereClause" of the regular SPARQL syntax specification [21]. For instance, it is neither permitted to formulate subqueries nor to direct filter requests to more than one repository (see RecGraphPatternNotTriples for further explanations). This modification has been made to simplify similarity calculation. After parsing this section, the compiler makes sure that recommendation, preference or postfilter variables occur at least once in the RecWhereClause to guarantee that subsequent steps of the recommendation workflow can be applied to existing LOD resources. In case the RecWhereClause comprises a RecMinusGraphPattern, it also has to be checked that the variable is not only contained in the MINUS part of the group.

RecGroupGraphPattern: A RecGroupGraphPattern can contain one to many basic graph patterns, which can be combined in different ways according to RecGroupGraphPatternSub.

RecGroupGraphPatternSub: This section defines the graph patterns that are applied to identify suitable LOD resources. User filters can be expressed either as basic graph patterns (TriplesBlock) or as combinations of them (RecGraphPatternNotTriples). The sequence of different kinds of patterns can be flexibly specified.

RecGraphPatternNotTriples: This query part closely resembles the similarly named "GraphPatternNotTriples" of the SPARQL 1.1 specification [21]. The only difference to SPARQL is that the RecGraphPatternNotTriples section is a little more restricted in terms of admissible graph patterns than the "GraphPatternNotTriples" section of the regular SPARQL specification. For instance, it does not allow the retrieval of LOD resources from graphs other than the default graph that is specified in the configuration of the SKOSRec engine, since filter patterns have to be applied to datasets that contain SKOS annotations. Hence, these datasets need to be specified before runtime execution to ensure that similarity calculation can be carried out correctly. If a user were able to formulate filter conditions that referred to different RDF graphs and endpoints, this would not be possible.

RecGroupOrUnionGraphPattern: This query section marks an alternative graph pattern in the style of the "GroupOrUnionGraphPattern" of the SPARQL syntax specification [21]. However, it neither allows SPARQL-like subqueries nor federated queries for filtered resources, because this would complicate similarity calculation.

RecOptionalGraphPattern: In this part of a SKOSRecQuery, users are enabled to specify additional graph patterns, which might extend the query solution but do not necessarily have to match the data.

RecMinusGraphPattern: This section handles the exclusion of LOD resources for cases when users want to omit certain triple statements from the solution.
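The variable checks that the compiler performs on a RecWhereClause (see SimProjection, VarPart and RecWhereClause above) can be sketched as follows. The data structure and function names are hypothetical stand-ins for a parse result and are not taken from the SKOSRec code base; the sketch only shows the rule that a variable must occur outside the MINUS part.

```python
from dataclasses import dataclass, field


@dataclass
class ParsedWhereClause:
    """Hypothetical parse result of a RecWhereClause (not the engine's real model)."""
    triples: list                                       # triple patterns outside MINUS
    minus_triples: list = field(default_factory=list)   # triple patterns inside MINUS


def variables_in(triples):
    """Collect all variables (terms starting with '?') of the given triple patterns."""
    return {term for triple in triples for term in triple if term.startswith("?")}


def variable_is_bound(clause, variable):
    """True if the variable occurs at least once outside the MINUS part."""
    return variable in variables_in(clause.triples)


valid = ParsedWhereClause([("?destination", "rdf:type", "dbo:Place")])
invalid = ParsedWhereClause(
    triples=[("?x", "rdf:type", "dbo:Place")],
    minus_triples=[("?destination", "dbo:country", "dbr:Russia")],
)
print(variable_is_bound(valid, "?destination"))    # True
print(variable_is_bound(invalid, "?destination"))  # False
```

A query for which the check fails would be rejected before execution, because the subsequent join with the recommendation results could not bind the variable.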

References

[1] G. Adomavicius, A. Tuzhilin, Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. In: IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.

[2] G. Adomavicius, A. Tuzhilin, R. Zheng, REQUEST: A query language for customizing recommendations. In: Information Systems Research, vol. 22, no. 1, pp. 99–117, 2011.

[3] C. Aggarwal, Recommender systems. Springer, 2016.

[4] M. Arenas, B. Cuenca Grau, E. Kharlamov, S. Marciuska, D. Zheleznyakov, Faceted search over ontology-enhanced RDF data. In: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 939–948, 2014.

[5] V. Ayala, M. Przyjaciel-Zablocki, T. Hornung, A. Schätzle, G. Lausen, Extending SPARQL for recommendations. In: Semantic Web Information Management, 2004.

[6] J. Beel, C. Breitinger, S. Langer, A. Lommatzsch, B. Gipp, Towards reproducibility in recommender systems research. In: User Modeling and User-adapted Interaction, vol. 26, no. 1, pp. 69–101, 2016.

[7] C. Boutilier, R. Brafman, C. Domshlak, H. Hoos, D. Poole, CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements. In: Journal of Artificial Intelligence Research (JAIR), vol. 21, pp. 135–191, 2004.

[8] P. Castells, S. Vargas, J. Wang, Novelty and diversity metrics for recommender systems: choice, discovery and relevance, 2011.

[9] G. Cheng, W. Ge, Y. Qu, Falcons: searching and browsing entities on the Semantic Web. In: Proceedings of the 17th ACM International Conference on World Wide Web, pp. 1101–1102, 2008.

[10] J. Davies, R. Weeks, QuizRDF: Search technology for the Semantic Web. In: Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004.

[11] DBpedia, URL: https://wiki.dbpedia.org, 2017.

[12] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, Exploiting the web of data in model-based recommender systems. In: Proceedings of the 6th ACM Conference on Recommender Systems (RecSys), pp. 253–256, 2012.

[13] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, M. Zanker, Linked Open Data to support content-based recommender systems. In: Proceedings of the 8th International Conference on Semantic Systems, 2012.

[14] D. Dillman, D. Robert, D. Bowker, Principles for constructing web surveys. In: Joint Meetings of the American Statistical Association, 1998.

[15] DCMI metadata terms, URL: http://dublincore.org/documents/dcmi-terms, 2004.

[16] I. Fernandez-Tobias, I. Cantador, M. Kaminskas, F. Ricci, A generic semantic-based framework for cross-domain recommendation. In: Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pp. 25–32, 2011.

[17] A. Freitas, J. Oliveira, S. O'Riain, E. Curry, J. Da Silva, C. Pereira, Querying Linked Data using semantic relatedness: a vocabulary independent approach. In: Natural Language Processing and Information Systems, pp. 40–51, 2011.

[18] T. Grainger, T. Potter, Y. Seeley, Solr in Action. Cherry Hill: Manning, 2014.

[19] A. Gunawardana, G. Shani, Evaluating recommender systems. In: Recommender Systems Handbook, Springer, pp. 265–308, 2015.

[20] R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Bürgle, H. Düwiger, U. Scheel, Faceted Wikipedia search. In: International Conference on Business Information Systems, pp. 1–11, 2010.

[21] S. Harris, A. Seaborne, E. Prud'hommeaux, SPARQL 1.1 query language. In: W3C Recommendation, 2013.

[22] B. Heitmann, C. Hayes, Using Linked Data to build open, collaborative recommender systems. In: AAAI Spring Symposium: Linked Data Meets Artificial Intelligence, pp. 76–81, 2010.

[23] J. Herlocker, J. Konstan, L. Terveen, J. Riedl, Evaluating collaborative filtering recommender systems. In: ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.

[24] Y. Hu, Z. Wang, W. Wu, J. Guo, M. Zhang, Recommendation for movies and stars using YAGO and IMDB. In: 12th International Asia-Pacific Web Conference (APWEB), pp. 123–129, 2010.

[25] D. Huynh, D. Karger, Parallax and companion: Set-based browsing for the data web. In: Proceedings of the 18th ACM International Conference on World Wide Web, 2009.

[26] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich, Recommender systems: an introduction. Cambridge University Press, 2010.

[27] Y. Kabutoya, R. Sumi, T. Iwata, T. Uchiyama, T. Uchiyama. In: IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 625–630, 2012.

[28] M. Kaminskas, I. Fernandez-Tobias, F. Ricci, I. Cantador, Knowledge-based music retrieval for places of interest. In: Proceedings of the 2nd International ACM Workshop on Music Information Retrieval with User-centered and Multimodal Strategies, pp. 19–24, 2012.

[29] H. Khrouf, R. Troncy, Hybrid event recommendation using Linked Data and user diversity. In: Proceedings of the 7th ACM Conference on Recommender Systems, pp. 185–192, 2010.

[30] P. Kirk, Experimental design. John Wiley & Sons, 1982.

[31] B. Knijnenburg, M. Willemsen, Z. Gantner, H. Soncu, C. Newell, Explaining the user experience of recommender systems. In: User Modeling and User-Adapted Interaction, vol. 22, no. 4-5, pp. 441–504, 2012.

[32] G. Koutrika, B. Bercovitz, H. Garcia-Molina, FlexRecs: expressing and combining flexible recommendations. In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 2009.

[33] M. Lee, W. Kim, Semantic association search and rank method based on spreading activation for the Semantic Web. In: IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1523–1527, 2009.

[34] J. Lehmann, L. Bühmann, AutoSPARQL: Let users query your knowledge base. In: 8th Extended Semantic Web Conference (ESWC), 2011.

[35] N. Marie, F. Gandon, M. Ribiere, F. Rodio, Discovery hub: on-the-fly Linked Data exploratory search. In: Proceedings of the 9th International Conference on Semantic Systems, pp. 17–24, 2013.

[36] S. McNee, J. Riedl, J. Konstan, Being accurate is not enough: how accuracy metrics have hurt recommender systems. In: CHI'06 Extended Abstracts on Human Factors in Computing Systems, 2006.

[37] R. Meymandpour, J. Davis, Recommendations using Linked Data. In: Proceedings of the 5th PhD Workshop on Information and Knowledge, 2012.

[38] A. Miles, S. Bechhofer, SKOS Simple Knowledge Organization System Reference. In: W3C Recommendation, 2009.

[39] B. Mobasher, X. Jin, Y. Zhou, Semantically enhanced collaborative filtering on the web. In: Web Mining: From Web to Semantic Web, pp. 57–76, 2004.

[40] A. Musetti, A. Nuzzolese, F. Draicchio, V. Presutti, E. Blomqvist, A. Gangemi, P. Ciancarini, Aemoo: Exploratory search based on knowledge patterns over the Semantic Web. In: Semantic Web Challenge, 2012.

[41] T. Neumann, G. Weikum, The RDF-3X engine for scalable management of RDF data. In: The VLDB Journal, vol. 19, no. 1, pp. 91–113, 2010.

[42] J. Pérez, M. Arenas, C. Gutierrez, Semantics and Complexity of SPARQL. In: 5th International Semantic Web Conference, pp. 30–43, 2006.

[43] S. Policarpio, S. Brunk, G. Tummarello, Implementation of a SPARQL integrated recommendation engine for Linked Data with hybrid capabilities. In: AImWD, 2012.

[44] P. Pu, L. Chen, R. Hu, A user-centric evaluation framework for recommender systems. In: Proceedings of the 5th ACM Conference on Recommender Systems, 2011.

[45] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, E. Di Sciascio, Sound and music recommendation with knowledge graphs. In: ACM Transactions on Intelligent Systems and Technology (TIST), vol. 8, no. 2, 21, 2017.

[46] E. Oren, R. Delbru, S. Decker, Extending faceted navigation for RDF data. In: 5th International Semantic Web Conference, pp. 559–572, 2006.

[47] G. Pirro, Explaining and suggesting relatedness in knowledge graphs. In: International Semantic Web Conference, pp. 622–639, 2015.

[48] G. Rizzo, R. Troncy, NERD: A framework for evaluating named entity recognition tools in the Web of data. In: 10th International Semantic Web Conference (ISWC'11), Demo Session, Bonn, Germany, 2011.

[49] J. Rosati, T. Di Noia, T. Lukasiewicz, R. De Leone, A. Maurino, Preference queries with Ceteris Paribus semantics for Linked Data. In: OTM Confederated International Conferences, pp. 423–442, 2015.

[50] T. Ruotsalo, K. Haav, A. Stoyanov, S. Roche, E. Fani, R. Deliai, E. Mäkelä, T. Kauppinen, E. Hyvönen, SMARTMUSEUM: A mobile recommender system for the Web of Data. In: Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 50–67, 2013.

[51] I. Ruthven, M. Lalmas, K. van Rijsbergen, Incorporating user search behavior into relevance feedback. In: Journal of the Association for Information Science and Technology, vol. 54, no. 6, pp. 529–549, 2003.

[52] B. Sarwar, G. Karypis, J. Konstan, J. Riedl, Item-based collaborative filtering recommendation algorithms. In: Proceedings of the 10th International Conference on World Wide Web, 2001.

[53] M. Schmachtenberg, C. Bizer, H. Paulheim, Adoption of the Linked Data best practices in different topical domains. In: International Semantic Web Conference (ISWC), pp. 245–260, 2014.

[54] H. Schütze, C. Manning, P. Raghavan, Introduction to information retrieval. Cambridge University Press, 2008.

[55] SparqlImplementations, URL: https://www.w3.org/wiki/SparqlImplementations, 2016.

[56] M. Stankovic, W. Breitfuss, P. Laublet, Discovering relevant topics using DBPedia. In: Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 219–222, 2011.

[57] K. Swearingen, R. Sinha, Beyond algorithms: An HCI perspective on recommender systems. In: ACM SIGIR 2001 Workshop on Recommender Systems, vol. 13, no. 5-6, 2001.

[58] D. Tunkelang, Faceted search. In: Synthesis Lectures on Information Concepts, Retrieval, and Services, pp. 1–80, 2009.

[59] C. Unger, L. Bühmann, J. Lehmann, A. Ngonga Ngomo, D. Gerber, P. Cimiano, Template-based question answering over RDF data. In: Proceedings of the 21st ACM International Conference on World Wide Web, 2012.

[60] C. Varnhagen, M. Gushta, J. Daniels, T. Peters, N. Parmar, D. Law, T. Johnson, How informed is online informed consent? In: Ethics & Behavior, vol. 15, no. 1, pp. 37–48, 2005.

[61] J. Waitelonis, H. Sack, Towards exploratory video search using Linked Data. In: 11th IEEE International Symposium on Multimedia, pp. 540–545, 2009.

[62] L. Wenige, J. Ruhland, Flexible on-the-fly recommendations from Linked Open Data repositories. In: International Conference on Business Information Systems, pp. 43–54, 2016.

[63] L. Wenige, Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations.

[64] L. Wenige, J. Ruhland, Retrieval by recommendation: using LOD technologies to improve digital library search. In: International Journal on Digital Libraries, 2017.

[65] F. Zarrinkalam, M. Kahani, A multi-criteria hybrid citation recommendation system based on Linked Data. In: 2nd International eConference on Computer and Knowledge Engineering (ICCKE), pp. 283–288, 2012.

[66] C. Ziegler, S. McNee, J. Konstan, G. Lausen, Improving recommendation lists through topic diversification. In: Proceedings of the 14th International Conference on World Wide Web, pp. 22–32, 2005.

[67] L. Zou, R. Huang, H. Wang, J. Yu, W. He, D. Zhao, Natural language question answering over RDF: a graph data driven approach. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 313–324, 2014.

  • Introduction
  • Related Work
  • The SKOS Recommender
    • System Overview
    • On-the-Fly Recommendations
    • Preference Querying
    • Prefiltering
    • Postfiltering & Combinations
  • Evaluation of the SKOS Recommender
    • Experimental Setup
    • Results
  • Future Work & Conclusion
  • Appendix A. The SKOSRecommender Query Syntax
  • References

$A_D$ and are contained in the solution mappings resulting from the evaluation of a specified graph pattern ($P_{pref}$) over a dataset $D$ (Eq. 9).

$$P_r = \{\mu(r) \mid \mu \in [\![P_{pref}]\!]_D,\ r \in \mathrm{dom}(\mu),\ r \in A_D\} \tag{9}$$

3.4. Prefiltering

To facilitate exploratory search in LOD repositories, users should also have the option to formulate conditions in their queries. One way to enable filters during retrieval is to apply them before similarity calculation. Thus, recommendation lists can be adjusted to individual needs.

Due to the expressiveness of RDF data, filter conditions can not only be applied on attributes that are directly connected to potentially relevant items, but can also be declared as a graph pattern. By this means, it is possible to retrieve LOD resources that are both similar to the user profile and fulfill an advanced condition.

The strengths of the approach will be illustrated with a travel recommendation request that was executed over DBpedia. Suppose a customer expressed a preference for the Lake Baikal region (e.g., as indicated by a positive rating on a travel community site). With simple on-the-fly retrieval, s/he would receive a recommendation list that primarily consists of travel destinations that are located in the proximity of this geographical area. A graph-based filter, on the other hand, facilitates advanced retrieval options. In the given example, it enables the user to obtain suggestions for Southeast Asian travel destinations that have similar features as the Lake Baikal region, even though the resource dbr:Lake_Baikal is not directly connected to any destination with the annotation dbc:Southeast_Asia. Such destinations can only be identified by exploring the surrounding graph structure of the filter attribute.

The advanced constraint ensures that the generated result set is not empty. A simple attribute-level filter, which is commonly utilized in faceted search systems, would not generate any search hits. By these means, graph-based filter patterns can help users to better explore knowledge graphs, thereby overcoming data quality issues in LOD repositories. The corresponding SKOSRec query for this example is shown in Listing 5.

Listing 5. SKOSRec query with an expressive prefilter condition to retrieve Southeast Asian destinations based on a preference for the Lake Baikal region (Q3)

  PREFIX dbr: <http://dbpedia.org/resource/>
  PREFIX dbc: <http://dbpedia.org/resource/Category:>
  PREFIX dbo: <http://dbpedia.org/ontology/>
  PREFIX dct: <http://purl.org/dc/terms/>
  PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

  RECOMMEND ?destination TOP 5
  PREF dbr:Lake_Baikal
  [ WHERE {
      ?destination dbo:country ?country .
      ?country dct:subject ?countrySubject .
      ?countrySubject skos:broader dbc:Southeast_Asia .
      ?destination rdf:type dbo:Place .
    }
  ]

Table 4 shows the solution to this query. It contains travel destinations and regions of interest that are both similar to the previously liked destination dbr:Lake_Baikal and located in Southeast Asia. Ecoregions around lakes, rivers or mountains from Southeast Asian countries, such as Cambodia, Vietnam or Malaysia, are presented to the user.

Table 4
Result set of Q3

  destination
  dbr:Tonle_Sap
  dbr:Mekong
  dbr:Mount_Kinabalu
  dbr:Laguna_de_Bay
  dbr:Lake_Toba

The technical details for this type of query are given in the following definitions. Before the extraction of relevant resources and annotations, the recommendation engine identifies the resources that satisfy the filter (Definition 10).

Definition 10 (Prefiltered resources). The mapping of filtered resources $\Omega_{prefilter}$ is obtained by matching a specified graph pattern ($P$) to the RDF graph of a dataset $D$ (Eq. 10).

$$\Omega_{prefilter} = \{\mu|_q \mid \mu \in [\![P]\!]_D\} \tag{10}$$
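The matching step behind Eq. 10 can be illustrated with a minimal sketch. It is not the SKOSRec implementation: it evaluates a basic graph pattern over an in-memory list of triples and projects the solution mappings onto one variable. The function names, the toy graph and the resource dbr:Lake_Geneva are assumptions made for illustration; the toy triples are not claims about the actual content of DBpedia.

```python
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
Mapping = Dict[str, str]


def match_bgp(triples: Iterable[Triple], patterns: List[Triple]) -> List[Mapping]:
    """Return all solution mappings of a basic graph pattern over `triples`."""
    data = list(triples)
    solutions: List[Mapping] = [{}]
    for pattern in patterns:
        extended: List[Mapping] = []
        for mu in solutions:
            for triple in data:
                candidate = dict(mu)
                compatible = True
                for term, value in zip(pattern, triple):
                    if term.startswith("?"):
                        # Bind the variable, or check it against an existing binding.
                        if candidate.setdefault(term, value) != value:
                            compatible = False
                            break
                    elif term != value:
                        compatible = False
                        break
                if compatible:
                    extended.append(candidate)
        solutions = extended
    return solutions


def prefiltered_resources(
    triples: Iterable[Triple], patterns: List[Triple], variable: str
) -> Set[str]:
    """Project the solution mappings onto one variable (the restriction in Eq. 10)."""
    return {mu[variable] for mu in match_bgp(triples, patterns) if variable in mu}


# Hypothetical toy graph, for illustration only.
TOY_GRAPH: List[Triple] = [
    ("dbr:Tonle_Sap", "dbo:country", "dbr:Cambodia"),
    ("dbr:Tonle_Sap", "rdf:type", "dbo:Place"),
    ("dbr:Cambodia", "dct:subject", "dbc:Cambodia"),
    ("dbc:Cambodia", "skos:broader", "dbc:Southeast_Asia"),
    ("dbr:Lake_Geneva", "dbo:country", "dbr:Switzerland"),
    ("dbr:Lake_Geneva", "rdf:type", "dbo:Place"),
    ("dbr:Switzerland", "dct:subject", "dbc:Switzerland"),
]

# The filter pattern of Q3, written as (subject, predicate, object) tuples.
FILTER_PATTERN: List[Triple] = [
    ("?destination", "dbo:country", "?country"),
    ("?country", "dct:subject", "?countrySubject"),
    ("?countrySubject", "skos:broader", "dbc:Southeast_Asia"),
    ("?destination", "rdf:type", "dbo:Place"),
]

if __name__ == "__main__":
    print(prefiltered_resources(TOY_GRAPH, FILTER_PATTERN, "?destination"))
```

A real deployment would delegate this step to the triple store; the nested loops here only make the semantics of pattern matching and projection explicit.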

After preselection, resources are brought together with user profile data to retrieve the final set of recommendations (Definition 11).

Definition 11 (Annotations and relevant resources (prefiltered retrieval)). The solution mappings of annotations and relevant resources in prefiltered retrieval mode are obtained by joining the set of prefiltered resources with the mappings $\Omega_r$ of all relevant resources and annotations, as determined from the user profile.

$$\Omega_{pre} = \Omega_{prefilter} \bowtie \Omega_r \tag{11}$$

After retrieving annotations and relevant resources in prefiltering mode, the system generates constraint-based recommendations. In this context, the ranking procedure is adapted to the restricted set of relevant resources and annotations. Hence, IC values of matching annotations and the respective SKOS similarities are quantified according to the user filter. By this means, recommendation scores reflect the similarity of items for the actual set of filtered LOD resources. Except for this restriction, the system carries out the calculations (i.e., determination of similarity values, the ranking of LOD resources) in the same way as in non-filtering mode.
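The join of Eq. 11 followed by ranking can be sketched as follows. This is a simplified illustration under stated assumptions: similarity scores are passed in as given numbers, whereas the engine described above recomputes IC values and SKOS similarities on the filtered set. The function name and all score values are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def prefiltered_recommendations(
    candidate_scores: Dict[str, float],
    prefiltered: Set[str],
    profile: Set[str],
    k: int,
) -> List[Tuple[str, float]]:
    """Keep only candidates that satisfy the filter, then return the top k."""
    # Join step (cf. Eq. 11): a candidate survives only if it also matched
    # the filter pattern; profile items are never recommended.
    joined = {
        item: score
        for item, score in candidate_scores.items()
        if item in prefiltered and item not in profile
    }
    # Rank by descending score; ties are broken by name for determinism.
    ranked = sorted(joined.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:k]


if __name__ == "__main__":
    scores = {  # hypothetical similarity scores
        "dbr:Tonle_Sap": 0.8,
        "dbr:Lake_Geneva": 0.9,
        "dbr:Mekong": 0.7,
        "dbr:Lake_Baikal": 1.0,
    }
    allowed = {"dbr:Tonle_Sap", "dbr:Mekong", "dbr:Lake_Toba"}
    print(prefiltered_recommendations(scores, allowed, {"dbr:Lake_Baikal"}, 5))
```

Applying the filter before ranking means that a highly similar but non-matching resource (dbr:Lake_Geneva in the toy data) cannot crowd out matching ones.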

3.5. Postfiltering & Combinations

The SKOSRec engine can not only combine similarity calculation and graph pattern matching to prefilter relevant LOD resources; it can also apply user constraints after similar items have been determined. By this means, recommendations are filtered ex post, i.e., after the similarity detection process. This feature enables novel retrieval patterns in which suggestions are part of a subquery. As an example scenario for a postfilter recommendation task, consider the following SKOSRec query, which obtains suggestions for movies and directors based on a user's preference for the director dbr:Quentin_Tarantino (see Listing 6). For the given request, the engine would identify similar directors based on Quentin Tarantino's characteristic features (e.g., geographic origin or awards won) and would determine the movies these similar directors have shot. Thus, users may receive interesting movie recommendations.

Table 5 shows the solution the SKOSRec system returns when Q4 is issued against DBpedia. The ellipses in the displayed tables indicate records that have been excluded for brevity.

However, this approach can still be improved upon. For instance, the engine only considers the metadata descriptions of the director, whereas related movies have to be retrieved anew upon execution of the postfilter section. Thus, they are not ranked according to similarity in the final output table.

Another weakness is the biased semantics of the query.

Listing 6. SKOSRec query to generate simple postfiltered recommendations in the movie domain (Q4)

  PREFIX dbr: <http://dbpedia.org/resource/>
  PREFIX dbo: <http://dbpedia.org/ontology/>
  PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
  PREFIX yago: <http://dbpedia.org/class/yago/>

  SELECT ?director ?movie
  WHERE {
    ?movie dbo:director ?director .
    ?director rdf:type yago:Director110014939 .
  } LIMIT 50
  RECOMMEND ?director TOP 5
  PREF dbr:Quentin_Tarantino

Table 5
Result set of Q4

  director               movie
  dbr:Kirk_Douglas       dbr:Scalawag_(film)
  dbr:Kirk_Douglas       dbr:Posse_(1975_film)
  dbr:Kevin_Costner      dbr:The_Postman_(film)
  dbr:Kevin_Costner      dbr:Dances_with_Wolves
  ...                    ...
  dbr:Gene_Kelly         dbr:Hello_Dolly_(film)
  dbr:Gene_Kelly         dbr:Invitation_to_the_Dance_(film)
  ...                    ...
  dbr:Alan_Alda          dbr:Goodbye_Farewell_and_Amen
  dbr:Alan_Alda          dbr:Margaret's_Engagement
  ...                    ...
  dbr:Steve_Buscemi      dbr:Interview_(2007_film)
  dbr:Steve_Buscemi      dbr:Animal_Factory

Usually, when a person favors a movie director, this preference refers to the movies this director has shot and not to the personal features of the person. Two directors might be similar according to certain features, such as birth year, nationality or received awards, but the style and genre of the movies they have created can still be different. A third factor is that recommendation retrieval for movie directors should also take into account that a film artist is likely more relevant the more movies s/he has shot that are in line with the preferred movies of the user. Hence, the SKOSRec engine enables another postfiltering strategy, which is called aggregation-based retrieval. This strategy can be helpful whenever an entity type is linked to a couple of LOD resources in a dataset, such that similarity scores of related entities (e.g., movies) can be aggregated to an overall recommendation score for a certain item (e.g., for a movie director).

Aggregation-based requests can be well combined with other retrieval patterns, such as preference queries.


In the case of the movie recommendation scenario, a user could state that s/he enjoyed Quentin Tarantino movies in the past and would like to receive suggestions for directors that shot similar films. Usually, this kind of question can only be answered by another human being, e.g., in a personal interaction among friends or in an online forum. The fact that the language facilitates these kinds of queries is one of its key advantages.

Listing 7. SKOSRec query to generate postfiltered recommendations based on a preference query in the movie domain (Q5)

  PREFIX dbr: <http://dbpedia.org/resource/>
  PREFIX dbo: <http://dbpedia.org/ontology/>
  PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
  PREFIX yago: <http://dbpedia.org/class/yago/>

  SELECT ?director ?movie
  WHERE {
    ?movie dbo:director ?director .
    ?director rdf:type yago:Director110014939 .
  } LIMIT 50
  AGG dbr:Quentin_Tarantino ?director SUM
  RECOMMEND ?movie TOP 100
  PREF [ ?prefMovie
    WHERE { ?prefMovie dbo:director dbr:Quentin_Tarantino . }
  ]

Table 6 depicts the results of Q5 as retrieved from DBpedia. When comparing the solution sets for Q4 and Q5, it becomes clear that they differ completely. While Q5 generated a list of directors who have mostly created action and thriller movies with a twist (i.e., Tarantino's unique filmmaking style^5), Q4 provided a different output. It shows U.S. American actor-directors whose movies range from musical films (dbr:Hello_Dolly_(film)) to western movies (dbr:Dances_with_Wolves) or drama films (dbr:Interview_(2007_film)). Thus, the directors in the solution table of Q4 (Table 5) seem rather arbitrary and not exceptionally helpful concerning the initial user request.

5 https://en.wikipedia.org/wiki/Quentin_Tarantino

Table 6
Result set of Q5

  director                     movie
  dbr:Steven_Spielberg         dbr:Indiana_Jones
  dbr:Steven_Spielberg         dbr:Jurrasic_Park_(film)
  ...                          ...
  dbr:Francis_Ford_Coppola     dbr:The_Godfather_Part_II
  dbr:Francis_Ford_Coppola     dbr:The_Godfather_Part_III
  ...                          ...
  dbr:Christopher_Nolan        dbr:The_Prestige_(film)
  dbr:Christopher_Nolan        dbr:The_Dark_Knight
  ...                          ...
  dbr:Robert_Rodriguez         dbr:Planet_Terror
  dbr:Robert_Rodriguez         dbr:Machete_(film)
  ...                          ...
  dbr:Kevin_Smith              dbr:Clerks
  dbr:Kevin_Smith              dbr:Clerks_II

We define this kind of aggregation-based request as a so-called roll-up query. This term was chosen because the final suggestions are based on summarized (i.e., rolled up) preferences for sublevel entities. Another example of a roll-up request stems from the application scenario of travel search. This recommendation query derives suggestions for city trip destinations from the points of interest (POI) a user has visited and liked in another city (e.g., London). Listing 8 shows the corresponding SKOSRec query for this scenario.

Listing 8. SKOSRec query to generate postfiltered recommendations (aggregation-based) for a preference profile containing the city of London (Q6)

  PREFIX dbr: <http://dbpedia.org/resource/>
  PREFIX dbo: <http://dbpedia.org/ontology/>
  PREFIX dul: <http://www.ontologydesignpatterns.org/>
  PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

  SELECT ?city
  WHERE {
    ?poi ?property ?city .
    ?city rdf:type dbo:City .
    ?property rdfs:subPropertyOf dul:hasLocation .
  } LIMIT 3
  AGG dbr:London ?city SUM
  RECOMMEND ?poi TOP 100
  PREF dbr:Oxford_Street, dbr:Notting_Hill, dbr:South_Bank

The SKOSRec request contains subquery commands that refer to the user's preferred POIs. It specifies how similarity scores of the generated suggestions should be aggregated (i.e., sum-based). The retrieval is based on the 100 POIs that are most similar to the ones in the preference section. Additionally, the subquery part refers to the DBpedia resource that represents London (AGG dbr:London ?city), thus indicating that the list of recommendations should not contain any POIs in this city. Upon processing the depicted query over the DBpedia repository, the engine displays the three cities for which the highest aggregated scores have been detected (see Table 7).

Table 7
Result set of Q6

  city
  dbr:Oxford
  dbr:Abingdon-on-Thames
  dbr:Wallingford_Oxford

The motivation for formulating such a query is that, when searching for cities, preferences might be better represented through the sights a user has visited elsewhere than through a simple on-the-fly request solely containing a preference statement for a city in the user profile section.

Besides the roll-up retrieval patterns presented above, the SKOSRec grammar may facilitate the formulation of additional types of advanced recommendation queries. For instance, cross-domain queries are an interesting additional pattern to be explored for personalized retrieval. Listing 9 shows an example request that generates movie suggestions based on the preference for a music band (i.e., dbr:The_Jackson_5).

Listing 9. SKOSRec query to generate cross-domain recommendations (Q7)

  PREFIX dbr: <http://dbpedia.org/resource/>
  PREFIX dbo: <http://dbpedia.org/ontology/>
  PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

  SELECT ?movie ?participated ?musicalAct
  WHERE {
    ?movie rdf:type dbo:Film .
    VALUES ?participated { dbo:basedOn dbo:musicComposer dbo:genre dbo:starring }
    { ?musicalAct ?participated ?movie . }
    UNION
    { ?movie ?participated ?musicalAct . }
  } LIMIT 10
  AGG dbr:The_Jackson_5 ?movie MAX
  RECOMMEND ?musicalAct TOP 100
  PREF dbr:The_Jackson_5
  WHERE {
    VALUES ?actType { dbo:MusicalArtist dbo:Band }
    ?musicalAct rdf:type ?actType .
  }

The query contains a prefilter condition that triggers retrieval of music artists and bands. Afterwards, the scores of the top 100 suggestions are summarized with maximum-based aggregation. Thus, the highest similarity value among the set of music acts determines the ranking score of the film. Also note that, because of the aggregation-based retrieval procedure, connections between movies and the LOD resource dbr:The_Jackson_5 are excluded from similarity calculation. The scenario exemplifies how semantic relationships can help to generate suggestions for a target domain (i.e., movies) that are based on a profile containing items from a different source domain (i.e., music acts). While regular SPARQL queries perform exact matching of triple statements, the SKOSRec syntax facilitates the integration of similarity- and graph-based retrieval to enrich result lists with other potentially relevant items.

Table 8
Result set of Q7

  movie                           musicalAct
  dbr:Moonwalker                  dbr:Michael_Jackson
  ...                             ...
  dbr:The_Wiz_(film)              dbr:Ashford_&_Simpson
  dbr:Garfield_Gets_a_Life        dbr:The_Temptations
  dbr:Pipe_Dreams_(1976_films)    dbr:Gladys_Knight_&_the_Pips
  dbr:The_Jacksons                dbr:Jermanine_Jackson

As a point of reference, consider the SPARQL query for the given cross-domain scenario (Listing 10).

Listing 10. SPARQL query to generate cross-domain recommendations (Q8)

  PREFIX dbr: <http://dbpedia.org/resource/>
  PREFIX dbo: <http://dbpedia.org/ontology/>
  PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

  SELECT ?movie ?musicalAct
  WHERE {
    ?movie rdf:type dbo:Film .
    VALUES ?participated { dbo:basedOn dbo:musicComposer dbo:genre dbo:starring }
    { dbr:The_Jackson_5 ?participated ?movie . }
    UNION
    { ?movie ?participated dbr:The_Jackson_5 . }
  } LIMIT 5

The query omits the similarity calculation part. Thus, it only retrieves movies linked to the resource dbr:The_Jackson_5. By this means, the SPARQL results have a better chance of being related to the initial user profile. On the other hand, empty result lists can occur when the preferred LOD resource does not have any direct connections to recommendable items. This assumption is backed up by the result lists of the two queries (Tables 8 and 9). The SKOSRec query produces a more comprehensive solution than a regular SPARQL query.


Table 9
Result set of Q8

  movie                 musicalAct
  dbr:The_Jacksons      dbr:The_Jackson_5

The following definitions formalize the postfiltered and aggregation-based retrieval forms. Definition 12 specifies the notion of SKOSRec recommendations that may potentially undergo postprocessing.

Definition 12 (SKOSRec recommendations). SKOSRec recommendations ($R(P_r)$) are the set of suggestions generated by processing the solution mappings ($\Omega$) obtained from recommendation retrieval. They can be a result of simple on-the-fly retrieval, prefiltering or a combination of these techniques. $P_r$ represents the input profile that contains at least one preference for an item. The cardinality of $R(P_r)$ is the intended number of recommendations ($k$) specified by the user, based on which the engine selects the resources with the highest recommendation scores that are not contained in the profile (Eq. 12).

$$R(P_r) = \{x \mid x \in \{q \mid score(P_r, q) > \operatorname{kmax}_{q \in \Omega} score(P_r, q) \wedge q \notin P_r\}\} \tag{12}$$

The approach of subquerying with recommendation results builds on the idea that SKOSRec suggestions $R(P_r)$ can be joined with any graph pattern $P_{post}$ after the process of similarity calculation (Definition 13). It does not make a difference how the engine obtained these recommendations.

Definition 13 (Postfiltered recommendation mapping). The postfiltered recommendation mapping is obtained by joining solution mappings generated from SKOSRec recommendations with the results of matching a postfilter graph pattern ($\Omega_{post}$) (Eqs. 13 and 14).

$$\Omega_{postfilter} = \{\mu \mid \mu \in [\![P_{post}]\!]_D \wedge x \in dom(\mu)\} \tag{13}$$

$$\Omega_{post} = \{\mu|_x \mid x \in R(P_r)\} \bowtie \Omega_{postfilter} \tag{14}$$
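The interplay of Definitions 12 and 13 can be sketched in a few lines: first select the k best-scoring resources outside the profile, then keep only those postfilter mappings whose recommendation variable is bound to one of them. The helper names, the dictionary representation of solution mappings and the score values are assumptions for illustration and do not reproduce the engine's internals.

```python
from typing import Dict, List, Set


def top_k(scores: Dict[str, float], profile: Set[str], k: int) -> List[str]:
    """Select the k highest-scoring resources that are not in the profile."""
    eligible = [item for item in scores if item not in profile]
    # Descending score; ties are broken by name for determinism.
    eligible.sort(key=lambda item: (-scores[item], item))
    return eligible[:k]


def postfilter_join(
    recommended: List[str],
    pattern_matches: List[Dict[str, str]],
    variable: str,
) -> List[Dict[str, str]]:
    """Keep the postfilter mappings that bind `variable` to a recommendation."""
    keep = set(recommended)
    return [mu for mu in pattern_matches if mu.get(variable) in keep]


if __name__ == "__main__":
    scores = {  # hypothetical similarity scores
        "dbr:Kirk_Douglas": 0.5,
        "dbr:Kevin_Costner": 0.25,
        "dbr:Gene_Kelly": 0.125,
        "dbr:Quentin_Tarantino": 1.0,
    }
    matches = [
        {"?director": "dbr:Kirk_Douglas", "?movie": "dbr:Scalawag_(film)"},
        {"?director": "dbr:Gene_Kelly", "?movie": "dbr:Hello_Dolly_(film)"},
        {"?director": "dbr:Kevin_Costner", "?movie": "dbr:The_Postman_(film)"},
    ]
    recs = top_k(scores, {"dbr:Quentin_Tarantino"}, 2)
    print(postfilter_join(recs, matches, "?director"))
```

Because the join runs after ranking, the movie rows in the output carry no similarity score of their own, which mirrors the limitation of Q4 discussed above.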

Like the prefilter, $P_{post}$ can be comprised of triple statements that are eligible in the RecGroupGraphPattern part of the SKOSRec grammar. Aggregation-based queries have to be treated differently than a regular postfilter request. Apart from excluding sublevel items of the profile from the set of similar resources, the engine should also omit any connections between the preferred superordinate item ($a$) and similar sublevel entities. Additionally, the user has to specify the variable in the profile (e.g., $y$) that represents the items within the postfilter graph pattern $P_{post}$. By this means, aggregation-based recommendations can be generated (Definition 14).

Definition 14 (Resources for aggregation-based retrieval). The set of resources for aggregation-based retrieval is obtained by retrieving LOD resources that have a connection to a previously generated suggestion, thereby excluding all resources that are linked to the specified superordinate item $a$ in the profile (Eq. 15).

$$R_{agg}(a) = \{\mu(y) \mid \mu \in \Omega_{post} \wedge y \in dom(\mu) \wedge a \notin \mu(y)\} \tag{15}$$

For each potential aggregation-based suggestion ($y$), similarity scores of connected entities ($x$) have to be considered for the final ranking. In this context, different approaches to aggregation can be applied. In cases when both similar items and aggregation-based recommendations are entities on comparable levels of granularity, choosing the maximum similarity score might be the best way to represent the relatedness of the resource to the user profile. On the other hand, when postfiltered recommendations are based on similarity scores of their sublevel entities (e.g., as in the movie recommendation example), the scores of related suggestions should be aggregated by the sum or the average of individual scores.

Definition 15 (Aggregation-based ranking score). The ranking score of an LOD resource $y$ that is connected to a set of recommended items is determined by aggregating scores of connected entities, either by the maximum (Eq. 16), the sum (Eq. 17) or the average (Eq. 18) of individual scores.

$$score_{aggMAX}(y) = \max_{x \in \mu(x,y)} score(P_r, x) \tag{16}$$

$$score_{aggSUM}(y) = \sum_{x \in \mu(x,y)} score(P_r, x) \tag{17}$$

$$score_{aggAVG}(y) = \frac{\sum_{x \in \mu(x,y)} score(P_r, x)}{|\{x \in \mu(x,y)\}|} \tag{18}$$
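The three aggregation variants of Eqs. 16 to 18 can be sketched as follows. This is a minimal illustration, assuming that the connections between similar sublevel items x and candidate items y are available as (x, y) pairs and that the superordinate profile item a is removed as described in Definition 14. The function name, parameter names and scores are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Optional, Set, Tuple


def aggregate_scores(
    links: Iterable[Tuple[str, str]],
    scores: Dict[str, float],
    mode: str = "SUM",
    superordinate: Optional[str] = None,
) -> Dict[str, float]:
    """Aggregate scores of sublevel items x per connected candidate y."""
    grouped: Dict[str, Set[str]] = defaultdict(set)
    for x, y in links:
        # Skip the superordinate profile item (Definition 14) and any
        # sublevel item without a similarity score.
        if y != superordinate and x in scores:
            grouped[y].add(x)
    reducers = {"MAX": max, "SUM": sum, "AVG": mean}
    if mode not in reducers:
        raise ValueError(f"unknown aggregation mode: {mode}")
    reduce_fn = reducers[mode]
    return {y: reduce_fn([scores[x] for x in xs]) for y, xs in grouped.items()}


if __name__ == "__main__":
    movie_scores = {"m1": 0.5, "m2": 0.25, "m3": 0.75, "m4": 0.9}  # hypothetical
    directed_by = [("m1", "d1"), ("m2", "d1"), ("m3", "d2"), ("m4", "a")]
    for mode in ("MAX", "SUM", "AVG"):
        print(mode, aggregate_scores(directed_by, movie_scores, mode, superordinate="a"))
```

In the toy data, sum-based aggregation rewards candidate d1 for having two moderately similar movies, whereas maximum-based aggregation favors d2, which illustrates why the choice of aggregation function depends on the granularity of the entities involved.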

4. Evaluation of the SKOS Recommender

4.1. Experimental Setup

The novel recommendation strategies of the SKOSRec engine were evaluated in the context of web-based experiments for the usage scenarios of travel destination search and multimedia RS. The online approach helped to reach out to more participants with diverse demographic backgrounds, which would not have been possible in a laboratory experiment. Another positive side effect of the web-based setting was that participants could test and rate recommendations without being directly observed by an experimenter, which might have caused biased results otherwise. Hence, the online approach softened common limitations of laboratory studies [19].

The experiments were carried out in the defined usage scenarios with metadata descriptions from DBpedia. The database containing the local mirror of DBpedia ran on a virtual server. Besides the triple store, a web application was hosted on the same machine. The website ran on an Apache Tomcat 7 application server and was implemented with the help of Java servlet technology^6,7.

We conducted the online experiments between April 2016 and January 2017. Upon setting up the web application, a pretest was carried out to control for potential pitfalls, such as misleading navigation. Two test users went step by step through the web interface and provided their feedback. Because of their insights, we slightly modified the interface (e.g., through rephrasing user instructions) to increase the chances of successful completion. In case the SKOSRec engine is applied in a live setting, the design of the user interface will have to be tested with more users. Our experimental series focused on algorithmic performance, which was tested with numerous participants.

Subjects were recruited through the clickworker.com platform^8. Participants received a compensation fee of 1.00 € or 1.50 € for a completed survey. 103 participants took part in the study on travel destination search. In the multimedia domains, 154 subjects in total evaluated the SKOSRec engine. For each part of the evaluation, it was ensured that participants had not just run through the evaluation screen. Whenever there appeared to be irregularities in the log files (e.g., the default settings of the screen were not changed), these assessments were excluded from further analysis.

The survey template started with a consent form. It informed participants about the context and the procedure of the study. After having filled in the consent form, participants were navigated to the first part of the experiment. It collected data on subjects' demographics and their consumption and search habits. After the preparation phase, participants worked on the first test case (TC1) in each usage scenario. In this test case, the performance of the SKOSRec engine was assessed in the simple execution mode of on-the-fly recommendations (see Subsect. 3.2). This was done to gather data on the usefulness of suggestions resulting from a simple content-based approach (i.e., the baseline method), with which more advanced recommendation strategies were compared at a later stage. In cases where users did not state any items, the application showed an error page. While setting up a profile required some effort on the side of the user, it was a necessary step for the experiment. Since we did not have access to the log files of a live application, profile generation had to be simulated. The process was facilitated by a specifically tailored interface (see Fig. 2), where subjects could search for items through an autocomplete field that applied AJAX technology^9.

6 http://tomcat.apache.org/
7 http://www.oracle.com/technetwork/java/javaee/servlet/index.html
8 https://www.clickworker.de/

It enabled users to type in keywords (e.g., the name of a music act or the author of a book), for which matches were instantaneously displayed without reloading the site. Hence, participants could quickly find and select their items of interest. Whenever a user entered a query through the AJAX interface, the system matched the query keywords against an Apache Solr/Lucene search index^10. The index was built with item metadata, which had been extracted from DBpedia prior to setting up the website. The index comprised the names and abstracts of items. Whenever available, information on thumbnail pictures was saved in the index to be ready for display in the AJAX interface. In case the Apache Solr/Lucene search engine found index keywords that matched the user query, metadata information for these items was immediately shown on the screen. This information was displayed to help participants make an informed decision on whether the AJAX result list contained suitable items for profile generation.

9 http://api.jquery.com/jquery.ajax/
10 http://lucene.apache.org/solr/

Fig. 2. User profile generation, TC1 (music domain)

Aside from the AJAX feature, the appeal of the interface was increased by a progress bar at the top of the webpage (see Fig. 2). Progress bars are commonly utilized tools for web surveys. In conventional paper-based studies, subjects usually see how far along they are, while in a web survey they cannot check their progress immediately. This is especially true for studies that apply screen-by-screen navigation, which was the chosen approach for the conducted web experiments. Progress bars prevent users from tiring of the questionnaire and withdrawing from the study altogether by showing how close participants are to completing the experiment [14].

Upon profile generation, the SKOSRec engine calculated on-the-fly recommendations for a simple query (similar to Q1, Sect. 3), which were shown to users in a separate screen. Suggestions were displayed with a sufficient amount of information, e.g., thumbnails, labels or a short description, to help participants assess the quality of the recommendation. In addition to this information, users could also access the respective Wikipedia article by following the hyperlink on the label. Relevance sliders were used to assess the utility of each recommendation. They are often applied in evaluations of IR systems [51, 54]. Scores were handled as points on a 0 to 100 scale of the relevance slider (outer left side: lowest relevance, outer right side: highest relevance, see Fig. 3).

This approach facilitated the calculation of an accurate precision score and the application of tests with higher statistical power than an ordinal 5-star scale. The score, called mean relevance (mrs), was measured with Equation 19.

$$mrs = \frac{\sum_{i=1}^{k} score_i}{k} \tag{19}$$

In addition to the precision score, the web application also determined the size of the recommendation list produced by the engine as an indicator for recall. However, recommendation list size can only be utilized for this purpose when mrs scores reach a reasonable level.

While accurate recommendations are useful because they build trust in a system [57], they only capture a narrow aspect of performance [36]. For instance, an online retailer can profit tremendously from an RS that provides both relevant and unobvious recommendations, since it enables users to explore the product catalog more thoroughly. On the other hand, in the case of an RS only suggesting popular products, it is likely that the user already knows them [3, 23]. A pure accuracy-based evaluation disregards these aspects, which is why other performance indicators, such as novelty, were measured as well. Thus, the evaluation interface also contained a section where subjects had to tick a radio button that indicated whether an item was new. In the backend of the web application, novelty was measured as the ratio of relevant items not familiar to the user ($|nov|$) among the total number of relevant items ($|tp|$) (Eq. 20).

Fig. 3. Recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain)

$$nv = \frac{|nov|}{|tp|} \tag{20}$$
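Eqs. 19 and 20 translate directly into code. The sketch below assumes that each rating is a pair of a slider score on the 0 to 100 scale and a flag stating whether the item was new to the participant, and that an item counts as relevant from a threshold of 50 upwards, as stated in the text. The function names and the example ratings are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def mean_relevance(slider_scores: Sequence[float]) -> float:
    """Mean relevance score (Eq. 19) of a recommendation list."""
    if not slider_scores:
        raise ValueError("at least one rated recommendation is required")
    return sum(slider_scores) / len(slider_scores)


def novelty(
    ratings: Sequence[Tuple[float, bool]], threshold: float = 50.0
) -> Optional[float]:
    """Share of relevant items that were new to the user (Eq. 20).

    Returns None when no item reaches the relevance threshold, because the
    ratio is undefined in that case.
    """
    relevant: List[Tuple[float, bool]] = [r for r in ratings if r[0] >= threshold]
    if not relevant:
        return None
    new_and_relevant = sum(1 for _, is_new in relevant if is_new)
    return new_and_relevant / len(relevant)


if __name__ == "__main__":
    ratings = [(80, True), (40, True), (60, False), (50, True)]  # hypothetical
    print(mean_relevance([score for score, _ in ratings]))
    print(novelty(ratings))
```

Returning None for lists without any relevant item is a design choice of this sketch; the source does not state how the web application handled that case.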

An item was considered relevant when it achieved an mrs score of 50 or higher. In addition to accuracy and novelty, the web application also tracked the diversity of recommendation lists, since a topically diversified result set has been found to increase overall user satisfaction [31, 66]. In the RS literature, the most widely explored diversification metrics are the ones that measure the similarity between each item pair in the recommendation list. The more the items are alike, the less diverse is the result set [19]. The study series of this paper applied the metric by Castells et al. [8]. It determines item-to-item similarity values with the cosine similarity measure. Scores are then converted to distance values (Eq. 21) and averaged for the recommendation list (Eq. 22).

$$d(i_l, i_m) = 1 - \cos(i_l, i_m) \tag{21}$$


$$div_C = \frac{2}{k(k-1)} \sum_{l<m} d(i_l, i_m) \tag{22}$$
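A minimal sketch of Eqs. 21 and 22 is given below. It assumes that every recommended item is represented by a numeric feature vector of equal length; how the items were vectorized in the experiments is not specified here, so the vectors in the example are purely illustrative, as are the function names.

```python
import math
from itertools import combinations
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def diversity(vectors: Sequence[Sequence[float]]) -> float:
    """Average pairwise cosine distance of a recommendation list (Eq. 22)."""
    k = len(vectors)
    if k < 2:
        return 0.0  # a single item has no pairwise distances
    # Sum the distance d = 1 - cos (Eq. 21) over all unordered pairs l < m.
    total = sum(1.0 - cosine(u, v) for u, v in combinations(vectors, 2))
    return 2.0 * total / (k * (k - 1))


if __name__ == "__main__":
    items = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]  # hypothetical item vectors
    print(diversity(items))
```

The factor 2 / (k (k - 1)) is the reciprocal of the number of unordered item pairs, so the result is the mean distance over all pairs.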

The content-based diversity scores ($div_C$) were calculated in the backend of the web application and saved in a log file for each recommendation list generated by the SKOSRec engine at runtime. In addition to the measurement of $div_C$ scores, we asked participants directly about their opinions on recommendation list diversity. Following the RS evaluation framework by Pu et al. [44], two Likert items were posed for this purpose:

– The recommendations of list [...] are diverse.
– The recommendations of list [...] are similar to each other.

Besides the algorithmic performance metrics listed above, the authors gathered assessments from users regarding the overall usefulness of the suggestions. This approach is also in line with the empirically verified evaluation framework by Pu et al. [44], which proposes to measure the psychometric construct of Perceived Usefulness to get a comprehensive understanding of user opinions on recommendation results. Thus, participants were instructed to state their agreement with positive statements regarding the relevance of the suggestions. The statements were adapted to the specificities of each usage scenario. Fig. 4 shows an example collection of Likert questions for the domain of music RS.

After participants had provided the required information, their answers were saved in a log file and they were navigated to test case 2 (TC2), which evaluated the SKOSRec engine's performance for constraint-based queries (similar to Q3, Sect. 3). We tried to find out if the engine was able to improve user satisfaction through filter conditions. Even though IR systems already successfully apply facet filters on result sets, the experiments still had to prove whether LOD-enabled constraints have the same effect on the Perceived Usefulness of recommendations. For the comparisons of non-filtered and filtered suggestions, we applied a within-subjects design in which participants had to rate two recommendation lists resulting from regular (non-filtered) and filtered requests in TC2. We favored this setup over a between-subjects design, where subjects would have been assigned to only one result list.

While the between-subjects design more closely resembles a real-world setting, since it does not require participants to interact with more than one approach at a time, differences among subjects (e.g., regarding demographics or topic expertise) can potentially have a considerable impact on quality judgments [19]. Hence, when applying a between-subjects design, one does not know whether differences in performance are caused by the approaches or by the variability among participants. Therefore, researchers need to conduct between-subjects experiments with a sufficient number of study subjects to level out these differences. Within-subjects studies, on the other hand, require fewer participants while still achieving the same level of statistical power, since between-subject variability is not an issue [30, 31]. Beel et al. have shown that even small variations in an RS test setting could cause substantial differences in performance outcomes [6]. To keep the extent of variations even lower, comparisons between non-filter and constraint-based recommendations were based on the same profile for each user in TC2. Therefore, the user profile from TC1 served as a starting point for filtered retrieval. The web application showed the recently generated profiles to users and asked them to provide an additional filter condition. Fig. 5 depicts an example web form for a constraint-based recommendation request in the music domain. It comprises the user profile and an additional text field where users stated their constraint.

Afterwards, the engine applied this filter condition on the set of potential recommendation results before similarity calculation started. For each usage scenario, different types of constraints were available. Regarding suitable filter options, the authors sought to achieve a balance between assisting users in finding filter conditions that would adequately represent their information needs and keeping the variability among users as low as possible. Therefore, the specificities of each scenario and the availability of the respective data sources were determined. For instance, for the multimedia domains (movie, music and books), it was assumed that users would like to filter recommendations according to the specific genre of the item. Fortunately, the DBpedia dataset contains this kind of information for the music domain. In this domain, the genre can be retrieved through the property dbo:genre. However, in the movie and book domains, DBpedia does not provide sufficient information. Many LOD resources which represent books were not assigned a genre. In the movie domain, a genre property is missing entirely, and genre-specific information can often only be found in the subject categories describing the movie. Hence, in contrast to the experiment on music recommendations, participants in the domains of books and films did not filter their recommendation lists by genre. On the other hand, the property dc:subject was applied as a recommendation filter in each multimedia domain. Since the subject property often links to many informative features of multimedia items (e.g., release period, geographic information or content information), the authors decided that this feature represents a powerful filter dimension.

In the experiment on travel destination search, the web interface offered two filters: a subject filter and a location-specific filter. The subject filter enabled users to state their preferences regarding the characteristics of a location (e.g., being a nature reserve). The location-specific filter, on the other hand, gave participants the option to specify where the desired destination should be located^11.

Fig. 4. Evaluation screen for recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain)

The second fundamental research issue of TC2 con-cerned the question whether expressive constraints canboost recommendation quality even further Hence

¹¹ http://dbpedia.org

apart from executing a constraint-based workflow with a simple filter (i.e., a direct attribute of an item), the SKOSRec engine generated suggestions with the help of an expressive filter as well. This second procedure, while still applying the same user constraint as the simple filter, utilized an expressive graph pattern to include additional LOD resources in the result set. For instance, in case a subject filter was selected, result set expansion was facilitated by exploring the wider semantic space of the DBpedia category graph through a property path declaration in the graph pattern (e.g., skos:broader[2]). In the travel experiment, an additional expressive filter retrieved place-related information (dul:hasLocation) for location-specific subproperty relations that are not materialized in DBpedia. Hence, the expressive graph pattern expanded the filter such that additional properties were matched (e.g., dbo:country, dbo:city or dbo:state) while still applying the specified user constraint on the set of LOD resources. Upon execution of constraint-based recommendation requests, users received two recommendation lists (resulting from simple and expressive filtering, respectively) in a


Fig. 5. Web form for a constraint-based recommendation request (music domain)

separate evaluation screen. Participants evaluated result sets in the same manner as in TC1. For each result set, the screen contained an additional Likert item asking subjects to give their agreement to the statement "The filter has improved the recommendation results of list". The sequence of suggestions resulting from the different methods was randomly assigned in each user session to avoid order effects.

After study subjects had completed TC2, they were directed to test case 3 (TC3). In the TC3 section of the travel experiment, participants could choose between three options for rollup query patterns (similar to Q6, Sect. 3). Users were able to obtain recommendations for travel destinations based on POIs they had visited during a stay in another destination. They could either select a city (option 1), a region (option 2) or a country (option 3) as destination type. Upon entity type selection, the SKOSRec engine generated appropriate suggestions. Fig. 6 shows the web form from the travel experiment that facilitated the formulation of advanced rollup requests in TC3.

When users had selected an entity type and stated their favorite travel destination and three associated POIs,

the application generated a SKOSRec query from these parameters. A simple on-the-fly request was sent to the engine as well to retrieve baseline recommendations, with which the suggestions resulting from the advanced request were compared at a later stage of the experiment. The simple query only contained the user's favorite travel destination and the selected entity type option, thus omitting information on the POIs. During the travel experiment, the engine processed both the simple and the advanced query and produced two recommendation lists. As in the previous parts of the experiment, participants assessed the quality of the result sets, which appeared in random order on the screen.

The multimedia experiments also contained a section on rollup retrieval (TC3). Like the advanced travel queries, the multimedia requests aggregated similarity scores of sublevel entities through summation, which the engine joined with a postfilter. In addition to these features, the pattern contained a preference query part. This section identified a set of preferred items through a single user statement. Participants either specified their favorite director/actor (movie domain), music act


Fig. 6. Travel - User profile generation TC3

(music domain) or author (book domain) and received recommendations based on the features of the works created by the artist (similar to Q5, Sect. 3). The SKOSRec engine thus generated recommendations based on the creative works of the favored artist. Participants also received suggestions from a simple on-the-fly query, which computed results according to the artist's characteristics. As in the travel experiment, recommendations from the advanced rollup request were compared with the on-the-fly query without letting participants know which approach they were assessing and with random assignment of result set positions.

After participants of the multimedia experiments had evaluated the results of TC3, they were guided to test case 4 (TC4), which evaluated cross-domain queries. The experiment on travel RS ended with TC3 because no suitable graph patterns could be identified to facilitate these kinds of suggestions for the travel usage scenario. In the multimedia domain, however, the data from DBpedia was sufficient for this retrieval task. The entity type of the study (i.e., movie, music act or book) defined the source domain, based on which the engine generated suggestions for items from another multimedia target domain (similar to Q7, Sect. 3). Fig. 7 depicts the cross-domain web form of the music experiment.
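The roll-up step described for TC3 (summing the similarity scores of sublevel entities per parent entity) can be illustrated with a minimal sketch. The function name, the toy scores and the item-to-parent mapping are hypothetical and only serve as an illustration; they do not reproduce the SKOSRec implementation or its post-filter.

```python
from collections import defaultdict


def rollup_scores(item_scores, parent_of, top_n=10):
    """Sum similarity scores of sublevel items per parent entity.

    item_scores: dict mapping a sublevel item (e.g. a POI or a movie)
                 to its similarity score.
    parent_of:   dict mapping a sublevel item to its parent entity
                 (e.g. a city or a director); items without a parent
                 are ignored.
    Returns the top_n (parent, summed score) pairs, best first.
    """
    totals = defaultdict(float)
    for item, score in item_scores.items():
        parent = parent_of.get(item)
        if parent is not None:
            totals[parent] += score
    # Sort by descending score, then by name for a stable order.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Hypothetical toy data: POIs with similarity scores and their cities.
scores = {"poi_a": 0.5, "poi_b": 0.25, "poi_c": 0.5, "poi_d": 0.125}
parents = {"poi_a": "city_x", "poi_b": "city_x", "poi_c": "city_y"}
ranking = rollup_scores(scores, parents)
```

In this toy setting, a parent with several moderately similar sublevel items can outrank a parent with a single highly similar one, which is the intended effect of aggregation by summation.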

The web form assisted subjects in formulating a request, which then obtained movie suggestions based on the favorite music act of the user. Parameters from users were entered in the placeholders of the query pattern in the backend of the application. We formulated the query templates in such a way that the query engine matched any properties or graph patterns that could potentially connect two items from the specified

target and source domains. For instance, such matches can occur when a user states a favorite movie that was written by a particular author. In case the same author has also written some books, they might be of interest to the user as well. Another example stems from the movie domain. Suppose a consumer has declared a preference in their profile for a movie that links to the corresponding soundtrack. It might well be the case that the user likes the soundtrack and the music acts that were involved in creating it just as much as the movie. The practice of distinguishing homonymous LOD resources through the property dbo:wikiPageDisambiguates is another interesting linking pattern from DBpedia that was exploited for cross-domain requests in TC4, since it connects similar entities (e.g., the book edition of a certain movie).

Cross-domain SPARQL queries (similar to Q8, Sect. 3) were posed as baseline requests in this test case. While the SPARQL query only matched cross-domain relations for a single LOD resource, the SKOSRec request expanded the search space by considering links from more than one item. As in the previous test cases, TC4 ended with an evaluation screen where users assessed the SPARQL-based as well as the SKOSRec suggestions in random order without knowing which recommendation list belonged to which method. Table 10 gives an overview of the test cases and the methods that were applied in the experiments in each usage scenario.

4.2. Results

In total, 257 people took part in the study. The gathered samples from the experiments contained answers


Fig. 7. Music - User profile generation TC4 (page 9)

Table 10. Test cases in the web experiments

Test Case  Evaluation Purpose                       Query Type  Travel  Movie  Music  Book
TC1        On-the-fly Recommendations (Baseline)    Q1          X       X      X      X
TC2        Constraint-based Recommendations         Q3          X       X      X      X
TC3        Advanced Queries (Rollup)                Q5, Q6      X       X      X      X
TC4        Advanced Queries (Cross-Domain)          Q7          -       X      X      X

from both genders with a slight overrepresentation of male participants (53.7%). The majority of study subjects were between 20 and 40 years of age (65.8%). The evaluation of the demographics section also revealed that most participants already perceived the search in their domain of interest as either "manageable" (38.5%) or even as "easy" or "very easy" (44.0%).

Prior to analyzing the performance results of the novel retrieval approaches of the SKOSRec engine, it was checked whether the respective Likert items for Perceived Usefulness and user-based diversity (divU) reached acceptable reliability levels. For this purpose, Cronbach's α values were determined. Table 11 shows the outcome of this analysis for the Perceived Usefulness construct. Given the acceptable values throughout the domains (Cronbach's α > 0.7), it could be treated as an interval-scaled variable in the statistical tests.

Table 11. Cronbach's α values for the concept of Perceived Usefulness

Domain  Cronbach's α
Travel  0.7002
Movie   0.8579
Music   0.8515
Book    0.8949
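For reference, Cronbach's α for a construct measured by k Likert items is k/(k-1) multiplied by one minus the ratio of the summed item variances to the variance of the total score. The following minimal sketch illustrates this computation; the function name and the toy ratings are hypothetical and are not taken from the study data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of participant rows.

    responses: list of rows, one per participant; each row holds the
               scores for the k items of one construct.
    """
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two participants")
    # Sample variance of each item across participants.
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of the summed score per participant.
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings of five participants on three 5-point items.
toy_ratings = [[4, 5, 4], [3, 3, 3], [5, 5, 4], [2, 2, 3], [4, 4, 4]]
alpha_demo = cronbach_alpha(toy_ratings)
acceptable = alpha_demo > 0.7  # threshold used in the text
```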

Besides usefulness, user-based diversity scores were also measured by multiple Likert items. However,

Cronbach's α values for this concept did not reach acceptable levels. In order to avoid that valuable user assessments remained unused, agreements to the statement "The recommendations of list [] are diverse" were taken into account as an ordinal variable (divU). Based on these results, subsequent tests could be carried out.

The first part of the analysis concerned the performance of the on-the-fly retrieval method (baseline) of the SKOSRec engine in TC1. The outcome of this analysis confirms that the on-the-fly approach represents a viable recommendation strategy. Table 12 shows the mean (M)¹², standard deviation (SD) and number of participants (N) for each performance dimension and metric. It can be seen that users assessed the engine's performance in almost every quality dimension to be above average. Aside from the positive user evaluations with regard to Perceived Usefulness, related quality aspects, such as Accuracy, Novelty and Diversity, received good assessments as well. For instance, the mean relevance score (mrs) of each domain was higher than 50 score points. This means that participants were often convinced of the relevance of the results. Given these positive user assessments with regard to item relevance, it seems to be justified to consider the mean of result list sizes as an indicator of recall. The engine was able to generate the required number of recommendations in almost every test case (see Table

¹² In case of the user-based diversity score (divU), M denotes the median because of the ordinal scale of the variable.


12). The on-the-fly retrieval approach was also tested with regard to the Novelty dimension. Table 12 shows the mean novelty scores (nv) for the baseline recommendation approach. Novelty scores indicate that the majority of items in a result list were deemed to be new in the travel and book RS, whereas in the experiments on movie and music RS only about one third of the recommendation list was unfamiliar to users. In contrast, content-based Diversity scores (divC) were rather high in each domain. Hence, it can be concluded that items in a result set usually have a low rate of matching SKOS annotations. In addition to content-based scores, user assessments of recommendation list Diversity (divU) were also taken into account. The median level of agreement to the Diversity-related Likert item stating that recommendations are diverse lay at 3 ("neutral") in the multimedia domains and at 4 ("agree") in the travel domain.

In summary, the baseline approach achieved fairly good performance results in TC1. The findings demonstrate that LOD-enabled recommendations can potentially enhance a user's search experience. They also allow for subsequent tests of advanced retrieval techniques, which can be compared to this standard method.

In the analysis of performance results from TC2 (constraint-based recommendations), particular attention was paid to participants' agreements with the statement "The filter has improved the recommendation results of the list". Fig. 8 shows the distribution of domain-wise agreement ratios. On average, they were fairly high. Throughout the domains, the vast majority of participants either selected 5 ("strongly agree"), 4 ("agree") or 3 ("neutral"). Hence, a filter often had a positive impact on results.

The authors gathered responses to the question on improvement for recommendation lists resulting from both the simple and the expressive filtering approaches. Hence, they give clues about the general performance of constraint-based retrieval. However, since this section of the study revealed the experimental condition to participants (i.e., application of a filter), answers to the Likert statement have to be interpreted cautiously. Results may be slightly biased, as participants were able to guess the underlying agenda. These limitations do not exist, however, for comparisons between simple and expressive filters. Since the web application randomly assigned the list order, participants did not know which approach they were assessing. This enables us to take a closer look at the differences between the two filtering approaches. It was

Table 12. Performance results in TC1 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]  Domain  M      SD     N
Accuracy (size) [10]            Travel  10.00  0.00   103
                                Movie   10.00  0.00   50
                                Music   10.00  0.00   53
                                Book    9.80   1.40   51
Accuracy (mrs) [1]              Travel  56.49  0.00   103
                                Movie   62.25  19.15  50
                                Music   55.70  0.10   53
                                Book    57.47  15.55  50
Novelty (nv) [1]                Travel  0.60   0.34   101
                                Movie   0.36   0.31   50
                                Music   0.36   0.39   52
                                Book    0.61   0.29   47
Diversity (divU) [5]            Travel  4      -      102
                                Movie   3      -      50
                                Music   3      -      53
                                Book    3      -      50
Diversity (divC) [1]            Travel  0.80   0.16   103
                                Movie   0.87   0.12   50
                                Music   0.86   0.10   50
                                Book    0.92   0.08   50
Usefulness [5]                  Travel  3.34   0.85   103
                                Movie   3.55   0.73   50
                                Music   3.18   0.85   53
                                Book    3.43   0.81   50

hypothesized that advanced filters improve recommendation quality, as they may identify more relevant resources, thereby overcoming problems of data quality or data sparsity in LOD repositories. Table 13 lists response rates of the two methods. The rate measures the ratio of non-zero result sets among all result sets. Throughout the domains, recommendation lists resulting from expressive filtering achieved higher response rates.

Table 13. Response Rates (TC2)

Domain  Constr.-based (Regular) [%]  Constr.-based (Expanded) [%]  N
Travel  23                           57                            103
Movie   86                           88                            50
Music   72                           75                            53
Book    73                           76                            51


[Figure 8. Plot title: "The filter has improved the recommendations of the list"; x-axis: Domain (Digital Library, Travel, Movie, Music, Book); y-axis: Percentage of Agreement (0.00 to 1.00); legend: Degree of Agreement (1 to 5).]

Fig. 8. Participants' agreement with the statement that the filter has improved the result list (domain-wise) (TC2)

In addition to response rates, user assessments and scores in the defined quality dimensions were compared. Table 14 depicts the outcome of this analysis. Significantly higher scores are marked in bold figures. In case statistical tests identified no meaningful variations, the results of the better performing approach are underlined. α-levels were adjusted with the false discovery rate (FDR) in order to decrease the probability of making a type I error due to multiple testing.
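The text only states that an FDR adjustment was applied, not which procedure was used. Assuming the common Benjamini-Hochberg step-up procedure, the adjustment of a family of p-values can be sketched as follows; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: the FDR correction is the Benjamini-Hochberg procedure;
    the source does not name the exact method.
    """
    m = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value and enforce
    # monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values from four tests of one family.
raw_p = [0.01, 0.04, 0.03, 0.005]
adj_p = benjamini_hochberg(raw_p)
significant = [p < 0.05 for p in adj_p]
```

Comparing the adjusted p-values with the nominal level is equivalent to comparing the raw p-values with rank-dependent adjusted levels.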

According to mean scores, expressive requests generated more comprehensive recommendation lists (size). This finding is in line with the increased response rates shown in Table 13. However, subsequently conducted t-tests confirmed significant differences only for the travel experiment (t(102) = −7.89, p < 0.001). In contrast, precision scores were higher in most of the domains (travel, movie, music) when the SKOSRec engine processed a simple constraint-based query. However, statistical tests did not confirm any systematic differences. The authors applied Wilcoxon tests for mrs scores and the remaining performance metrics because of the small sample sizes resulting from the low response rates of the simple constraint-based approach.

Regarding Novelty, none of the approaches was superior. In the Diversity dimension, expressive filters generated suggestions that were at least as topically diversified as the results from simple constraint-based retrieval. Even though no significant differences were

identified between the two approaches, increased mean scores (divU in the travel domain and divC in each domain) indicate a slight superiority of expanded filter requests in this quality dimension. The same applies to usefulness scores (mean values were higher in 3 out of 4 domains when expressive filtering was applied). However, these statements are speculative at best because statistical tests were not significant.

In summary, it can be concluded that expressive filters improve recall, potentially leading to a slight loss in precision scores (see Table 14). It may also be the case that expanded user constraints diversify as well as increase the usefulness of recommendation lists. However, these claims are unverified. In contrast, it is safe to say that expressive query patterns generate results of at least the same quality as simple patterns while increasing the number of relevant suggestions. The small number of differences may be explained by the high Jaccard scores (JI) (see Table 15) of recommendation lists. Throughout the domains, result sets were much alike. Given these similarities and the increases in list sizes for expressive filters, it is supposed that an expanded constraint produces an enhanced version of the recommendation list resulting from simple filtering. Therefore, expanded filters should be the default setting in a constraint-based retrieval context.
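The Jaccard index used to compare two recommendation lists is the size of their intersection divided by the size of their union. A minimal sketch is given below; the function name and the example item identifiers are hypothetical, and returning None for two empty lists is a convention chosen here, not one stated in the text.

```python
def jaccard_index(list_a, list_b):
    """Jaccard index of two recommendation lists, treated as sets."""
    a, b = set(list_a), set(list_b)
    if not a and not b:
        # Undefined for two empty lists; such pairs are skipped here.
        return None
    return len(a & b) / len(a | b)


# Hypothetical result lists from a simple and an expressive filter.
simple_list = ["item_1", "item_2", "item_3"]
expressive_list = ["item_1", "item_2", "item_3", "item_4"]
ji_demo = jaccard_index(simple_list, expressive_list)
```

A value close to 1 means that the expressive list largely contains the items of the simple list, whereas a value close to 0 means that the two lists barely overlap.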

In TC3 (i.e., evaluation of roll-up query patterns), the SKOSRec engine almost always generated non-empty recommendation lists (see Table 16). In the


Table 14. Performance results in TC2 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]  Domain  Simple         Expressive     N
                                        M      SD      M      SD
Accuracy (size) [10]            Travel  1.17   2.70    4.52   4.62    103
                                Movie   6.72   4.08    7.20   3.92    50
                                Music   5.96   4.64    6.36   4.51    53
                                Book    4.59   4.50    5.00   4.51    51
Accuracy (mrs) [1]              Travel  66.73  20.51   65.34  22.36   23
                                Movie   60.95  22.98   60.29  22.79   43
                                Music   63.77  16.05   62.96  16.44   34
                                Book    56.17  18.18   56.81  16.81   37
Novelty (nv) [1]                Travel  0.48   0.41    0.61   0.49    21
                                Movie   0.44   0.36    0.55   0.80    42
                                Music   0.55   0.39    0.52   0.37    38
                                Book    0.77   0.34    0.76   0.33    33
Diversity (divU) [5]            Travel  3      -       3.5    -       24
                                Movie   4      -       4      -       43
                                Music   4      -       4      -       38
                                Book    3      -       3      -       37
Diversity (divC) [1]            Travel  0.68   0.18    0.72   0.21    19
                                Movie   0.70   0.24    0.73   0.23    41
                                Music   0.86   0.13    0.87   0.12    34
                                Book    0.84   0.24    0.95   0.63    29
Usefulness [5]                  Travel  3.33   0.93    3.50   0.77    24
                                Movie   3.32   0.85    3.31   0.86    43
                                Music   3.55   0.81    3.64   0.78    38
                                Book    3.39   0.83    3.62   1.21    37

Table 15. Jaccard Indices (TC2)

Domain  Mean Jaccard Index (JI)  SD    N
Travel  0.57                     0.47  25
Movie   0.71                     0.38  43
Music   0.82                     0.35  37
Book    0.92                     0.26  35

travel and the book experiment, the response rate was higher for aggregation-based queries, whereas in the movie domain the engine more often provided recommendations for regular requests. However, these differences are only marginal and do not indicate a clear superiority of one method.

Fig. 9 depicts participants' general agreement to Likert statements concerning the Perceived Usefulness of the approaches in TC3. The diagram shows similar results for regular as well as advanced requests. On average, satisfaction scores reached levels which lay slightly above neutral agreement. A subsequent t-test

Table 16. Response Rates (TC3)

Domain  Regular [%]  Aggr.-based [%]  N
Travel  99           100              103
Movie   96           92               50
Music   100          100              53
Book    96           100              51

confirmed no significant differences between the two methods. In fact, the approach with the highest average score of Perceived Usefulness was different in each experiment. While mean values were higher for regular requests in the studies on travel and music RS, advanced queries produced higher scores in the movie and book experiments.

Content-based Diversity (divC) was the only metric with systematic differences. Here, scores went up for advanced retrieval patterns in the travel (t(98) = −5.50, p < 0.001), the music (t(52) = −4.51, p < 0.001) and the book domain (t(47) = −4.90, p <


[Figure 9. Plot title: "Perceived Usefulness (Regular vs. Roll-up Queries)"; x-axis: Domain (Travel, Movie, Music, Book); y-axis: Agreement (0 to 3); legend: Method (Regular, Roll-up).]

Fig. 9. Participants' overall satisfaction with regular and roll-up recommendation retrieval (TC3)

0.001) (see Table 17). This finding is not surprising given that the aggregation-based approach explores the wider semantic network of each item in the profile.
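The t-values reported for the comparison of the two methods stem from within-subjects comparisons. The text does not spell out the computation, so the following is only a minimal sketch that assumes a paired t-test on per-participant score differences (first method minus second method); the function name and the toy scores are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(scores_a, scores_b):
    """Paired t statistic and degrees of freedom for two score lists.

    Both lists must hold one score per participant, in the same order.
    A negative t means that the second method scored higher on average.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both lists need one score per participant")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    # Standard error of the mean difference.
    std_err = stdev(diffs) / sqrt(n)
    return mean(diffs) / std_err, n - 1


# Hypothetical divC scores of four participants for two methods.
regular = [0.70, 0.80, 0.75, 0.90]
advanced = [0.80, 1.00, 0.95, 0.90]
t_value, dof = paired_t(regular, advanced)
```

The p-value would then be read from a t-distribution with the returned degrees of freedom, for example with a statistics package.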

Hence, it can be assumed that both approaches produce recommendation lists of similar quality and can provide benefits to potential consumers. It cannot be clearly stated which approach is superior because neither of them outperformed the other in each domain. While the regular retrieval strategy was better in terms of usefulness in the travel and the music domains, the aggregation-based approach achieved higher scores in the domains of movie and book recommendations.

This conclusion is especially notable given the low mean Jaccard scores for recommendation lists resulting from regular and aggregation-based retrieval (see Table 18). The mean score never exceeded 0.1

throughout the domains. This indicates a low concordance between result sets. It is remarkable that participants perceived the recommendations as equally good given the high dissimilarity of the lists. Thus, advanced queries have an added value, as they provide additional interesting results. They can help to explore LOD repositories in different ways than a regular retrieval approach. Therefore, it is worthwhile to offer aggregation-based query patterns as an alternative retrieval strategy when the regular approach has not produced any helpful recommendations.

In addition to TC3, user assessments for TC4 (cross-domain recommendations) were also evaluated. However, since the authors conducted experiments for TC4 only in the multimedia domains, the samples were considerably smaller. Nevertheless, the number of completed user sessions is assumed to be just sufficient to conduct further statistical analyses. Table 19 shows the response rates for the two retrieval methods. Cross-domain patterns had a higher success rate of generating non-zero result sets throughout the domains. Participants almost always received a recommendation for this query type, whereas they often did not receive a single suggestion when a regular SPARQL query was executed.

We applied non-parametric tests for the performance metrics mrs, nv, divU, divC and Perceived Usefulness due to the low response rates of the SPARQL-based approach, which led to a low number of comparable data points. Overall, the FDR-adjusted pairwise Wilcoxon tests detected no significant differences, except for user-based Diversity scores (divU) in the movie (Z = −2.69, p < 0.05) and the book domain (Z = −2.56, p < 0.05). The results for the content-based Diversity (divC) metric seem to point in the same direction because of the high mean values for cross-domain queries in each domain. However, the significance of these results was not verified. The mrs scores indicate a better performance of regular


Table 17. Performance results in TC3 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]  Domain  Regular        Aggr.-based    N
                                        M      SD      M      SD
Accuracy (size) [10]            Travel  9.90   0.99    9.42   2.07    103
                                Movie   2.88   0.59    2.76   0.82    50
                                Music   10.00  0.00    10.00  0.00    53
                                Book    2.88   0.59    3.00   0.00    51
Accuracy (mrs) [1]              Travel  48.21  24.98   44.93  25.59   102
                                Movie   54.58  22.88   56.69  25.42   44
                                Music   53.95  18.31   58.24  16.19   52
                                Book    51.53  23.90   49.25  22.30   49
Novelty (nv) [1]                Travel  0.43   0.39    0.38   0.40    98
                                Movie   0.35   0.38    0.28   0.41    41
                                Music   0.49   0.36    0.46   0.39    50
                                Book    0.61   0.42    0.62   0.44    47
Diversity (divU) [5]            Travel  3.5    -       3      -       100
                                Movie   4      -       3      -       43
                                Music   4      -       3      -       50
                                Book    3      -       3      -       48
Diversity (divC) [1]            Travel  0.79   0.17    0.89   0.13    99
                                Movie   0.75   0.20    0.66   0.27    44
                                Music   0.83   0.17    0.93   0.07    53
                                Book    0.74   0.27    0.93   0.06    48
Usefulness [5]                  Travel  3.36   0.77    3.22   0.74    101
                                Movie   3.20   0.92    3.26   0.94    44
                                Music   3.45   0.84    3.44   0.85    51
                                Book    2.95   0.87    3.03   0.96    48

Table 18. Jaccard Indices (TC3)

Domain  Mean Jaccard Index (JI)  SD    N
Travel  0.07                     0.11  101
Movie   0.03                     0.07  44
Music   0.02                     0.05  53
Book    0.02                     0.06  50

Table 19. Response rates (TC4)

Domain  Regular (SPARQL) [%]  SKOSRec [%]  N
Movie   32                    96           50
Music   62                    98           53
Book    53                    100          51

SPARQL queries, but the within-subjects test did not confirm this assumption statistically (see Table 20).

However, the most important finding of the analysis was that SKOSRec cross-domain queries generated significantly more suggestions throughout the domains (movie: t(49) = −17.92, p < 0.001; music: t(52) = −25.73, p < 0.001; book: t(50) = −12.75, p < 0.001) (see Table 20). Fig. 10 illustrates this graphically.

Whenever the SPARQL request provided recommendations, which was the case in only a part of the user sessions (see Table 19), the SKOSRec suggestions were assessed to be of almost equal quality as the SPARQL recommendations (see Table 20, Usefulness). Hence, in the sessions in which SPARQL querying did not produce any results, the SKOSRec approach may still have provided useful suggestions.

Therefore, it is reasonable to assume that the capability of the SKOSRec engine to flexibly switch between processing steps of similarity calculation and pattern matching can facilitate an improved exploration of LOD repositories. Table 21 also shows that the items in the two sets often did not match, as is indicated by low Jaccard indices. This finding further demonstrates that the execution of an additional step


Table 20. Performance results in TC4 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]  Domain  Regular (SPARQL)  Cross-Domain     N
                                        M       SD        M       SD
Accuracy (size) [10]            Movie   0.82    2.09      8.66    2.73     50
                                Music   0.89    1.59      9.30    2.03     53
                                Book    2.90    3.98      10.00   0.00     51
Accuracy (mrs) [1]              Movie   67.41   27.62     53.65   14.71    16
                                Music   62.36   31.76     48.32   22.49    20
                                Book    66.42   21.28     55.59   16.84    27
Novelty (nv) [1]                Movie   0.59    0.49      0.41    0.34     16
                                Music   0.72    0.44      0.59    0.40     19
                                Book    0.54    0.41      0.63    0.30     27
Diversity (divU) [5]            Movie   3       -         4       -        16
                                Music   3       -         3       -        20
                                Book    3       -         4       -        27
Diversity (divC) [1]            Movie   (0.64)  (0.35)    (0.99)  (0.02)   (6)
                                Music   (0.90)  (0.13)    (0.98)  (0.03)   (10)
                                Book    (0.84)  (0.12)    (0.87)  (0.1)    (8)
Usefulness [5]                  Movie   3.57    0.83      3.35    1.57     15
                                Music   3.53    1.17      3.11    0.98     20
                                Book    3.41    1.04      3.41    0.94     27

[Figure 10. Plot title: "Recommendation List Size"; x-axis: Domain (Movie, Music, Book); y-axis: No. of items (0.0 to 10.0); legend: Method (Regular, Cross-Domain).]

Fig. 10. Mean recommendation list sizes for SPARQL and SKOSRec cross-domain requests (TC4)


Table 21. Jaccard Indices (TC4)

Domain  Mean Jaccard Index (JI)  SD    N
Movie   0.03                     0.07  44
Music   0.01                     0.02  21
Book    0.16                     0.27  33

of similarity calculation can help to retrieve other relevant items.

5. Future Work & Conclusion

This paper has presented the query-based SKOSRec engine that facilitates new types of recommendation requests for LOD repositories. The evaluations have shown that the SKOSRec system can often generate relevant and useful suggestions when certain query templates are utilized. In many cases, the application of expressive or advanced query patterns (i.e., in the context of constraint-based retrieval or cross-domain requests) helped to significantly improve recall values while still providing the same level of quality in the remaining performance dimensions. Additionally, advanced recommendation requests can be used to generate diversified recommendation lists in case users are not satisfied with the results of regular recommendation requests (i.e., roll-up or cross-domain retrieval vs. regular requests). While not being superior in each domain and test case, it is at least safe to say that the novel recommendation approaches of the SKOSRec engine considerably extend common retrieval methods and provide an alternative search strategy when conventional methods fail to provide useful results. Additionally, the increase in diversity and recall is in line with findings from previous research on LOD-enabled RS [45].

It is a strength of the developed approaches that they facilitate combinations of graph-based and similarity-based retrieval at different stages of the recommendation workflow. This feature extends general search capabilities for semantic networks. Future research needs to investigate whether, in addition to SKOS, further RDF vocabularies can be used for ad-hoc item-to-item similarity computation to increase the range of potential usage scenarios, since not every LOD dataset contains SKOS annotations [53]. The application of additional properties could also boost the recommendation quality of some query patterns (e.g., the query types used in TC3). In this context, it also needs to be determined whether the representation of the user

profile in the SKOSRec query language can be substituted with free-text expressions to enable search-like functionalities. LOD researchers have already developed engines that can process natural language queries over RDF data [59], [34], [67]. It will have to be investigated how the SKOSRec engine profitably fits into this landscape of existing retrieval tools.

Another open research question concerns the aspect of interfaces to other applications. In its standard configuration, the system is not an end-user retrieval engine, but a backend application with which an administrator who has domain knowledge (e.g., of RDF vocabularies and data models of a particular LOD repository and usage scenario) creates the respective query patterns. Although the websites of the online experiments represent possible implementations of suitable end-user applications for the SKOSRec engine, there are still many other possibilities and extensions to design such a user interface (UI). For instance, the selection of LOD items for the profile was made possible by previously storing the data in an index that could then be accessed and searched before issuing a recommendation query. However, this runs against the idea of ad-hoc retrieval. In this context, future research will have to determine whether it is feasible to generate an on-the-fly mapping from items to the respective LOD resources. In the case of a commercial application, the engine needs to match items with the corresponding entries in the product catalog. Even if it is not yet clear which UI components can make the best possible use of the SKOSRec engine's features, the results from the user experiments can give at least a few suggestions.

The following retrieval options should be the default setting in an interface that contains a LOD-enabled RS component to assist users in finding interesting items. Since the regular approach of on-the-fly retrieval achieved good results throughout the domains, the display of simple recommendations generated from previously stated user preferences represents a suitable starting point for retrieval. On-the-fly queries could be refined by providing filter options. Since the evaluations have shown that expressive constraints often lead to increased recall values, the application of such a filter might be the best option for assisted retrieval. As in the preparations for the experiments, appropriate graph-based query patterns for the usage scenario in question would also have to be determined by a domain expert before setting up the application. The same applies to advanced retrieval patterns, which should be available to the end user to refine the query when both the simple as well as the constraint-based


approach have failed to provide relevant items. This suggestion is made because roll-up queries were competitive with regular recommendations in the web-based experiments. Therefore, they represent a viable alternative retrieval strategy. For some domains (e.g., for multimedia retrieval), it is also possible to use an optional selection field to enable the formulation of cross-domain requests. Evaluations for this query type suggest that users might receive interesting results (i.e., diversified and comprehensive recommendation lists) from such a request.

In summary, the contributions of the paper are as follows:

– Development and implementation of novel retrieval strategies in the SKOSRec system, which facilitate personalized requests and leverage the potential of knowledge graphs in order to address the research gap in state-of-the-art recommendation methods with regard to customized queries.

– Realization of web-based user experiments in four usage scenarios with the following outcomes:

• Successful demonstration that knowledge graph data is applicable in the context of live recommendation scenarios.

• Creation of suitable query patterns that can be applied to represent individual information needs in different application contexts. The set of query patterns can be extended with the help of the SKOSRec query language to meet the requirements of additional usage scenarios.

• Evidence that the novel retrieval strategies can be successfully applied across various application contexts and recommendation tasks and that they can outperform conventional content-based retrieval approaches, especially in the performance dimensions of diversity and recall.

The prototypical implementation and the evaluation of the SKOSRec engine in different usage scenarios have demonstrated that the freely available knowledge sources on the LOD cloud can be successfully applied to advance state-of-the-art methods in IR and recommender systems. The proposed personalized and customized retrieval strategies can leverage the potential of knowledge graphs by better matching information needs of users with the help of semantic search options.

Appendix A. The SKOSRecommender Query Syntax

SKOSRecQuery: A SKOSRec query can be comprised of up to six elements, of which the parts SimProjection and ItemPart are obligatory. Hence, users need to state at least their preferences and how many recommendations they would like to receive. Advanced retrieval patterns can be formulated as well. For instance, similar resource retrieval can be combined with prefiltering (RecWhereClause) and/or postfiltering (SelectPart) of LOD resources. As in SPARQL, the query can start with a prologue that abbreviates IRIs with namespace declarations [21].

SelectPart: The SelectPart enables subquerying with recommendation results. The surrounding query is a SELECT query with a slightly simplified WHERE clause (i.e., RecWhereClause), with which postfilter conditions can be formulated. In this section, users have the option to formulate aggregation-based retrieval requests (Aggregation).

Aggregation: This query part handles aggregation-based postfiltering. Users specify the resource (IRIref) for which similarity scores of sublevel entities are summarized. Additionally, it is stated how the data should be aggregated (i.e., SUM, MAX, or AVG) as well as which variable (Var) represents the aggregation-based recommendations in the postfilter section. The compiler makes sure that this variable is actually contained in the RecWhereClause of the SelectPart.

SimProjection: This part of a SKOSRec query refers to the list of generated suggestions. It defines the recommendation variable to which all similar LOD resources are mapped. In case pre- and postfilter conditions are set in the other sections, the compiler checks whether the variable (Var) appears in these sections as well. Otherwise, the required join operations cannot be executed. It is specified how many recommendations should be displayed (Integer) and whether the request will be issued against a default repository or as a cross-repository request (ServiceIntegration).

ServiceIntegration: This section indicates that a user intends to receive recommendations from a different SPARQL endpoint than the default endpoint that is stated in the standard configuration of the SKOSRec engine. It specifies the target SPARQL endpoint (IRIref) from which a user wants to receive recommendations. Upon extraction of SKOS annotations from the default source endpoint, a subsequent request is sent to the specified target endpoint.

ItemPart: This section of a SKOSRec query represents a single preference in the user profile. However, recommendation queries can contain a couple of preference statements. A preference is either expressed as a LOD resource (IRIref) or as a variable which is bound to a preference query (VarPart). In the latter case, the resources are obtained by matching the graph pattern in the VarPart with the triple statements in the repository. The ItemPart statement can additionally be supplemented with a concept-to-concept similarity threshold (Sim) that triggers concept expansion with the approach of flexible similarity detection, which is extensively described in one of our previous works [64].

VarPart: The construct initiates preference querying. Individual likings are expressed as a graph pattern in a RecWhereClause. The variable (Var) is a placeholder for the LOD resources that will be used to set up the user profile. The compiler has to make sure that the specified variable occurs in the RecWhereClause as well. Otherwise, the extraction of LOD resources cannot be carried out correctly.

Sim: The Sim part of a SKOSRec query denotes the threshold of concept-to-concept scores for flexible similarity detection. Even though knowledge-based similarity values usually range between 0 and 1, the specification allows a broader range of digits in the Decimal datatype to enable application of other similarity metrics in case they are needed.

Relation: This section specifies how concept-to-concept similarity scores should be determined. Users can state whether scores should be “larger”, “larger than”, or “equal to” the threshold value.

RecWhereClause: This clause can occur in three different sections of a SKOSRec query, i.e., either in the prefilter (ItemPart), the postfilter (SelectPart), or the preference filter section (VarPart). The RecWhereClause closely resembles the “WhereClause” of the SPARQL 1.1 specification with all its subsequent parts [21]. Hence, it enables different combinations of graph pattern matching. However, the RecWhereClause of the SKOSRec language allows fewer pattern matching expressions than the “WhereClause” of the regular SPARQL syntax specification [21]. For instance, it is neither permitted to formulate subqueries nor to direct filter requests to more than one repository (see RecGraphPatternNotTriples for further explanations). This modification has been made to simplify similarity calculation. After parsing this section, the compiler makes sure that recommendation, preference, or postfilter variables occur at least once in the RecWhereClause to guarantee that subsequent steps of the recommendation workflow can be applied to existing LOD resources. In case the RecWhereClause comprises a RecMinusGraphPattern, it also has to be checked that the variable is not only contained in the MINUS part of the group.

RecGroupGraphPattern: A RecGroupGraphPattern can contain one to many basic graph patterns, which can be combined in different ways according to RecGroupGraphPatternSub.

RecGroupGraphPatternSub: This section defines the graph patterns that are applied to identify suitable LOD resources. User filters can be expressed either as basic graph patterns (TriplesBlock) or as combinations of them (RecGraphPatternNotTriples). The sequence of different kinds of patterns can be flexibly specified.

RecGraphPatternNotTriples: This query part closely resembles the similarly named “GraphPatternNotTriples” of the SPARQL 1.1 specification [21]. The only difference to SPARQL is that the RecGraphPatternNotTriples section is a little more restricted in terms of admissible graph patterns than the “GraphPatternNotTriples” section of the regular SPARQL specification. For instance, it does not allow retrieving LOD resources from anywhere other than the default graph that is specified in the configuration of the SKOSRec engine, since filter patterns have to be applied to datasets that contain SKOS annotations. Hence, these datasets need to be specified before runtime execution to ensure that similarity calculation can be carried out correctly. If a user were able to formulate filter conditions that referred to different RDF graphs and endpoints, this would not be possible.

RecGroupOrUnionGraphPattern: This query section marks an alternative graph pattern in the style of the “GroupOrUnionGraphPattern” of the SPARQL syntax specification [21]. However, it allows neither SPARQL-like subqueries nor federated queries for filtered resources, because this would complicate similarity calculation.

RecOptionalGraphPattern: In this part of a SKOSRecQuery, users can specify additional graph patterns which might extend the query solution but do not necessarily have to match the data.

RecMinusGraphPattern: This section handles exclusion of LOD resources for cases when users want to omit certain triple statements from the solution.
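The variable checks that the compiler performs on a RecWhereClause (see the entries for SimProjection, VarPart, and RecWhereClause above) can be sketched as follows. This is only an illustrative sketch and not the engine's actual compiler code: the function name `check_variable`, the representation of triple patterns as Python tuples, and the three return labels are assumptions made for the example.

```python
def check_variable(var, group_triples, minus_triples=()):
    """Classify how a recommendation/preference/postfilter variable is used.

    group_triples: triple patterns of the group outside any MINUS part.
    minus_triples: triple patterns that appear inside a MINUS part.
    Both are hypothetical, simplified stand-ins for a parsed RecWhereClause.
    """
    in_group = any(var in triple for triple in group_triples)
    in_minus = any(var in triple for triple in minus_triples)
    if in_group:
        return "ok"             # variable can be bound to LOD resources
    if in_minus:
        return "only in MINUS"  # rejected: MINUS alone never binds the variable
    return "missing"            # rejected: variable does not occur at all


group = [("?movie", "dbo:director", "?director")]
minus = [("?movie", "dbo:starring", "?actor")]

print(check_variable("?director", group, minus))
print(check_variable("?actor", group, minus))
print(check_variable("?city", group, minus))
```

Only the first case would let the subsequent join between similarity results and filter results succeed; the other two correspond to the situations in which the compiler has to reject the query.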

References

[1] G. Adomavicius, A. Tuzhilin, Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions, In: IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.
[2] G. Adomavicius, A. Tuzhilin, R. Zheng, REQUEST: A query language for customizing recommendations, In: Information Systems Research, vol. 22, no. 1, pp. 99–117, 2011.
[3] C. Aggarwal, Recommender systems, Springer, 2016.
[4] M. Arenas, B. Cuenca Grau, E. Kharlamov, S. Marciuska, D. Zheleznyakov, Faceted search over ontology-enhanced RDF data, In: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 939–948, 2014.
[5] V. Ayala, M. Przyjaciel-Zablocki, T. Hornung, A. Schätzle, G. Lausen, Extending SPARQL for recommendations, In: Semantic Web Information Management, 2004.
[6] J. Beel, C. Breitinger, S. Langer, A. Lommatzsch, B. Gipp, Towards reproducibility in recommender systems research, In: User Modeling and User-adapted Interaction, vol. 26, no. 1, pp. 69–101, 2016.
[7] C. Boutilier, R. Brafman, C. Domshlak, H. Hoos, D. Poole, CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements, In: Journal of Artificial Intelligence Research (JAIR), vol. 21, pp. 135–191, 2004.
[8] P. Castells, S. Vargas, J. Wang, Novelty and diversity metrics for recommender systems: choice, discovery and relevance, 2011.
[9] G. Cheng, W. Ge, Y. Qu, Falcons: searching and browsing entities on the Semantic Web, In: Proceedings of the 17th ACM International Conference on World Wide Web, pp. 1101–1102, 2008.
[10] J. Davies, R. Weeks, QuizRDF: Search technology for the Semantic Web, In: Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004.
[11] DBpedia, URL: https://wiki.dbpedia.org, 2017.
[12] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, Exploiting the web of data in model-based recommender systems, In: Proceedings of the 6th ACM Conference on Recommender Systems (RecSys), pp. 253–256, 2012.
[13] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, M. Zanker, Linked Open Data to support content-based recommender systems, In: Proceedings of the 8th International Conference on Semantic Systems, 2012.
[14] D. Dillman, D. Robert, D. Bowker, Principles for constructing web surveys, In: Joint Meetings of the American Statistical Association, 1998.
[15] DCMI metadata terms, URL: http://dublincore.org/documents/dcmi-terms, 2004.
[16] I. Fernandez-Tobias, I. Cantador, M. Kaminskas, F. Ricci, A generic semantic-based framework for cross-domain recommendation, In: Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pp. 25–32, 2011.
[17] A. Freitas, J. Oliveira, S. O'Riain, E. Curry, J. Da Silva, C. Pereira, Querying Linked Data using semantic relatedness: a vocabulary independent approach, In: Natural Language Processing and Information Systems, pp. 40–51, 2011.
[18] T. Grainger, T. Potter, Y. Seeley, Solr in action, Cherry Hill: Manning, 2014.
[19] A. Gunawardana, G. Shani, Evaluating recommender systems, In: Recommender Systems Handbook, Springer, pp. 265–308, 2015.
[20] R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Bürgle, H. Düwiger, U. Scheel, Faceted Wikipedia search, In: International Conference on Business Information Systems, pp. 1–11, 2010.
[21] S. Harris, A. Seaborne, E. Prud'hommeaux, SPARQL 1.1 query language, In: W3C Recommendation, 2013.
[22] B. Heitmann, C. Hayes, Using Linked Data to build open, collaborative recommender systems, In: AAAI Spring Symposium: Linked Data Meets Artificial Intelligence, pp. 76–81, 2010.
[23] J. Herlocker, J. Konstan, L. Terveen, J. Riedl, Evaluating collaborative filtering recommender systems, In: ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.
[24] Y. Hu, Z. Wang, W. Wu, J. Guo, M. Zhang, Recommendation for movies and stars using YAGO and IMDB, In: 12th International Asia-Pacific Web Conference (APWEB), pp. 123–129, 2010.
[25] D. Huynh, D. Karger, Parallax and companion: Set-based browsing for the data web, In: Proceedings of the 18th ACM International Conference on World Wide Web, 2009.
[26] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich, Recommender systems: an introduction, Cambridge University Press, 2010.
[27] Y. Kabutoya, R. Sumi, T. Iwata, T. Uchiyama, T. Uchiyama, In: IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 625–630, 2012.
[28] M. Kaminskas, I. Fernandez-Tobias, F. Ricci, I. Cantador, Knowledge-based music retrieval for places of interest, In: Proceedings of the 2nd International ACM Workshop on Music Information Retrieval with User-centered and Multimodal Strategies, pp. 19–24, 2012.
[29] H. Khrouf, R. Troncy, Hybrid event recommendation using Linked Data and user diversity, In: Proceedings of the 7th ACM Conference on Recommender Systems, pp. 185–192, 2010.
[30] P. Kirk, Experimental design, John Wiley & Sons, 1982.
[31] B. Knijnenburg, M. Willemsen, Z. Gantner, H. Soncu, C. Newell, Explaining the user experience of recommender systems, In: User Modeling and User-Adapted Interaction, vol. 22, no. 4-5, pp. 441–504, 2012.
[32] G. Koutrika, B. Bercovitz, H. Garcia-Molina, FlexRecs: expressing and combining flexible recommendations, In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 2009.


[33] M. Lee, W. Kim, Semantic association search and rank method based on spreading activation for the Semantic Web, In: IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1523–1527, 2009.
[34] J. Lehmann, L. Bühmann, Autosparql: Let users query your knowledge base, In: 8th Extended Semantic Web Conference (ESWC), 2011.
[35] N. Marie, F. Gandon, M. Ribiere, F. Rodio, Discovery hub: on-the-fly Linked Data exploratory search, In: Proceedings of the 9th International Conference on Semantic Systems, pp. 17–24, 2013.
[36] S. McNee, J. Riedl, J. Konstan, Being accurate is not enough: how accuracy metrics have hurt recommender systems, In: CHI'06 Extended Abstracts on Human Factors in Computing Systems, 2006.
[37] R. Meymandpour, J. Davis, Recommendations using Linked Data, In: Proceedings of the 5th PhD Workshop on Information and Knowledge, 2012.
[38] A. Miles, S. Bechhofer, SKOS Simple Knowledge Organization System Reference, In: W3C Recommendation, 2009.
[39] B. Mobasher, X. Jin, Y. Zhou, Semantically enhanced collaborative filtering on the web, In: Web Mining: From Web to Semantic Web, pp. 57–76, 2004.
[40] A. Musetti, A. Nuzzolese, F. Draicchio, V. Presutti, E. Blomqvist, A. Gangemi, P. Ciancarini, Aemoo: Exploratory search based on knowledge patterns over the Semantic Web, In: Semantic Web Challenge, 2012.
[41] T. Neumann, G. Weikum, The RDF-3X engine for scalable management of RDF data, In: The VLDB Journal, vol. 19, no. 1, pp. 91–113, 2010.
[42] J. Pérez, M. Arenas, C. Gutierrez, Semantics and Complexity of SPARQL, In: 5th International Semantic Web Conference, pp. 30–43, 2006.
[43] S. Policarpio, S. Brunk, G. Tummarello, Implementation of a SPARQL integrated recommendation engine for Linked Data with hybrid capabilities, In: AImWD, 2012.
[44] P. Pu, L. Chen, R. Hu, A user-centric evaluation framework for recommender systems, In: Proceedings of the 5th ACM Conference on Recommender Systems, 2011.
[45] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, E. Di Sciascio, Sound and music recommendation with knowledge graphs, In: ACM Transactions on Intelligent Systems and Technology (TIST), vol. 8, no. 2, 21, 2017.
[46] E. Oren, R. Delbru, S. Decker, Extending faceted navigation for RDF data, In: 5th International Semantic Web Conference, pp. 559–572, 2006.
[47] G. Pirro, Explaining and suggesting relatedness in knowledge graphs, In: International Semantic Web Conference, pp. 622–639, 2015.
[48] G. Rizzo, R. Troncy, NERD: A framework for evaluating named entity recognition tools in the Web of data, In: 10th International Semantic Web Conference (ISWC'11), Demo Session, Bonn, Germany, 2011.
[49] J. Rosati, T. Di Noia, T. Lukasiewicz, R. De Leone, A. Maurino, Preference queries with Ceteris Paribus semantics for Linked Data, In: OTM Confederated International Conferences, pp. 423–442, 2015.
[50] T. Ruotsalo, K. Haav, A. Stoyanov, S. Roche, E. Fani, R. Deliai, E. Mäkelä, T. Kauppinen, E. Hyvönen, SMARTMUSEUM: A mobile recommender system for the Web of Data, In: Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 50–67, 2013.
[51] I. Ruthven, M. Lalmas, K. van Rijsbergen, Incorporating user search behavior into relevance feedback, In: Journal of the Association for Information Science and Technology, vol. 54, no. 6, pp. 529–549, 2003.
[52] B. Sarwar, G. Karypis, J. Konstan, J. Riedl, Item-based collaborative filtering recommendation algorithms, In: Proceedings of the 10th International Conference on World Wide Web, 2001.
[53] M. Schmachtenberg, C. Bizer, H. Paulheim, Adoption of the Linked Data best practices in different topical domains, In: International Semantic Web Conference (ISWC), pp. 245–260, 2014.
[54] H. Schütze, C. Manning, P. Raghavan, Introduction to information retrieval, Cambridge University Press, 2008.
[55] SparqlImplementations, URL: https://www.w3.org/wiki/SparqlImplementations, 2016.
[56] M. Stankovic, W. Breitfuss, P. Laublet, Discovering relevant topics using DBPedia, In: Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 219–222, 2011.
[57] K. Swearingen, R. Sinha, Beyond algorithms: An HCI perspective on recommender systems, In: ACM SIGIR 2001 Workshop on Recommender Systems, vol. 13, no. 5-6, 2001.
[58] D. Tunkelang, Faceted search, In: Synthesis Lectures on Information Concepts, Retrieval, and Services, pp. 1–80, 2009.
[59] C. Unger, L. Bühmann, J. Lehmann, A. Ngonga Ngomo, D. Gerber, P. Cimiano, Template-based question answering over RDF data, In: Proceedings of the 21st ACM International Conference on World Wide Web, 2012.
[60] C. Varnhagen, M. Gushta, J. Daniels, T. Peters, N. Parmar, D. Law, T. Johnson, How informed is online informed consent?, In: Ethics & Behavior, vol. 15, no. 1, pp. 37–48, 2005.
[61] J. Waitelonis, H. Sack, Towards exploratory video search using Linked Data, In: 11th IEEE International Symposium on Multimedia, pp. 540–545, 2009.
[62] L. Wenige, J. Ruhland, Flexible on-the-fly recommendations from Linked Open Data repositories, In: International Conference on Business Information Systems, pp. 43–54, 2016.
[63] L. Wenige, Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations.
[64] L. Wenige, J. Ruhland, Retrieval by recommendation: using LOD technologies to improve digital library search, In: International Journal on Digital Libraries, 2017.
[65] F. Zarrinkalam, M. Kahani, A multi-criteria hybrid citation recommendation system based on Linked Data, In: 2nd International EConference on Computer and Knowledge Engineering (ICCKE), pp. 283–288, 2012.
[66] C. Ziegler, S. McNee, J. Konstan, G. Lausen, Improving recommendation lists through topic diversification, In: Proceedings of the 14th International Conference on World Wide Web, pp. 22–32, 2005.
[67] L. Zou, R. Huang, H. Wang, J. Yu, W. He, D. Zhao, Natural language question answering over RDF: a graph data driven approach, In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 313–324, 2014.

  • Introduction
  • Related Work
  • The SKOS Recommender
    • System Overview
    • On-the-Fly Recommendations
    • Preference Querying
    • Prefiltering
    • Postfiltering & Combinations
  • Evaluation of the SKOS Recommender
    • Experimental Setup
    • Results
  • Future Work & Conclusion
  • Appendix A. The SKOSRecommender Query Syntax
  • References

sources with the mappings $\Omega_r$ of all relevant resources and annotations, as determined from the user profile:

$$\Omega_{pre} = \Omega_{prefilter} \bowtie \Omega_r \tag{11}$$

After retrieving annotations and relevant resources in prefiltering mode, the system generates constraint-based recommendations. In this context, the ranking procedure is adapted to the restricted set of relevant resources and annotations. Hence, IC values of matching annotations and respective SKOS similarities are quantified according to the user filter. By this means, recommendation scores reflect the similarity of items for the actual set of filtered LOD resources. Except for this restriction, the system carries out the calculations (i.e., determination of similarity values, the ranking of LOD resources) in the same way as in non-filtering mode.
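The effect of quantifying IC values over the filtered set only can be illustrated with a small sketch. It assumes that the information content of a concept is the negative logarithm of the share of resources annotated with it; this is a common formulation and not necessarily the exact one used by the SKOSRec engine. The function name `information_content` and the toy catalog are hypothetical.

```python
import math
from collections import Counter


def information_content(annotations_by_resource):
    """Return IC per concept as -log(share of resources carrying the concept).

    This formula is an assumption made for illustration purposes.
    """
    total = len(annotations_by_resource)
    counts = Counter(
        concept
        for concepts in annotations_by_resource.values()
        for concept in set(concepts)
    )
    return {concept: -math.log(n / total) for concept, n in counts.items()}


# Hypothetical catalog: resource -> SKOS concept annotations
catalog = {
    "film_a": {"Western", "Drama"},
    "film_b": {"Western"},
    "film_c": {"Drama", "Musical"},
    "film_d": {"Western"},
}
# Resources that match the user's prefilter graph pattern (hypothetical)
allowed = {"film_a", "film_c"}

ic_all = information_content(catalog)
ic_filtered = information_content(
    {res: concepts for res, concepts in catalog.items() if res in allowed}
)

for concept in sorted(ic_filtered):
    print(concept, round(ic_all[concept], 3), round(ic_filtered[concept], 3))
```

In this toy setting, "Western" is frequent in the full catalog but rarer within the filtered set, so a shared "Western" annotation carries more weight in prefiltering mode than in non-filtering mode, whereas "Drama" occurs in every filtered resource and no longer discriminates between items.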

3.5. Postfiltering & Combinations

The SKOSRec engine can not only combine similarity calculation and graph pattern matching to prefilter relevant LOD resources, it can also apply user constraints after similar items have already been determined. By this means, recommendations are filtered ex-post to the similarity detection process. This feature enables novel retrieval patterns in which suggestions are part of a subquery. As an example scenario for a postfilter recommendation task, consider the following SKOSRec query, which obtains suggestions for movies and directors based on a user's preference for the director dbr:Quentin_Tarantino (see Listing 6). For the given request, the engine would identify similar directors based on Quentin Tarantino's characteristic features (e.g., geographic origin or awards won) and would determine the movies these similar directors have shot. Thus, users may receive interesting movie recommendations.

Table 5 shows the solution the SKOSRec system returns when Q4 is issued against DBpedia. The ellipses in the displayed tables indicate records that have been excluded for brevity.

However, this approach can still be improved upon. For instance, the engine only considers the metadata descriptions of the director, whereas related movies have to be retrieved anew upon execution of the postfilter section. Thus, they are not ranked according to similarity in the final output table.

Another weakness is the biased semantics of the query.

Listing 6. SKOSRec query to generate simple post-filtered recommendations in the movie domain (Q4)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX yago: <http://dbpedia.org/class/yago/>

    SELECT ?director ?movie
    WHERE {
      ?movie dbo:director ?director .
      ?director rdf:type yago:Director110014939 .
    } LIMIT 50
    RECOMMEND ?director TOP 5
    PREF dbr:Quentin_Tarantino

Table 5. Result set of Q4

    director              movie
    dbr:Kirk_Douglas      dbr:Scalawag_(film)
    dbr:Kirk_Douglas      dbr:Posse_(1975_film)
    dbr:Kevin_Costner     dbr:The_Postman_(film)
    dbr:Kevin_Costner     dbr:Dances_with_Wolves
    ...                   ...
    dbr:Gene_Kelly        dbr:Hello_Dolly_(film)
    dbr:Gene_Kelly        dbr:Invitation_to_the_Dance_(film)
    ...                   ...
    dbr:Alan_Alda         dbr:Goodbye_Farewell_and_Amen
    dbr:Alan_Alda         dbr:Margaret's_Engagement
    ...                   ...
    dbr:Steve_Buscemi     dbr:Interview_(2007_film)
    dbr:Steve_Buscemi     dbr:Animal_Factory

Usually, when a person favors a movie director, this preference refers to the movies this director has shot and not to the personal features of the person. Two directors might be similar according to certain features, such as birth year, nationality, or received awards, but the style and genre of the movies they have created can still be different. A third factor is that recommendation retrieval for movie directors should also take into account that a film artist is likely more relevant the more movies she has shot that are in line with the preferred movies of the user. Hence, the SKOSRec engine enables another postfiltering strategy, which is called aggregation-based retrieval. This strategy can be helpful whenever an entity type is linked to a couple of LOD resources in a dataset, such that similarity scores of related entities (e.g., movies) can be aggregated to an overall recommendation score for a certain item (e.g., for a movie director).

Aggregation-based requests can be combined well with other retrieval patterns, such as preference queries.


In the case of the movie recommendation scenario, a user could state that she enjoyed Quentin Tarantino movies in the past and would like to receive suggestions for directors who shot similar films. This kind of question can typically only be answered by another human being, e.g., in a personal interaction among friends or in an online forum. The fact that the language facilitates these kinds of queries is one of its key advantages.

Listing 7. SKOSRec query to generate post-filtered recommendations based on a preference query in the movie domain (Q5)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX yago: <http://dbpedia.org/class/yago/>

    SELECT ?director ?movie
    WHERE {
      ?movie dbo:director ?director .
      ?director rdf:type yago:Director110014939 .
    } LIMIT 50
    AGG dbr:Quentin_Tarantino ?director SUM
    RECOMMEND ?movie TOP 100
    PREF [ ?prefMovie
      WHERE { ?prefMovie dbo:director dbr:Quentin_Tarantino . } ]

Table 6 depicts the results of Q5 as retrieved from DBpedia. When comparing the solution sets for Q4 and Q5, it becomes clear that they differ completely. While Q5 generated a list of directors who have mostly created action and thriller movies with a twist (i.e., Tarantino's unique filmmaking style⁵), Q4 provided a very different output. It shows US-American actor-directors whose movies range from musical films (dbr:Hello_Dolly_(film)) to western movies (dbr:Dances_with_Wolves) or drama films (dbr:Interview_(2007_film)). Thus, the directors in the solution table of Q4 (Table 5) seem rather arbitrary and not particularly helpful concerning the initial user request.

Table 6. Result set of Q5

    director                    movie
    dbr:Steven_Spielberg        dbr:Indiana_Jones
    dbr:Steven_Spielberg        dbr:Jurassic_Park_(film)
    ...                         ...
    dbr:Francis_Ford_Coppola    dbr:The_Godfather_Part_II
    dbr:Francis_Ford_Coppola    dbr:The_Godfather_Part_III
    ...                         ...
    dbr:Christopher_Nolan       dbr:The_Prestige_(film)
    dbr:Christopher_Nolan       dbr:The_Dark_Knight
    ...                         ...
    dbr:Robert_Rodriguez        dbr:Planet_Terror
    dbr:Robert_Rodriguez        dbr:Machete_(film)
    ...                         ...
    dbr:Kevin_Smith             dbr:Clerks
    dbr:Kevin_Smith             dbr:Clerks_II

We define this kind of aggregation-based request as a roll-up query. This term was chosen because the final suggestions are based on summarized (i.e., rolled-up) preferences for sublevel entities. Another example of a roll-up request stems from the application scenario of travel search. This recommendation query derives suggestions for city trip destinations from the points of interest (POI) a user has visited and liked in another city (e.g., London). Listing 8 shows the corresponding SKOSRec query for this scenario.

⁵ https://en.wikipedia.org/wiki/Quentin_Tarantino

Listing 8. SKOSRec query to generate post-filtered recommendations (aggregation-based) for a preference profile containing the city of London (Q6)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX dul: <http://www.ontologydesignpatterns.org/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    SELECT ?city
    WHERE {
      ?poi ?property ?city .
      ?city rdf:type dbo:City .
      ?property rdfs:subPropertyOf dul:hasLocation .
    } LIMIT 3
    AGG dbr:London ?city SUM
    RECOMMEND ?poi TOP 100
    PREF dbr:Oxford_Street, dbr:Notting_Hill, dbr:South_Bank

The SKOSRec request contains subquery commands that refer to the user's preferred POIs. It specifies how similarity scores of the generated suggestions should be aggregated (i.e., sum-based). The retrieval is based on the 100 POIs that are most similar to the ones in the preference section. Additionally, the subquery part refers to the DBpedia resource that represents London (AGG dbr:London ?city), thus indicating that the list of recommendations should not contain any POIs in this city. Upon processing the depicted query over the DBpedia repository, the engine displays the three cities for which the highest aggregated scores have been detected (see Table 7).

Table 7. Result set of Q6

    city
    dbr:Oxford
    dbr:Abingdon-on-Thames
    dbr:Wallingford_Oxford
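The roll-up step described for Q6 can be sketched in a few lines. The sketch below is illustrative only and is not the SKOSRec implementation: the function name `roll_up`, the POI and city identifiers, and the similarity scores are made up, and the similar POIs are assumed to have already been determined.

```python
from collections import defaultdict


def roll_up(scores, parent_of, excluded_parent, how="SUM", limit=3):
    """Aggregate sublevel similarity scores per superordinate item.

    scores:          sublevel item -> similarity score (hypothetical values)
    parent_of:       sublevel item -> superordinate item (e.g., POI -> city)
    excluded_parent: the preferred superordinate item named in the AGG clause
    """
    groups = defaultdict(list)
    for item, score in scores.items():
        parent = parent_of.get(item)
        if parent is None or parent == excluded_parent:
            continue  # skip unlinked items and items of the preferred parent
        groups[parent].append(score)

    aggregate = {
        "SUM": sum,
        "MAX": max,
        "AVG": lambda values: sum(values) / len(values),
    }[how]
    ranked = sorted(
        ((aggregate(values), parent) for parent, values in groups.items()),
        key=lambda pair: (-pair[0], pair[1]),
    )
    return [(parent, score) for score, parent in ranked[:limit]]


# Hypothetical similar POIs with made-up scores
scores = {"poi_1": 0.75, "poi_2": 0.5, "poi_3": 0.875, "poi_4": 1.0, "poi_5": 0.5}
parent_of = {
    "poi_1": "city_a",
    "poi_2": "city_a",
    "poi_3": "city_b",
    "poi_4": "dbr:London",
    "poi_5": "city_c",
}

print(roll_up(scores, parent_of, "dbr:London", how="SUM"))
print(roll_up(scores, parent_of, "dbr:London", how="MAX"))
```

With sum-based aggregation, a city with several moderately similar POIs (city_a) outranks a city with a single highly similar POI (city_b); with maximum-based aggregation, the order of these two is reversed. This is the practical difference between the SUM and MAX options of the AGG clause.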

The motivation for formulating such a query is that, when searching for cities, preferences might be better represented through the sights a user has visited elsewhere rather than through a simple on-the-fly request solely containing a preference statement for a city in the user profile section.

Besides the roll-up retrieval patterns presented above, the SKOSRec grammar may facilitate the formulation of additional types of advanced recommendation queries. For instance, cross-domain queries are an interesting additional pattern to be explored for personalized retrieval. Listing 9 shows an example request that generates movie suggestions based on the preference for a music band (i.e., dbr:The_Jackson_5).

Listing 9. SKOSRec query to generate cross-domain recommendations (Q7)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

    SELECT ?movie ?participated ?musicalAct
    WHERE {
      ?movie rdf:type dbo:Film .
      VALUES ?participated { dbo:basedOn dbo:musicComposer dbo:genre dbo:starring }
      { ?musicalAct ?participated ?movie . }
      UNION
      { ?movie ?participated ?musicalAct . }
    } LIMIT 10
    AGG dbr:The_Jackson_5 ?movie MAX
    RECOMMEND ?musicalAct TOP 100
    PREF dbr:The_Jackson_5
    WHERE {
      VALUES ?actType { dbo:MusicalArtist dbo:Band }
      ?musicalAct rdf:type ?actType .
    }

The query contains a prefilter condition that triggers retrieval of music artists and bands. Afterwards, the scores of the top 100 suggestions are summarized with maximum-based aggregation. Thus, the highest similarity value among the set of music acts determines the ranking score of the film. Also note that, because of the aggregation-based retrieval procedure, connections between movies and the LOD resource dbr:The_Jackson_5 are excluded from similarity calculation. The scenario exemplifies how semantic relationships can help to generate suggestions for a target domain (i.e., movies) that are based on a profile containing items from a different source domain (i.e., music acts). While regular SPARQL queries perform exact matching of triple statements, the SKOSRec syntax facilitates the integration of similarity- and graph-based retrieval to enrich result lists with other potentially relevant items.

Table 8. Result set of Q7

    movie                          musicalAct
    dbr:Moonwalker                 dbr:Michael_Jackson
    ...                            ...
    dbr:The_Wiz_(film)             dbr:Ashford_&_Simpson
    dbr:Garfield_Gets_a_Life       dbr:The_Temptations
    dbr:Pipe_Dreams_(1976_film)    dbr:Gladys_Knight_&_the_Pips
    dbr:The_Jacksons               dbr:Jermaine_Jackson

As a point of reference, consider the SPARQL query for the given cross-domain scenario (Listing 10).

Listing 10. SPARQL query to generate cross-domain recommendations (Q8)

    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

    SELECT ?movie ?musicalAct
    WHERE {
      ?movie rdf:type dbo:Film .
      VALUES ?participated { dbo:basedOn dbo:musicComposer dbo:genre dbo:starring }
      { dbr:The_Jackson_5 ?participated ?movie . }
      UNION
      { ?movie ?participated dbr:The_Jackson_5 . }
    } LIMIT 5

The query omits the similarity calculation part. Thus, it only retrieves movies linked to the resource dbr:The_Jackson_5. By this means, the SPARQL results have a better chance of being related to the initial user profile. On the other hand, empty result lists can occur when the preferred LOD resource does not have any direct connections to recommendable items. This assumption is backed up by the result lists of the two queries (Tables 8 and 9). The SKOSRec query produces a more comprehensive solution than a regular SPARQL query.


Table 9. Result set of Q8

    movie               musicalAct
    dbr:The_Jacksons    dbr:The_Jackson_5

The following definitions formalize the postfiltered and aggregation-based retrieval forms. Definition 12 specifies the notion of SKOSRec recommendations that may potentially undergo postprocessing.

Definition 12 (SKOSRec recommendations). SKOSRec recommendations (R(Pr)) are the set of suggestions generated by processing the solution mappings (Ω) obtained from recommendation retrieval. They can be a result of simple on-the-fly retrieval, of prefiltering, or of a combination of these techniques. Pr represents the input profile that contains at least one preference for an item. The cardinality of R(Pr) is the intended number of recommendations (k) that is specified by the user, based on which the engine selects the resources with the highest recommendation scores that are not contained in the profile (Eq. 12).

$$R(P_r) = \{\, x \mid x \in \{\, q \mid \mathit{score}(P_r, q) > \operatorname{kmax}_{q \in \Omega} \mathit{score}(P_r, q) \,\wedge\, q \notin P_r \,\} \,\} \tag{12}$$
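The selection step described in Definition 12 can be illustrated with a minimal Python sketch. It assumes that recommendation scores are already available as a dictionary from resource identifiers to numbers; the function name `select_recommendations` and the alphabetical tie-breaking are illustrative assumptions and are not part of the SKOSRec engine.

```python
from typing import Dict, List, Set


def select_recommendations(scores: Dict[str, float],
                           profile: Set[str],
                           k: int) -> List[str]:
    """Return up to k resources with the highest scores that are not in the profile."""
    # Exclude items the user has already stated as preferences.
    candidates = [(res, s) for res, s in scores.items() if res not in profile]
    # Sort by descending score; ties are broken alphabetically (an assumption
    # made only to keep the sketch deterministic).
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return [res for res, _ in candidates[:k]]


example_scores = {"dbr:A": 0.9, "dbr:B": 0.8, "dbr:C": 0.7, "dbr:D": 0.1}
top_two = select_recommendations(example_scores, profile={"dbr:A"}, k=2)
```

In this toy input, dbr:A is skipped because it is part of the profile, so the two best remaining resources are returned.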

The approach of subquerying with recommendation results builds on the idea that SKOSRec suggestions R(Pr) can be joined with any graph pattern Ppost after the process of similarity calculation (Definition 13). It does not make a difference how the engine obtained these recommendations.

Definition 13 (Postfiltered recommendation mapping). The postfiltered recommendation mapping is obtained by joining solution mappings generated from SKOSRec recommendations with the results of matching a postfilter graph pattern (Ωpost) (Eqs. 13 and 14).

$$\Omega_{postfilter} = \{\, \mu \mid \mu \in [[P_{post}]]_D \,\wedge\, x \in \mathrm{dom}(\mu) \,\} \tag{13}$$

$$\Omega_{post} = \{\, \mu_{x} \mid x \in R(P_r) \,\} \bowtie \Omega_{postfilter} \tag{14}$$
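To make the join in Definition 13 more tangible, the following Python sketch treats solution mappings as plain dictionaries from variable names to resource identifiers. This representation, the function name `postfilter_join` and the default variable name `x` are assumptions made for illustration only; a triple store would evaluate the join internally.

```python
from typing import Dict, Iterable, List

SolutionMapping = Dict[str, str]  # variable name -> resource identifier


def postfilter_join(recommended: Iterable[str],
                    postfilter_mappings: Iterable[SolutionMapping],
                    var: str = "x") -> List[SolutionMapping]:
    """Keep postfilter mappings that bind `var` to a recommended resource."""
    rec = set(recommended)
    return [mu for mu in postfilter_mappings
            if var in mu and mu[var] in rec]


mappings = [
    {"x": "dbr:A", "y": "dbr:Y1"},
    {"x": "dbr:C", "y": "dbr:Y2"},  # dbr:C was not recommended
    {"y": "dbr:Y3"},                # x is not in the domain of this mapping
]
joined = postfilter_join(["dbr:A", "dbr:B"], mappings)
```

Only the first mapping survives, since it is the only one whose binding for x belongs to the set of recommendations.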

Like the prefilter, Ppost can comprise triple statements that are eligible in the RecGroupGraphPattern part of the SKOSRec grammar. Aggregation-based queries have to be treated differently than a regular postfilter request. Apart from excluding sublevel items of the profile from the set of similar resources, the engine should also omit any connections between the preferred superordinate item (a) and similar sublevel entities. Additionally, the user has to specify the variable in his profile (e.g., y) that represents the items within the postfilter graph pattern Ppost. By this means, aggregation-based recommendations can be generated (Definition 14).

Definition 14 (Resources for aggregation-based retrieval). The set of resources for aggregation-based retrieval is obtained by retrieving LOD resources that have a connection to a previously generated suggestion, thereby excluding all resources that are linked to the specified superordinate item a in the profile (Eq. 15).

$$R_{agg}(a) = \{\, \mu(y) \mid \mu \in \Omega_{post} \,\wedge\, y \in \mathrm{dom}(\mu) \,\wedge\, a \notin \mu(y) \,\} \tag{15}$$

For each potential aggregation-based suggestion (y), similarity scores of connected entities (x) have to be considered for the final ranking. In this context, different approaches to aggregation can be applied. In cases when both similar items and aggregation-based recommendations are entities on comparable levels of granularity, choosing the maximum similarity score might be the best way to represent the relatedness of the resource to the user profile. On the other hand, when postfiltered recommendations are based on similarity scores of their sublevel entities (e.g., as in the movie recommendation example), the scores of related suggestions should be aggregated by the sum or the average of individual scores.

Definition 15 (Aggregation-based ranking score). The ranking score of an LOD resource y that is connected to a set of recommended items is determined by aggregating scores of connected entities, either by the maximum (Eq. 16), the sum (Eq. 17) or the average (Eq. 18) of individual scores.

$$\mathit{score}_{aggMAX}(y) = \max_{x \in \mu(x,y)} \mathit{score}(P_r, x) \tag{16}$$

$$\mathit{score}_{aggSUM}(y) = \sum_{x \in \mu(x,y)} \mathit{score}(P_r, x) \tag{17}$$

$$\mathit{score}_{aggAVG}(y) = \frac{\sum_{x \in \mu(x,y)} \mathit{score}(P_r, x)}{|\{x \in \mu(x,y)\}|} \tag{18}$$
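The three aggregation variants of Definition 15 can be sketched in a few lines of Python. The sketch assumes that the links between sublevel entities x and candidate resources y are given as (x, y) pairs and that similarity scores for x are stored in a dictionary; the names `aggregate_scores`, `links` and `sim` are hypothetical and chosen only for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def aggregate_scores(links: Iterable[Tuple[str, str]],
                     sim: Dict[str, float],
                     mode: str = "sum") -> Dict[str, float]:
    """Aggregate scores of entities x per connected resource y (Eqs. 16 to 18)."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for x, y in links:
        if x in sim:  # only entities that received a similarity score count
            grouped[y].append(sim[x])
    if mode == "max":
        return {y: max(vals) for y, vals in grouped.items()}
    if mode == "sum":
        return {y: sum(vals) for y, vals in grouped.items()}
    if mode == "avg":
        return {y: sum(vals) / len(vals) for y, vals in grouped.items()}
    raise ValueError("mode must be 'max', 'sum' or 'avg'")


links = [("m1", "d1"), ("m2", "d1"), ("m3", "d2")]
sim = {"m1": 0.5, "m2": 0.25, "m3": 0.4}
by_sum = aggregate_scores(links, sim, "sum")
by_max = aggregate_scores(links, sim, "max")
by_avg = aggregate_scores(links, sim, "avg")
```

With summation, a resource that is connected to many moderately similar entities (d1 in the toy data) can outrank a resource connected to a single entity, which matches the rollup behaviour described for the movie example.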

4. Evaluation of the SKOS Recommender

4.1. Experimental Setup

The novel recommendation strategies of the SKOSRec engine were evaluated in web-based experiments for the usage scenarios of travel destination search and multimedia RS. The online approach helped to reach more participants with diverse demographic backgrounds, which would not have been possible in a laboratory experiment. Another positive side effect of the web-based setting was that participants could test and rate recommendations without being directly observed by an experimenter, which might otherwise have biased the results. Hence, the online approach softened common limitations of laboratory studies [19].

The experiments were carried out in the defined usage scenarios with metadata descriptions from DBpedia. The database containing the local mirror of DBpedia ran on a virtual server. Besides the triple store, a web application was hosted on the same machine. The website ran on an Apache Tomcat 7 application server and was implemented with Java servlet technology (footnotes 6 and 7).

We conducted the online experiments between April 2016 and January 2017. Upon setting up the web application, a pretest was carried out to control for potential pitfalls, such as misleading navigation. Two test users went through the web interface step by step and provided their feedback. Because of their insights, we slightly modified the interface (e.g., by rephrasing user instructions) to increase the chances of successful completion. In case the SKOSRec engine is applied in a live setting, the design of the user interface will have to be tested with more users. Our experimental series focused on algorithmic performance, which was tested with numerous participants.

Subjects were recruited through the clickworker.com platform (footnote 8). Participants received a compensation fee

6 http://tomcat.apache.org/
7 http://www.oracle.com/technetwork/java/javaee/servlet/index.html
8 https://www.clickworker.de/

of 1.00 € or 1.50 € for a completed survey. 103 participants took part in the study on travel destination search. In the multimedia domains, 154 subjects in total evaluated the SKOSRec engine. For each part of the evaluation, it was ensured that participants had not just run through the evaluation screen. Whenever there appeared to be irregularities in the log files (e.g., the default settings of the screen were not changed), these assessments were excluded from further analysis.

The survey template started with a consent form. It informed participants about the context and the procedure of the study. After having filled in the consent form, participants were navigated to the first part of the experiment. It collected data on subjects' demographics and their consumption and search habits. After the preparation phase, participants worked on the first test case (TC1) in each usage scenario. In this test case, the performance of the SKOSRec engine was assessed in the simple execution mode of on-the-fly recommendations (see Subsect. 3.2). It was done to gather data on the usefulness of suggestions resulting from a simple content-based approach (i.e., baseline method), with which more advanced recommendation strategies were compared at a later stage.

In cases where users did not state any items, the application showed an error page. While setting up a profile required some effort on the side of the user, it was a necessary step for the experiment. Since we did not have access to the log files of a live application, profile generation had to be simulated. The process was facilitated by a specifically tailored interface (see Fig. 2), where subjects could search for items through an autocomplete field that applied AJAX technology (footnote 9).

It enabled users to type in keywords (e.g., the name of a music act or the author of a book), for which matches were instantaneously displayed without reloading the site. Hence, participants could quickly find and select their items of interest. Whenever a user entered a query through the AJAX interface, the system matched the query keywords against an Apache Solr/Lucene search index (footnote 10). The index was built with item metadata, which had been extracted from DBpedia prior to setting up the website. The index comprised the names and abstracts of items. Whenever available, information on thumbnail pictures was saved in the index to be ready for display in the AJAX interface. In case the Apache Solr/Lucene

9 http://api.jquery.com/jquery.ajax/
10 http://lucene.apache.org/solr/


Fig. 2. User profile generation, TC1 (music domain)

search engine found index keywords that matched the user query, metadata information for these items was immediately shown on the screen. This information was displayed to help participants make an informed decision on whether the AJAX result list contained suitable items for profile generation.

Aside from the AJAX feature, the appeal of the interface was increased by a progress bar at the top of the webpage (see Fig. 2). Progress bars are commonly utilized tools for web surveys. In conventional paper-based studies, subjects usually see how far ahead they are, while in a web survey they cannot check their progress immediately. This is especially true for studies that apply a screen-by-screen navigation, which was the chosen approach for the conducted web experiments. Progress bars prevent users from tiring of the questionnaire and withdrawing from the study altogether by showing how close participants are to completing the experiment [14].

Upon profile generation, the SKOSRec engine calculated on-the-fly recommendations for a simple query (similar to Q1, Sect. 3), which were shown to users in a separate screen. Suggestions were displayed with a sufficient amount of information (e.g., thumbnails, labels or a short description) to help participants assess the quality of the recommendation. In addition to this information, users could also access the respective Wikipedia article by following the hyperlink on the label. Relevance sliders were used to assess the utility of each recommendation. They are often applied in evaluations of IR systems [51, 54]. Scores were handled as points on a 0 to 100 scale of the relevance slider (outer left side: lowest relevance, outer right side: highest relevance; see Fig. 3).

This approach facilitated the calculation of an accurate precision score and the application of tests with higher statistical power than an ordinal 5-star scale. The score, called mean relevance (mrs), was measured with Equation 19.

$$mrs = \frac{\sum_{i=1}^{k} \mathit{score}_i}{k} \tag{19}$$
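As a small worked example of Equation 19, the following Python sketch averages the slider values of one recommendation list. The function name `mean_relevance` and the sample ratings are made up for illustration.

```python
from typing import Sequence


def mean_relevance(scores: Sequence[float]) -> float:
    """Mean relevance score (mrs) of a recommendation list, Eq. 19."""
    if not scores:
        raise ValueError("mrs is undefined for an empty recommendation list")
    return sum(scores) / len(scores)


# Slider ratings on the 0 to 100 scale for a list of three suggestions.
mrs_example = mean_relevance([80, 40, 60])
```

For the three hypothetical ratings above, the list obtains an mrs of 60 points.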

In addition to the precision score, the web application also determined the size of the recommendation list produced by the engine as an indicator for recall. However, recommendation list size can only be utilized for this purpose when mrs scores reach a reasonable level.

While accurate recommendations are useful because they build trust in a system [57], they only capture a narrow aspect of performance [36]. For instance, an online retailer can profit tremendously from an RS that provides both relevant and unobvious recommendations, since it enables users to explore the product catalog more thoroughly. On the other hand, in the case of an RS only suggesting popular products, it is likely that the user already knows them [3, 23]. A pure accuracy-based evaluation disregards these aspects, which is why other performance indicators, such as novelty, were measured as well. Thus, the evaluation interface also contained a section where subjects had to tick a radio button that indicated whether an item was new. In the backend of the web application, novelty was measured as the ratio of relevant items not


Fig. 3. Recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain)

familiar to the user (|nov|) among the total number of relevant items (|tp|) (Eq. 20).

$$nv = \frac{|nov|}{|tp|} \tag{20}$$
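The novelty ratio of Equation 20 can be sketched as follows. The sketch assumes that each rated item is available as a pair of its relevance score and a flag stating whether the participant marked it as new; the relevance threshold of 50 points follows the definition of relevant items used in this study, while the function name and the choice to return None for lists without relevant items are assumptions.

```python
from typing import Optional, Sequence, Tuple


def novelty(ratings: Sequence[Tuple[float, bool]],
            threshold: float = 50) -> Optional[float]:
    """Share of relevant items that were unfamiliar to the user, Eq. 20."""
    # Keep the "is new" flag of every item rated as relevant (|tp| entries).
    relevant_flags = [is_new for score, is_new in ratings if score >= threshold]
    if not relevant_flags:
        return None  # no relevant items: the ratio is undefined
    return sum(relevant_flags) / len(relevant_flags)


# (relevance score, marked as new) for four suggestions
nv_example = novelty([(80, True), (60, False), (30, True), (50, True)])
```

In the toy data three items are relevant and two of them are new, so nv equals two thirds; the irrelevant but new item does not contribute.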

An item was considered relevant when it achieved a relevance score of 50 or higher. In addition to novelty, the web application also tracked the diversity of recommendation lists, since a topically diversified result set has been found to increase overall user satisfaction [31, 66]. In the RS literature, the most widely explored diversification metrics are the ones that measure the similarity between each item pair in the recommendation list. The more the items are alike, the less diverse is the result set [19]. The study series of this paper applied the metric by Castells et al. [8]. It determines item-to-item similarity values with the cosine similarity measure. Scores are then converted to distance values (Eq. 21) and averaged for the recommendation list (Eq. 22).

$$d(i_l, i_m) = 1 - \cos(i_l, i_m) \tag{21}$$

$$div_C = \frac{2}{k(k-1)} \sum_{l<m} d(i_l, i_m) \tag{22}$$
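Equations 21 and 22 can be illustrated with the following Python sketch. It assumes that every item is represented as a sparse feature vector, i.e., a dictionary that maps annotation identifiers to weights; this representation and the function names are assumptions for the example and do not describe the internal data structures of the engine.

```python
import math
from itertools import combinations
from typing import Dict, Sequence

FeatureVector = Dict[str, float]  # annotation identifier -> weight


def cosine(u: FeatureVector, v: FeatureVector) -> float:
    """Cosine similarity of two sparse feature vectors."""
    dot = sum(u[f] * v[f] for f in u.keys() & v.keys())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # items without features share nothing
    return dot / (norm_u * norm_v)


def div_c(items: Sequence[FeatureVector]) -> float:
    """Average pairwise cosine distance of a recommendation list, Eq. 22."""
    k = len(items)
    if k < 2:
        raise ValueError("diversity needs at least two items")
    total = sum(1 - cosine(a, b) for a, b in combinations(items, 2))
    return 2 * total / (k * (k - 1))


toy_list = [{"a": 1.0}, {"b": 1.0}, {"a": 1.0}]
div_example = div_c(toy_list)
```

The toy list contains two identical items and one item with a disjoint annotation, so two of the three pairs have distance 1 and one pair has distance 0.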

The content-based diversity scores (divC) were calculated in the backend of the web application and saved in a log file for each generated recommendation list of the SKOSRec engine during runtime execution. In addition to the measurement of divC scores, we asked participants directly about their opinions on recommendation list diversity. Following the RS evaluation framework by Pu et al. [44], two Likert items were posed for this purpose:

- The recommendations of list [] are diverse.
- The recommendations of list [] are similar to each other.

Besides the algorithmic performance metrics listed above, the authors gathered assessments from users regarding the overall usefulness of the suggestions. This approach is also in line with the empirically verified evaluation framework by Pu et al. [44], which proposes to measure the psychometric construct of Perceived Usefulness to get a comprehensive understanding of user opinions on recommendation results. Thus, participants were instructed to state their agreement with positive statements regarding the relevance of the suggestions. The statements were adapted to the specificities of each usage scenario. Fig. 4 shows an example collection of Likert questions for the domain of music RS.

After participants had provided the required information, their answers were saved in a log file and they were navigated to test case 2 (TC2), which evaluated the SKOSRec engine's performance for constraint-based queries (similar to Q3, Sect. 3). We tried to find out if the engine was able to improve user satisfaction through filter conditions. Even though IR systems already successfully apply facet filters on result sets, the experiments still had to prove whether LOD-enabled constraints have the same effect on the Perceived Usefulness of recommendations. For the comparisons of non-filtered and filtered suggestions, we applied a within-subjects design, in which participants had to rate two recommendation lists resulting from regular (non-filtered) and filtered requests in TC2. We favored this setup over a between-subjects design, where subjects would have been assigned to only one result list. While the between-subjects design more closely resembles a real-world setting, since it does not require participants to interact with more than one approach at a time, differences among subjects (e.g., regarding demographics or topic expertise) can potentially have a considerable impact on quality judgments [19]. Hence, when applying a between-subjects design, one does not know whether differences in performance are caused by the approaches or by the variability among participants. Therefore, researchers need to conduct between-subjects experiments with a sufficient number of study subjects to level out these differences. Within-subjects studies, on the other hand, require fewer participants while still achieving the same level of statistical power, since between-subject variability is not an issue [30, 31].

Beel et al. have shown that even small variations in an RS test setting could cause substantial differences in performance outcomes [6]. To keep the extent of variations even lower, comparisons between non-filter and constraint-based recommendations were based on the same profile for each user in TC2. Therefore, the user profile from TC1 served as a starting point for filtered retrieval. The web application showed the recently generated profiles to users and asked them to provide an additional filter condition. Fig. 5 depicts an example web form for a constraint-based recommendation request in the music domain. It comprises the user profile and an additional text field where users stated their constraint.

Afterwards, the engine applied this filter condition on the set of potential recommendation results before similarity calculation started. For each usage scenario, different types of constraints were available. Regarding suitable filter options, the authors sought to achieve a balance between assisting users in finding filter conditions that would adequately represent their information needs and keeping the variability among users as low as possible. Therefore, the specificities of each scenario and the availability of the respective data sources were determined. For instance, for the multimedia domains (movie, music and books), it was assumed that users would like to filter recommendations according to the specific genre of the item. Fortunately, the DBpedia dataset contains this kind of information for the music domain. In this domain, the genre can be retrieved through the property dbo:genre. However, in the movie and book domains, DBpedia does not provide sufficient information. Many LOD resources which represent books were not assigned a genre. In the movie domain, a genre property is missing entirely, and genre-specific information can often only be found in the subject categories describing the movie. Hence, in contrast to the experiment on


Fig. 4. Evaluation screen for recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain)

music recommendations, participants in the domains of books and films did not filter their recommendation lists by genre. On the other hand, the property dc:subject was applied as a recommendation filter in each multimedia domain. Since the subject property often links to many informative features of multimedia items (e.g., release period, geographic information or content information), the authors decided that this feature represents a powerful filter dimension.

In the experiment on travel destination search, the web interface offered two filters: a subject filter and a location-specific filter. The subject filter enabled users to state their preferences regarding the characteristics of a location (e.g., being a nature reserve). The location-specific filter, on the other hand, gave participants the option to specify where the desired destination should be located (footnote 11).

The second fundamental research issue of TC2 concerned the question whether expressive constraints can boost recommendation quality even further. Hence,

11 http://dbpedia.org/

apart from executing a constraint-based workflow with a simple filter (i.e., a direct attribute of an item), the SKOSRec engine generated suggestions with the help of an expressive filter as well. This second procedure, while still applying the same user constraint as the simple filter, utilized an expressive graph pattern to include additional LOD resources in the result set. For instance, in case a subject filter was selected, result set expansion was facilitated by exploring the wider semantic space of the DBpedia category graph through a property path declaration in the graph pattern (e.g., skos:broader[2]). In the travel experiment, an additional expressive filter retrieved place-related information (dul:hasLocation) for location-specific subproperty relations that are not materialized in DBpedia. Hence, the expressive graph pattern expanded the filter such that additional properties were matched (e.g., dbo:country, dbo:city or dbo:state), while still applying the specified user constraint on the set of LOD resources. Upon execution of constraint-based recommendation requests, users received two recommendation lists (resulting from simple and expressive filtering, respectively) in a


Fig. 5. Web form for a constraint-based recommendation request (music domain)

separate evaluation screen. Participants evaluated result sets in the same manner as in TC1. For each result set, the screen contained an additional Likert item asking subjects to state their agreement with the statement "The filter has improved the recommendation results of list". The sequence of suggestions resulting from the different methods was randomly assigned in each user session to avoid order effects.

After study subjects had completed TC2, they were navigated to test case 3 (TC3). In the TC3 section of the travel experiment, participants could choose between three options for rollup query patterns (similar to Q6, Sect. 3). Users were able to obtain recommendations for travel destinations based on POIs they had visited during the stay in another destination. They could either select a city (option 1), a region (option 2) or a country (option 3) as destination type. Upon entity type selection, the SKOSRec engine generated appropriate suggestions. Fig. 6 shows the web form from the travel experiment that facilitated the formulation of advanced rollup requests in TC3.

When users had selected an entity type and stated their favorite travel destination and three POIs belonging to it, the application generated a SKOSRec query from these parameters. A simple on-the-fly request was sent to the engine as well to retrieve baseline recommendations, with which the suggestions resulting from the advanced request were compared at a later stage of the experiment. The simple query only contained the user's favorite travel destination and the selected entity type option, thus omitting information on the POIs. During the travel experiment, the engine processed both the simple and the advanced query and produced two recommendation lists. As in the previous parts of the experiment, participants assessed the quality of the result sets, which appeared in random order on the screen.

The multimedia experiments also contained a section on rollup retrieval (TC3). Like the advanced travel queries, the multimedia requests aggregated similarity scores of sublevel entities through summation, which the engine joined with a postfilter. In addition to these features, the pattern contained a preference query part. This section identified a set of preferred items through a single user statement. Participants either specified their favorite director/actor (movie domain), music act


Fig. 6. Travel - User profile generation, TC3

(music domain) or author (book domain) and received recommendations that the SKOSRec engine generated based on the features of the works created by the favored artist (similar to Q5, Sect. 3). Participants also received suggestions from a simple on-the-fly query, which computed results according to the artist's characteristics. As in the travel experiment, recommendations from the advanced rollup request were compared with the on-the-fly query without letting participants know which approach they were assessing and through random assignment of result set positions.

After participants of the multimedia experiments had evaluated the results of TC3, they were guided to test case 4 (TC4), which evaluated cross-domain queries. The experiment on travel RS ended with TC3, because no suitable graph patterns could be identified to facilitate these kinds of suggestions for the travel usage scenario. In the multimedia domain, however, the data from DBpedia was sufficient for this retrieval task. The entity type of the study (i.e., movie, music act or book) defined the source domain, based on which the engine generated suggestions for items from another multimedia target domain (similar to Q7, Sect. 3). Fig. 7 depicts the cross-domain web form of the music experiment.

The web form assisted subjects in formulating a request, which then obtained movie suggestions based on the favorite music act of the user. Parameters from users were entered in the placeholders of the query pattern in the backend of the application. We formulated the query templates in such a way that the query engine matched any properties or graph patterns that could potentially connect two items from the specified target and source domains. For instance, such matches can occur when a user states a favorite movie that was written by a particular author. In case the same author has also written some books, they might be of interest to the user as well. Another example stems from the movie domain. Suppose that, in his/her profile, a consumer has declared a preference for a movie that links to the corresponding soundtrack. It might well be the case that the user likes the soundtrack and the music acts that were involved in creating it just as much as s/he likes the movie. The practice of distinguishing homonymous LOD resources through the property dbo:wikiPageDisambiguates is another interesting linking pattern from DBpedia that was exploited for cross-domain requests in TC4, since it connects similar entities (e.g., the book edition of a certain movie).

Cross-domain SPARQL queries (similar to Q8, Sect. 3) were posed as baseline requests in this test case. While the SPARQL query only matched cross-domain relations for a single LOD resource, the SKOSRec request expanded the search space by considering links from more than one item. As in the previous test cases, TC4 ended with an evaluation screen where users assessed the SPARQL-based as well as the SKOSRec suggestions in random order, without knowing which recommendation list belonged to which method. Table 10 gives an overview of the test cases and the methods that were applied in the experiments in each usage scenario.

4.2. Results

In total, 257 people took part in the study. The gathered samples from the experiments contained answers


Fig. 7. Music - User profile generation, TC4 (page 9)

Table 10: Test cases in the web experiments (X = conducted, - = not conducted)

Test Case  Evaluation Purpose                      Query Type  Travel  Movie  Music  Book
TC1        On-the-fly Recommendations (Baseline)   Q1          X       X      X      X
TC2        Constraint-based Recommendations        Q3          X       X      X      X
TC3        Advanced Queries (Rollup)               Q5, Q6      X       X      X      X
TC4        Advanced Queries (Cross-Domain)         Q7          -       X      X      X

from both genders, with a slight overrepresentation of male participants (53.7%). The majority of study subjects were between 20 and 40 years of age (65.8%). The evaluation of the demographics section also revealed that the majority of participants already perceive the search in their domain of interest as either "manageable" (38.5%) or even as "easy" or "very easy" (44.0%).

Prior to analyzing the performance results of the novel retrieval approaches of the SKOSRec engine, it was checked whether the respective Likert items for Perceived Usefulness and user-based diversity (divU) reached acceptable reliability levels. For this purpose, Cronbach's α values were determined. Table 11 shows the outcome of this analysis for the Perceived Usefulness construct. Given the acceptable values throughout the domains (Cronbach's α > 0.7), it could be treated as an interval-scaled variable in the statistical tests.

Table 11: Cronbach's α values for the concept of Perceived Usefulness

Domain   Cronbach's α
Travel   0.7002
Movie    0.8579
Music    0.8515
Book     0.8949
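For readers who want to reproduce this kind of reliability check, the following Python sketch computes Cronbach's α with the standard formula α = k/(k-1) · (1 - Σ item variances / variance of the summed scale), where k is the number of Likert items. The input layout (one row of item scores per participant), the function name and the toy data are assumptions for illustration; the study data are not reproduced here.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a participants x items matrix of Likert scores."""
    k = len(responses[0])  # number of items in the construct
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two participants")
    item_columns = list(zip(*responses))
    item_variance_sum = sum(variance(col) for col in item_columns)
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("summed scores have no variance")
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Toy data: two items that are answered identically by three participants.
alpha_example = cronbach_alpha([[1, 1], [2, 2], [3, 3]])
```

Perfectly consistent answers, as in the toy data, yield the maximum value of 1; values above 0.7 are the level treated as acceptable in this study.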

Besides usefulness, user-based diversity scores were also measured by multiple Likert items. However, Cronbach's α values for this concept did not reach acceptable levels. In order to avoid leaving valuable user assessments unused, agreements with the statement "The recommendations of list [] are diverse" were taken into account as an ordinal variable (divU).

Based on these results, subsequent tests could be carried out. The first part of the analysis concerned the performance of the on-the-fly retrieval method (baseline) of the SKOSRec engine in TC1. The outcome of this analysis confirms that the on-the-fly approach represents a viable recommendation strategy. Table 12 shows the mean (M; see footnote 12), standard deviation (SD) and number of participants (N) for each performance dimension and metric. It can be seen that users assessed the engine's performance in almost every quality dimension as better than mediocre. Aside from the positive user evaluations with regard to Perceived Usefulness, related quality aspects such as Accuracy, Novelty and Diversity received good assessments as well. For instance, in each domain the mean relevance score (mrs) was higher than 50 score points. This means that participants were often convinced of the relevance of the results. Given these positive user assessments with regard to item relevance, it seems justified to consider the mean of result list sizes as an indicator of recall. The engine was able to generate the required number of recommendations in almost every test case (see Table

12 In the case of the user-based diversity score (divU), M denotes the median because of the ordinal scale of the variable.


12). The on-the-fly retrieval approach was also tested with regard to the Novelty dimension. Table 12 shows the mean novelty scores (nv) for the baseline recommendation approach. Novelty scores indicate that the majority of items in a result list were deemed to be new in the travel and book RS, whereas in the experiments on movie and music RS only about one third of the recommendation list was unfamiliar to users. In contrast, content-based Diversity scores (divC) were rather high in each domain. Hence, it can be concluded that items in a result set usually have a low rate of matching SKOS annotations. In addition to content-based scores, user assessments of recommendation list Diversity (divU) were also taken into account. The median level of agreement with the Diversity-related Likert item stating that recommendations are diverse was 3 ("neutral") in the multimedia domains and 4 ("agree") in the travel domain.

In summary, the baseline approach achieved fairly good performance results in TC1. The findings demonstrate that LOD-enabled recommendations can potentially enhance a user's search experience. They also allow for subsequent tests of advanced retrieval techniques, which can be compared to this standard method.

In the analysis of performance results from TC2 (constraint-based recommendations), particular attention was paid to participants' agreement with the statement "The filter has improved the recommendation results of the list". Fig. 8 shows the distribution of domain-wise agreement ratios. On average, they were fairly high. Throughout the domains, the vast majority of participants either selected 5 ("strongly agree"), 4 ("agree") or 3 ("neutral"). Hence, a filter often had a positive impact on results.

The authors gathered responses to the question on improvement for recommendation lists resulting from both the simple and the expressive filtering approaches. Hence, they give clues about the general performance of constraint-based retrieval. However, since this section of the study revealed the experimental condition to participants (i.e., application of a filter), answers to the Likert statement have to be interpreted cautiously. Results may be slightly biased, as participants were able to guess the underlying agenda. However, these limitations do not exist for comparisons between simple and expressive filters. Since the web application randomly assigned the list order, participants did not know which approach they were assessing. This enables us to take a closer look at the differences between the two filtering approaches. It was

Table 12: Performance results in TC1 (M = mean, SD = standard deviation, N = number of participants)

Dimension (Metric) [Max. Score]   Domain       M      SD     N
Accuracy (size) [10]              Travel   10.00    0.00   103
                                  Movie    10.00    0.00    50
                                  Music    10.00    0.00    53
                                  Book      9.80    1.40    51
Accuracy (mrs) [1]                Travel   56.49    0.00   103
                                  Movie    62.25   19.15    50
                                  Music    55.70    0.10    53
                                  Book     57.47   15.55    50
Novelty (nv) [1]                  Travel    0.60    0.34   101
                                  Movie     0.36    0.31    50
                                  Music     0.36    0.39    52
                                  Book      0.61    0.29    47
Diversity (divU) [5]              Travel    4       -      102
                                  Movie     3       -       50
                                  Music     3       -       53
                                  Book      3       -       50
Diversity (divC) [1]              Travel    0.80    0.16   103
                                  Movie     0.87    0.12    50
                                  Music     0.86    0.10    50
                                  Book      0.92    0.08    50
Usefulness [5]                    Travel    3.34    0.85   103
                                  Movie     3.55    0.73    50
                                  Music     3.18    0.85    53
                                  Book      3.43    0.81    50

hypothesized that advanced filters improve recommendation quality, as they may identify more relevant resources, thereby overcoming problems of data quality or data sparsity in LOD repositories. Table 13 lists the response rates of the two methods. The rate measures the ratio of non-empty result sets among all result sets. Throughout the domains, recommendation lists resulting from expressive filtering achieved higher response rates.

Table 13. Response Rates (TC2)

Domain   Constr.-based (Regular) (%)   Constr.-based (Expanded) (%)   N
Travel   23                            57                             103
Movie    86                            88                             50
Music    72                            75                             53
Book     73                            76                             51
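The response rate defined above (the ratio of non-zero result sets among all result sets) can be illustrated with a minimal Python sketch. The function name `response_rate` and the example sessions are hypothetical and are not taken from the SKOSRec implementation or the study data.

```python
from typing import Sequence


def response_rate(result_sets: Sequence[Sequence[str]]) -> float:
    """Share of result sets that contain at least one recommendation."""
    if not result_sets:
        raise ValueError("at least one result set is required")
    non_empty = sum(1 for items in result_sets if len(items) > 0)
    return non_empty / len(result_sets)


# Hypothetical user sessions: two of four requests returned suggestions.
sessions = [["dbr:Rome", "dbr:Florence"], [], ["dbr:Lisbon"], []]
print(f"{response_rate(sessions):.0%}")  # prints 50%
```

Under this reading, a rate of 23 in Table 13 would mean that roughly 23% of the requests produced a non-empty list.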


[Figure: “The filter has improved the recommendations of the list”; x-axis: Domain (Digital Library, Travel, Movie, Music, Book); y-axis: Percentage of Agreement (0.00–1.00); legend: Degree of Agreement (1–5)]

Fig. 8. Participants’ agreement with the statement that the filter has improved the result list (domain-wise) (TC2)

In addition to response rates, user assessments and scores in the defined quality dimensions were compared. Table 14 depicts the outcome of this analysis. Significantly higher scores are marked in bold figures. In case statistical tests identified no meaningful variations, the results of the better performing approach are underlined. α-levels were adjusted with the false discovery rate (FDR) in order to decrease the probability of making a type I error due to multiple testing.
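The text does not state which FDR procedure was used. As an illustration only, the following sketch assumes the common Benjamini-Hochberg step-up method and adjusts p-values rather than α-levels, which leads to the same decisions. The function name and the p-values are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values stay monotone (a smaller raw p never gets a larger adjusted p).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values from four pairwise comparisons.
raw = [0.001, 0.02, 0.03, 0.20]
adj = benjamini_hochberg(raw)
significant = [p <= 0.05 for p in adj]
```

With these made-up inputs, the first three comparisons remain significant at the 0.05 level after adjustment and the fourth does not.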

According to mean scores, expressive requests generated more comprehensive recommendation lists (size). This finding is in line with the increased response rates shown in Table 13. However, subsequently conducted t-tests confirmed significant differences only for the travel experiment (t(102) = −7.89, p < 0.001). In contrast, precision scores were higher in most of the domains (travel, movie, music) when the SKOSRec engine processed a simple constraint-based query. However, statistical tests did not confirm any systematic differences. The authors applied Wilcoxon tests for mrs scores and the remaining performance metrics because of the small sample sizes resulting from the low response rates of the simple constraint-based approach.

Regarding Novelty, none of the approaches was superior. In the Diversity dimension, expressive filters generated suggestions that were at least as topically diversified as the results from simple constraint-based retrieval. Even though no significant differences were identified between the two approaches, increased mean scores (divU in the travel domain and divC in each domain) indicate a slight superiority of expanded filter requests in this quality dimension. The same applies to usefulness scores (mean values were higher in 3 out of 4 domains when expressive filtering was applied). However, these statements are speculative at best because statistical tests were not significant.

In summary, it can be concluded that expressive filters improve recall, potentially leading to a slight loss in precision scores (see Table 14). It may also be the case that expanded user constraints diversify as well as increase the usefulness of recommendation lists. However, these claims are unverified. In contrast, it is safe to say that expressive query patterns generate results of at least the same quality as simple patterns while increasing the number of relevant suggestions. The few differences may be explained by the high Jaccard scores (JI) (see Table 15) of recommendation lists. Throughout the domains, result sets were much alike. Given these similarities and the increases in list sizes for expressive filters, it is supposed that an expanded constraint produces an enhanced version of the recommendation list resulting from simple filtering. Therefore, expanded filters should be the default setting in a constraint-based retrieval context.

In TC3 (i.e., evaluation of roll-up query patterns), the SKOSRec engine almost always generated non-empty recommendation lists (see Table 16). In the


Table 14. Performance results in TC2 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max. Score]   Domain   Simple          Expressive      N
                                           M       SD      M       SD
Accuracy (size) [10]              Travel   1.17    2.70    4.52    4.62    103
                                  Movie    6.72    4.08    7.20    3.92    50
                                  Music    5.96    4.64    6.36    4.51    53
                                  Book     4.59    4.50    5.00    4.51    51
Accuracy (mrs) [1]                Travel   66.73   20.51   65.34   22.36   23
                                  Movie    60.95   22.98   60.29   22.79   43
                                  Music    63.77   16.05   62.96   16.44   34
                                  Book     56.17   18.18   56.81   16.81   37
Novelty (nv) [1]                  Travel   0.48    0.41    0.61    0.49    21
                                  Movie    0.44    0.36    0.55    0.80    42
                                  Music    0.55    0.39    0.52    0.37    38
                                  Book     0.77    0.34    0.76    0.33    33
Diversity (divU) [5]              Travel   3       -       3.5     -       24
                                  Movie    4       -       4       -       43
                                  Music    4       -       4       -       38
                                  Book     3       -       3       -       37
Diversity (divC) [1]              Travel   0.68    0.18    0.72    0.21    19
                                  Movie    0.70    0.24    0.73    0.23    41
                                  Music    0.86    0.13    0.87    0.12    34
                                  Book     0.84    0.24    0.95    0.63    29
Usefulness [5]                    Travel   3.33    0.93    3.50    0.77    24
                                  Movie    3.32    0.85    3.31    0.86    43
                                  Music    3.55    0.81    3.64    0.78    38
                                  Book     3.39    0.83    3.62    1.21    37

Table 15. Jaccard Indices (TC2)

Domain   Mean Jaccard Index (JI)   SD     N
Travel   0.57                      0.47   25
Movie    0.71                      0.38   43
Music    0.82                      0.35   37
Book     0.92                      0.26   35

travel and the book experiment, the response rate was higher for aggregation-based queries, whereas in the movie domain the engine more often provided recommendations for regular requests. However, these differences are only marginal and do not indicate a clear superiority of one method.

Fig. 9 depicts participants’ general agreement with Likert statements concerning the Perceived Usefulness of the approaches in TC3. The diagram shows similar results for regular as well as advanced requests. On average, satisfaction scores reached levels that lay slightly above neutral agreement. A subsequent t-test

Table 16. Response Rates (TC3)

Domain   Regular (%)   Aggr.-based (%)   N
Travel   99            100               103
Movie    96            92                50
Music    100           100               53
Book     96            100               51

confirmed no significant differences between the two methods. In fact, the approach with the highest average score of Perceived Usefulness was different in each experiment. While mean values were higher for regular requests in the studies on travel and music RS, advanced queries produced higher scores in the movie and book experiments.

Content-based Diversity (divC) was the only metric with systematic differences. Here, scores went up for advanced retrieval patterns in the travel (t(98) = −5.50, p < 0.001), the music (t(52) = −4.51, p < 0.001) and the book domain (t(47) = −4.90, p <


[Figure: “Perceived Usefulness (Regular vs. Roll-up Queries)”; x-axis: Domain (Travel, Movie, Music, Book); y-axis: Agreement (0–3); legend: Method (Regular, Roll-up)]

Fig. 9. Participants’ overall satisfaction with regular and roll-up recommendation retrieval (TC3)

0.001) (see Table 17). This finding is not surprising given that the aggregation-based approach explores the wider semantic network of each item in the profile.

Hence, it can be assumed that both approaches produce recommendation lists of similar quality and can provide benefits to potential consumers. Thus, it cannot be clearly stated which approach is superior because neither of them outperformed the other in each domain. While the regular retrieval strategy was better in terms of usefulness in the travel and the music domains, the aggregation-based approach achieved higher scores in the domains of movie and book recommendations.

This conclusion is especially impressive given the low mean Jaccard scores for recommendation lists resulting from regular and aggregation-based retrieval (see Table 18). The mean score never exceeded 0.1 throughout the domains. It indicates a low concordance between result sets. It is remarkable that participants perceived the recommendations as equally good given the high dissimilarity of the lists. Thus, advanced queries have an added value, as they provide additional interesting results. They can help to explore LOD repositories in different ways than a regular retrieval approach. Therefore, it is worthwhile to offer aggregation-based query patterns as an alternative retrieval strategy when the regular approach has not produced any helpful recommendations.
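The Jaccard index used above to compare two recommendation lists is the size of their intersection divided by the size of their union. A minimal sketch follows. The handling of two empty lists and the example resource identifiers are assumptions made for illustration and are not taken from the study.

```python
from typing import Iterable


def jaccard_index(list_a: Iterable[str], list_b: Iterable[str]) -> float:
    """Overlap of two recommendation lists: |A ∩ B| / |A ∪ B|."""
    set_a, set_b = set(list_a), set(list_b)
    union = set_a | set_b
    if not union:
        return 0.0  # assumption: two empty lists are treated as no overlap
    return len(set_a & set_b) / len(union)


# Hypothetical result lists for one user session.
regular = ["dbr:Alien", "dbr:Blade_Runner", "dbr:Gladiator"]
roll_up = ["dbr:Blade_Runner", "dbr:Heat", "dbr:Seven", "dbr:Solaris"]
overlap = jaccard_index(regular, roll_up)  # one shared item out of six distinct items
```

A value close to 0, as in Table 18, means that the two lists share almost no items, whereas a value close to 1 means that they are nearly identical.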

In addition to TC3, user assessments for TC4 (cross-domain recommendations) were also evaluated. However, since the authors conducted experiments for TC4 only in the multimedia domains, the samples were considerably smaller. Nevertheless, it is assumed that the number of completed user sessions is just sufficient to conduct further statistical analyses. Table 19 shows the response rates for the two retrieval methods. Cross-domain patterns had a higher success rate of generating non-zero result sets throughout the domains. Participants almost always received a recommendation for this query type, whereas they often did not receive a single suggestion when a regular SPARQL query was executed.

We applied non-parametric tests for the performance metrics mrs, nv, divU, divC and Perceived Usefulness due to the low response rates of the SPARQL-based approach, which led to a correspondingly low number of comparable data points. Overall, the FDR-adjusted pairwise Wilcoxon tests detected no significant differences except for user-based Diversity scores (divU) in the movie (Z = −2.69, p < 0.05) and the book domain (Z = −2.56, p < 0.05). The results for the content-based Diversity (divC) metric seem to point in the same direction because of the high mean values for cross-domain queries in each domain. But the significance of these results was not verified. The mrs scores indicate a better performance of regular


Table 17. Performance results in TC3 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max. Score]   Domain   Regular         Aggr.-based     N
                                           M       SD      M       SD
Accuracy (size) [10]              Travel   9.90    0.99    9.42    2.07    103
                                  Movie    2.88    0.59    2.76    0.82    50
                                  Music    10.00   0.00    10.00   0.00    53
                                  Book     2.88    0.59    3.00    0.00    51
Accuracy (mrs) [1]                Travel   48.21   24.98   44.93   25.59   102
                                  Movie    54.58   22.88   56.69   25.42   44
                                  Music    53.95   18.31   58.24   16.19   52
                                  Book     51.53   23.90   49.25   22.30   49
Novelty (nv) [1]                  Travel   0.43    0.39    0.38    0.40    98
                                  Movie    0.35    0.38    0.28    0.41    41
                                  Music    0.49    0.36    0.46    0.39    50
                                  Book     0.61    0.42    0.62    0.44    47
Diversity (divU) [5]              Travel   3.5     -       3       -       100
                                  Movie    4       -       3       -       43
                                  Music    4       -       3       -       50
                                  Book     3       -       3       -       48
Diversity (divC) [1]              Travel   0.79    0.17    0.89    0.13    99
                                  Movie    0.75    0.20    0.66    0.27    44
                                  Music    0.83    0.17    0.93    0.07    53
                                  Book     0.74    0.27    0.93    0.06    48
Usefulness [5]                    Travel   3.36    0.77    3.22    0.74    101
                                  Movie    3.20    0.92    3.26    0.94    44
                                  Music    3.45    0.84    3.44    0.85    51
                                  Book     2.95    0.87    3.03    0.96    48

Table 18. Jaccard Indices (TC3)

Domain   Mean Jaccard Index (JI)   SD     N
Travel   0.07                      0.11   101
Movie    0.03                      0.07   44
Music    0.02                      0.05   53
Book     0.02                      0.06   50

Table 19. Response rates (TC4)

Domain   Regular (SPARQL) (%)   SKOSRec (%)   N
Movie    32                     96            50
Music    62                     98            53
Book     53                     100           51

SPARQL queries, but the within-subjects test did not confirm this assumption statistically (see Table 20).

However, the most important finding of the analysis was that SKOSRec cross-domain queries generated significantly more suggestions throughout the domains (movie: t(49) = −17.92, p < 0.001; music: t(52) = −25.73, p < 0.001; book: t(50) = −12.75, p < 0.001) (see Table 20). Fig. 10 illustrates this graphically.
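The reported degrees of freedom equal N − 1 in each domain, which is consistent with a paired (within-subjects) t-test on the list sizes of the two methods. The following sketch computes such a test statistic with the Python standard library. The function name and the list sizes are hypothetical, and the sign convention (regular minus cross-domain) is an assumption chosen because it yields negative t-values when the cross-domain lists are longer.

```python
import math
import statistics
from typing import Sequence, Tuple


def paired_t(sample_a: Sequence[float], sample_b: Sequence[float]) -> Tuple[float, int]:
    """Return the paired t statistic for (a - b) and its degrees of freedom."""
    if len(sample_a) != len(sample_b) or len(sample_a) < 2:
        raise ValueError("need two equally long samples with at least two pairs")
    diffs = [a - b for a, b in zip(sample_a, sample_b)]
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical list sizes per participant (not the study data).
sparql_sizes = [0, 2, 0, 1, 0]
skosrec_sizes = [9, 10, 8, 10, 7]
t_value, df = paired_t(sparql_sizes, skosrec_sizes)
print(f"t({df}) = {t_value:.2f}")
```

The p-value would then be read from the t distribution with `df` degrees of freedom, a step that is left out here to keep the sketch free of third-party dependencies.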

Whenever the SPARQL request provided recommendations, which was the case in less than half of the user sessions (see Table 19), the SKOSRec suggestions were assessed to be of almost equal quality as the SPARQL recommendations (see Table 20, Usefulness). Hence, in the remaining sessions (more than half of the test cases), in which SPARQL querying did not produce any results, the SKOSRec approach may still have provided useful suggestions.

Therefore, it is reasonable to assume that the capability of the SKOSRec engine to flexibly switch between processing steps of similarity calculation and pattern matching can facilitate an improved exploration of LOD repositories. Table 21 also shows that the items in the two sets often did not match, as is indicated by low Jaccard indices. This finding further demonstrates that the execution of an additional step


Table 20. Performance results in TC4 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max. Score]   Domain   Regular (SPARQL)    Cross-Domain        N
                                           M        SD         M        SD
Accuracy (size) [10]              Movie    0.82     2.09       8.66     2.73       50
                                  Music    0.89     1.59       9.30     2.03       53
                                  Book     2.90     3.98       10.00    0.00       51
Accuracy (mrs) [1]                Movie    67.41    27.62      53.65    14.71      16
                                  Music    62.36    31.76      48.32    22.49      20
                                  Book     66.42    21.28      55.59    16.84      27
Novelty (nv) [1]                  Movie    0.59     0.49       0.41     0.34       16
                                  Music    0.72     0.44       0.59     0.40       19
                                  Book     0.54     0.41       0.63     0.30       27
Diversity (divU) [5]              Movie    3        -          4        -          16
                                  Music    3        -          3        -          20
                                  Book     3        -          4        -          27
Diversity (divC) [1]              Movie    (0.64)   (0.35)     (0.99)   (0.02)     (6)
                                  Music    (0.90)   (0.13)     (0.98)   (0.03)     (10)
                                  Book     (0.84)   (0.12)     (0.87)   (0.1)      (8)
Usefulness [5]                    Movie    3.57     0.83       3.35     1.57       15
                                  Music    3.53     1.17       3.11     0.98       20
                                  Book     3.41     1.04       3.41     0.94       27

[Figure: “Recommendation List Size”; x-axis: Domain (Movie, Music, Book); y-axis: No. of items (0.0–10.0); legend: Method (Regular, Cross-Domain)]

Fig. 10. Mean recommendation list sizes for SPARQL and SKOSRec cross-domain requests (TC4)


Table 21. Jaccard Indices (TC4)

Domain   Mean Jaccard Index (JI)   SD     N
Movie    0.03                      0.07   44
Music    0.01                      0.02   21
Book     0.16                      0.27   33

of similarity calculation can help to retrieve other relevant items.

5. Future Work & Conclusion

This paper has presented the query-based SKOSRec engine that facilitates new types of recommendation requests for LOD repositories. The evaluations have shown that the SKOSRec system can often generate relevant and useful suggestions when certain query templates are utilized. In many cases, the application of expressive or advanced query patterns (i.e., in the context of constraint-based retrieval or cross-domain requests) helped to significantly improve recall values while still providing the same level of quality in the remaining performance dimensions. Additionally, advanced recommendation requests can be used to generate diversified recommendation lists in case users are not satisfied with the results of regular recommendation requests (i.e., roll-up or cross-domain retrieval vs. regular requests). While the novel recommendation approaches of the SKOSRec engine are not superior in each domain and test case, it is at least safe to say that they considerably extend common retrieval methods and provide an alternative search strategy when conventional methods fail to provide useful results. Additionally, the increase in diversity and recall is in line with findings from previous research on LOD-enabled RS [45].

It is a strength of the developed approaches that they facilitate combinations of graph-based and similarity-based retrieval at different stages of the recommendation workflow. This feature extends general search capabilities for semantic networks. Future research needs to investigate whether, in addition to SKOS, further RDF vocabularies can be used for ad-hoc item-to-item similarity computation to increase the range of potential usage scenarios, since not every LOD dataset contains SKOS annotations [53]. The application of additional properties could also boost the recommendation quality of some query patterns (e.g., the query types used in TC3). In this context, it also needs to be determined whether the representation of the user profile in the SKOSRec query language can be substituted with free-text expressions to enable search-like functionalities. LOD researchers have already developed engines that can process natural language queries over RDF data [59], [34], [67]. It will have to be investigated how the SKOSRec engine profitably fits into this landscape of existing retrieval tools.

Another open research question concerns the aspect of interfaces to other applications. In its standard configuration, the system is not an end-user retrieval engine but a backend application with which an administrator who has domain knowledge (e.g., of RDF vocabularies and data models of a particular LOD repository and usage scenario) creates the respective query patterns. Although the websites of the online experiments represent possible implementations of suitable end-user applications for the SKOSRec engine, there are still many other possibilities and extensions to design such a user interface (UI). For instance, the selection of LOD items for the profile was made possible by previously storing the data in an index that could then be accessed and searched before issuing a recommendation query. However, this runs against the idea of ad-hoc retrieval. In this context, future research will have to determine whether it is feasible to generate an on-the-fly mapping from items to the respective LOD resources. In the case of a commercial application, the engine needs to match items with the corresponding entries in the product catalog. Even if it is not yet clear which UI components can make the best possible use of the SKOSRec engine’s features, the results from the user experiments can give at least a few suggestions.

The following retrieval options should be the default setting in an interface that contains a LOD-enabled RS component to assist users in finding interesting items. Since the regular approach of on-the-fly retrieval achieved good results throughout the domains, the display of simple recommendations generated from previously stated user preferences represents a suitable starting point for retrieval. On-the-fly queries could be refined by providing filter options. Since the evaluations have shown that expressive constraints often lead to increased recall values, the application of such a filter might be the best option for assisted retrieval. As in the preparations for the experiments, appropriate graph-based query patterns for the usage scenario in question would also have to be determined by a domain expert before setting up the application. The same applies to advanced retrieval patterns, which should be available to the end user to refine the query when both the simple and the constraint-based


approaches have failed to provide relevant items. This suggestion is made because roll-up queries were competitive with regular recommendations in the web-based experiments. Therefore, they represent a viable alternative retrieval strategy. For some domains (e.g., for multimedia retrieval), it is also possible to use an optional selection field to enable the formulation of cross-domain requests. Evaluations for this query type suggest that users might receive interesting results (i.e., diversified and comprehensive recommendation lists) from such a request.

In summary, the contributions of the paper are as follows:

– Development and implementation of novel retrieval strategies in the SKOSRec system, which facilitate personalized requests and leverage the potential of knowledge graphs in order to address the research gap in state-of-the-art recommendation methods with regard to customized queries.

– Realization of web-based user experiments in four usage scenarios with the following outcomes:

    • Successful demonstration that knowledge graph data is applicable in the context of live recommendation scenarios.

    • Creation of suitable query patterns that can be applied to represent individual information needs in different application contexts. The set of query patterns can be extended with the help of the SKOSRec query language to meet the requirements of additional usage scenarios.

    • Proof that the novel retrieval strategies can be successfully applied across various application contexts and recommendation tasks and that they outperform conventional content-based retrieval approaches, especially in the performance dimensions of diversity and recall.

The prototypical implementation and the evaluation of the SKOSRec engine in different usage scenarios have demonstrated that the freely available knowledge sources on the LOD cloud can be successfully applied to advance state-of-the-art methods in IR and recommender systems. The proposed personalized and customized retrieval strategies can leverage the potential of knowledge graphs by better matching information needs of users with the help of semantic search options.

Appendix A. The SKOSRecommender Query Syntax

SKOSRecQuery: A SKOSRec query can be comprised of up to six elements, of which the parts SimProjection and ItemPart are obligatory. Hence, users need to state at least their preferences and how many recommendations they would like to receive. Advanced retrieval patterns can be formulated as well. For instance, similar resource retrieval can be combined with prefiltering (RecWhereClause) and/or postfiltering (SelectPart) of LOD resources. As in SPARQL, the query can start with a prologue that abbreviates IRIs with namespace declarations [21].

SelectPart: The SelectPart enables subquerying with recommendation results. The surrounding query is a SELECT query with a slightly simplified WHERE clause (i.e., RecWhereClause) with which postfilter conditions can be formulated. In this section, users have the option to formulate aggregation-based retrieval requests (Aggregation).

Aggregation: This query part handles aggregation-based postfiltering. Users specify the resource (IRIref) for which similarity scores of sublevel entities are summarized. Additionally, it is stated how the data should be aggregated (i.e., SUM, MAX or AVG) as well as which variable (Var) represents the aggregation-based recommendations in the postfilter section. The compiler makes sure that this variable is actually contained in the RecWhereClause of the SelectPart.

SimProjection: This part of a SKOSRec query refers to the list of generated suggestions. It defines the recommendation variable to which all similar LOD resources are mapped. In case pre- and postfilter conditions are set in the other sections, the compiler checks whether the variable (Var) appears in these sections as well. Otherwise, the required join operations cannot be executed. It is also specified how many recommendations should be displayed (Integer) and whether the request will be issued against a default repository or as a cross-repository request (ServiceIntegration).

ServiceIntegration: This section indicates that a user intends to receive recommendations from a different SPARQL endpoint than the default endpoint that is stated in the standard configuration of the SKOSRec engine. It specifies the target SPARQL endpoint (IRIref) from which a user


wants to receive recommendations. Upon extraction of SKOS annotations from the default source endpoint, a subsequent request is sent to the specified target endpoint.

ItemPart: This section of a SKOSRec query represents a single preference in the user profile. However, recommendation queries can contain several preference statements. A preference is either expressed as a LOD resource (IRIref) or as a variable which is bound to a preference query (VarPart). In the latter case, the resources are obtained by matching the graph pattern in the VarPart with the triple statements in the repository. The ItemPart statement can be additionally supplemented with a concept-to-concept similarity threshold (Sim) that triggers concept expansion with the approach of flexible similarity detection, which is extensively described in one of our previous works [64].

VarPart: The construct initiates preference querying. Individual likings are expressed as a graph pattern in a RecWhereClause. The variable (Var) is a placeholder for the LOD resources that will be used to set up the user profile. The compiler has to make sure that the specified variable occurs in the RecWhereClause as well. Otherwise, the extraction of LOD resources cannot be carried out correctly.

Sim: The Sim part of a SKOSRec query denotes the threshold of concept-to-concept scores for flexible similarity detection. Even though knowledge-based similarity values usually range between 0 and 1, the specification allows a broader range of digits in the Decimal datatype to enable application of other similarity metrics in case they are needed.

Relation: This section specifies how concept-to-concept similarity scores should be determined. Users can state whether scores should be “larger”, “larger than” or “equal to” the threshold value.

RecWhereClause: This clause can occur in three different sections of a SKOSRec query, i.e., either in the prefilter (ItemPart), the postfilter (SelectPart) or the preference filter section (VarPart). The RecWhereClause closely resembles the “WhereClause” of the SPARQL 1.1 specification with all its subsequent parts [21]. Hence, it enables different combinations of graph pattern matching. However, the RecWhereClause of the SKOSRec language allows fewer pattern matching expressions than the “WhereClause” of the regular

SPARQL syntax specification [21]. For instance, it is neither permitted to formulate subqueries nor to direct filter requests to more than one repository (see RecGraphPatternNotTriples for further explanations). This modification has been made to simplify similarity calculation. After parsing this section, the compiler makes sure that recommendation, preference or postfilter variables occur at least once in the RecWhereClause to guarantee that subsequent steps of the recommendation workflow can be applied on existing LOD resources. In case the RecWhereClause comprises a RecMinusGraphPattern, it also has to be checked that the variable is not only contained in the MINUS part of the group.

RecGroupGraphPattern: A RecGroupGraphPattern can contain one to many basic graph patterns, which can be combined in different ways according to RecGroupGraphPatternSub.

RecGroupGraphPatternSub: This section defines the graph patterns that are applied to identify suitable LOD resources. User filters can be expressed either as basic graph patterns (TriplesBlock) or as combinations of them (RecGraphPatternNotTriples). The sequence of different kinds of patterns can be flexibly specified.

RecGraphPatternNotTriples: This query part closely resembles the similarly named “GraphPatternNotTriples” of the SPARQL 1.1 specification [21]. The only difference to SPARQL is that the RecGraphPatternNotTriples section is a little more restricted in terms of admissible graph patterns than the “GraphPatternNotTriples” section of the regular SPARQL specification. For instance, it does not allow retrieving LOD resources from graphs other than the default graph that is specified in the configuration of the SKOSRec engine, since filter patterns have to be applied to datasets that contain SKOS annotations. Hence, they need to be specified before runtime execution to ensure that similarity calculation can be carried out correctly. If a user was able to formulate filter conditions that referred to different RDF graphs and endpoints accordingly, this would not be possible.

RecGroupOrUnionGraphPattern: This query section marks an alternative graph pattern in the style of the “GroupOrUnionGraphPattern” of the SPARQL syntax specification [21]. However, it neither allows SPARQL-like subqueries nor federated queries


for filtered resources because this would complicate similarity calculation.

RecOptionalGraphPattern: In this part of a SKOSRecQuery, users can specify additional graph patterns which might extend the query solution but do not necessarily have to match the data.

RecMinusGraphPattern: This section handles exclusion of LOD resources for cases when users want to omit certain triple statements from the solution.
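The Aggregation entry above describes how similarity scores of sublevel entities are summarized per resource with SUM, MAX or AVG. The following Python sketch illustrates that idea outside of the query language. The function name `roll_up`, the tuple layout (parent resource, item, score), the DBpedia-style identifiers and the scores are hypothetical and do not reproduce the SKOSRec implementation.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, Iterable, List, Tuple

# Supported aggregation modes, mirroring the keywords named in the appendix.
AGGREGATORS: Dict[str, Callable[[List[float]], float]] = {
    "SUM": sum,
    "MAX": max,
    "AVG": mean,
}


def roll_up(scored_items: Iterable[Tuple[str, str, float]],
            mode: str, limit: int) -> List[Tuple[str, float]]:
    """Aggregate item similarity scores per parent and return the top `limit` parents."""
    if mode not in AGGREGATORS:
        raise ValueError(f"unsupported aggregation: {mode}")
    per_parent: Dict[str, List[float]] = defaultdict(list)
    for parent, _item, score in scored_items:
        per_parent[parent].append(score)
    ranked = [(parent, AGGREGATORS[mode](scores))
              for parent, scores in per_parent.items()]
    # Highest aggregated score first; ties are broken alphabetically.
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked[:limit]


# Hypothetical similarity scores of movies, grouped by their directors.
scores = [
    ("dbr:Ridley_Scott", "dbr:Alien", 0.5),
    ("dbr:Ridley_Scott", "dbr:Blade_Runner", 0.25),
    ("dbr:Michael_Mann", "dbr:Heat", 0.625),
]
by_sum = roll_up(scores, "SUM", 2)
by_max = roll_up(scores, "MAX", 1)
```

With these made-up scores, SUM ranks the director with two moderately similar movies first, whereas MAX favors the director of the single most similar movie, which shows why the choice of aggregation function can change the resulting recommendation list.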

References

[1] G. Adomavicius, A. Tuzhilin, Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions, In: IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.

[2] G. Adomavicius, A. Tuzhilin, R. Zheng, REQUEST: A query language for customizing recommendations, In: Information Systems Research, vol. 22, no. 1, pp. 99–117, 2011.

[3] C. Aggarwal, Recommender systems, Springer, 2016.

[4] M. Arenas, B. Cuenca Grau, E. Kharlamov, S. Marciuska, D. Zheleznyakov, Faceted search over ontology-enhanced RDF data, In: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 939–948, 2014.

[5] V. Ayala, M. Przyjaciel-Zablocki, T. Hornung, A. Schätzle, G. Lausen, Extending SPARQL for recommendations, In: Semantic Web Information Management, 2004.

[6] J. Beel, C. Breitinger, S. Langer, A. Lommatzsch, B. Gipp, Towards reproducibility in recommender systems research, In: User Modeling and User-adapted Interaction, vol. 26, no. 1, pp. 69–101, 2016.

[7] C. Boutilier, R. Brafman, C. Domshlak, H. Hoos, D. Poole, CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements, In: Journal of Artificial Intelligence Research (JAIR), vol. 21, pp. 135–191, 2004.

[8] P. Castells, S. Vargas, J. Wang, Novelty and diversity metrics for recommender systems: choice, discovery and relevance, 2011.

[9] G. Cheng, W. Ge, Y. Qu, Falcons: searching and browsing entities on the Semantic Web, In: Proceedings of the 17th ACM International Conference on World Wide Web, pp. 1101–1102, 2008.

[10] J. Davies, R. Weeks, QuizRDF: Search technology for the Semantic Web, In: Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004.

[11] DBpedia, URL: https://wiki.dbpedia.org, 2017.

[12] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, Exploiting the web of data in model-based recommender systems, In: Proceedings of the 6th ACM Conference on Recommender Systems (RecSys), pp. 253–256, 2012.

[13] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, M. Zanker, Linked Open Data to support content-based recommender systems, In: Proceedings of the 8th International Conference on Semantic Systems, 2012.

[14] D. Dillman, D. Robert, D. Bowker, Principles for constructing web surveys, In: Joint Meetings of the American Statistical Association, 1998.

[15] DCMI metadata terms, URL: http://dublincore.org/documents/dcmi-terms, 2004.

[16] I. Fernandez-Tobias, I. Cantador, M. Kaminskas, F. Ricci, A generic semantic-based framework for cross-domain recommendation, In: Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pp. 25–32, 2011.

[17] A. Freitas, J. Oliveira, S. O’Riain, E. Curry, J. Da Silva, C. Pereira, Querying Linked Data using semantic relatedness: a vocabulary independent approach, In: Natural Language Processing and Information Systems, pp. 40–51, 2011.

[18] T. Grainger, T. Potter, Y. Seeley, Solr in action, Cherry Hill: Manning, 2014.

[19] A. Gunawardana, G. Shani, Evaluating recommender systems, In: Recommender Systems Handbook, Springer, pp. 265–308, 2015.

[20] R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Bürgle, H. Düwiger, U. Scheel, Faceted Wikipedia search, In: International Conference on Business Information Systems, pp. 1–11, 2010.

[21] S. Harris, A. Seaborne, E. Prud’hommeaux, SPARQL 1.1 query language, In: W3C recommendation, 2013.

[22] B. Heitmann, C. Hayes, Using Linked Data to build open, collaborative recommender systems, In: AAAI Spring Symposium: Linked Data Meets Artificial Intelligence, pp. 76–81, 2010.

[23] J. Herlocker, J. Konstan, L. Terveen, J. Riedl, Evaluating collaborative filtering recommender systems, In: ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.

[24] Y. Hu, Z. Wang, W. Wu, J. Guo, M. Zhang, Recommendation for movies and stars using YAGO and IMDB, In: 12th International Asia-Pacific Web Conference (APWEB), pp. 123–129, 2010.

[25] D. Huynh, D. Karger, Parallax and companion: Set-based browsing for the data web, In: Proceedings of the 18th ACM International Conference on World Wide Web, 2009.

[26] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich, Recommender systems: an introduction, Cambridge University Press, 2010.

[27] Y. Kabutoya, R. Sumi, T. Iwata, T. Uchiyama, T. Uchiyama, In: IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 625–630, 2012.

[28] M. Kaminskas, I. Fernandez-Tobias, F. Ricci, I. Cantador, Knowledge-based music retrieval for places of interest, In: Proceedings of the 2nd International ACM Workshop on Music Information Retrieval with User-centered and Multimodal Strategies, pp. 19–24, 2012.

[29] H. Khrouf, R. Troncy, Hybrid event recommendation using Linked Data and user diversity, In: Proceedings of the 7th ACM Conference on Recommender Systems, pp. 185–192, 2010.

[30] P. Kirk, Experimental design, John Wiley & Sons, 1982.

[31] B. Knijnenburg, M. Willemsen, Z. Gantner, H. Soncu, C. Newell, Explaining the user experience of recommender systems, In: User Modeling and User-Adapted Interaction, vol. 22, no. 4-5, pp. 441–504, 2012.

[32] G. Koutrika, B. Bercovitz, H. Garcia-Molina, FlexRecs: expressing and combining flexible recommendations, In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 2009.


[33] M. Lee, W. Kim. Semantic association search and rank method based on spreading activation for the Semantic Web. In IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1523–1527, 2009.

[34] J. Lehmann, L. Bühmann. Autosparql: Let users query your knowledge base. In 8th Extended Semantic Web Conference (ESWC), 2011.

[35] N. Marie, F. Gandon, M. Ribiere, F. Rodio. Discovery hub: on-the-fly Linked Data exploratory search. In Proceedings of the 9th International Conference on Semantic Systems, pp. 17–24, 2013.

[36] S. McNee, J. Riedl, J. Konstan. Being accurate is not enough: how accuracy metrics have hurt recommender systems. In CHI'06 Extended Abstracts on Human Factors in Computing Systems, 2006.

[37] R. Meymandpour, J. Davis. Recommendations using Linked Data. In Proceedings of the 5th PhD Workshop on Information and Knowledge, 2012.

[38] A. Miles, S. Bechhofer. SKOS Simple Knowledge Organization System Reference. In W3C Recommendation, 2009.

[39] B. Mobasher, X. Jin, Y. Zhou. Semantically enhanced collaborative filtering on the web. In Web Mining: From Web to Semantic Web, pp. 57–76, 2004.

[40] A. Musetti, A. Nuzzolese, F. Draicchio, V. Presutti, E. Blomqvist, A. Gangemi, P. Ciancarini. Aemoo: Exploratory search based on knowledge patterns over the Semantic Web. In Semantic Web Challenge, 2012.

[41] T. Neumann, G. Weikum. The RDF-3X engine for scalable management of RDF data. In The VLDB Journal, vol. 19, no. 1, pp. 91–113, 2010.

[42] J. Pérez, M. Arenas, C. Gutierrez. Semantics and Complexity of SPARQL. In 5th International Semantic Web Conference, pp. 30–43, 2006.

[43] S. Policarpio, S. Brunk, G. Tummarello. Implementation of a SPARQL integrated recommendation engine for Linked Data with hybrid capabilities. In AImWD, 2012.

[44] P. Pu, L. Chen, R. Hu. A user-centric evaluation framework for recommender systems. In Proceedings of the 5th ACM Conference on Recommender Systems, 2011.

[45] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, E. Di Sciascio. Sound and music recommendation with knowledge graphs. ACM Transactions on Intelligent Systems and Technology (TIST), 8(2), 21 (2017).

[46] E. Oren, R. Delbru, S. Decker. Extending faceted navigation for RDF data. In 5th International Semantic Web Conference, pp. 559–572, 2006.

[47] G. Pirro. Explaining and suggesting relatedness in knowledge graphs. In International Semantic Web Conference, pp. 622–639, 2015.

[48] G. Rizzo, R. Troncy. NERD: A framework for evaluating named entity recognition tools in the Web of data. 10th International Semantic Web Conference (ISWC'11), Demo Session, Bonn, Germany, 2011.

[49] J. Rosati, T. Di Noia, T. Lukasiewicz, R. De Leone, A. Maurino. Preference queries with Ceteris Paribus semantics for Linked Data. In OTM Confederated International Conferences, pp. 423–442, 2015.

[50] T. Ruotsalo, K. Haav, A. Stoyanov, S. Roche, E. Fani, R. Deliai, E. Mäkelä, T. Kauppinen, E. Hyvönen. SMARTMUSEUM: A mobile recommender system for the Web of Data. In Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 50–67, 2013.

[51] I. Ruthven, M. Lalmas, K. van Rijsbergen. Incorporating user search behavior into relevance feedback. In Journal of the Association for Information Science and Technology, vol. 54, no. 6, pp. 529–549, 2003.

[52] B. Sarwar, G. Karypis, J. Konstan, J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, 2001.

[53] M. Schmachtenberg, C. Bizer, H. Paulheim. Adoption of the Linked Data best practices in different topical domains. In International Semantic Web Conference (ISWC), pp. 245–260, 2014.

[54] H. Schütze, C. Manning, P. Raghavan. Introduction to information retrieval. Cambridge University Press, 2008.

[55] SparqlImplementations. URL: https://www.w3.org/wiki/SparqlImplementations, 2016.

[56] M. Stankovic, W. Breitfuss, P. Laublet. Discovering relevant topics using DBPedia. In Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 219–222, 2011.

[57] K. Swearingen, R. Sinha. Beyond algorithms: An HCI perspective on recommender systems. In ACM SIGIR 2001 Workshop on Recommender Systems, vol. 13, no. 5-6, 2001.

[58] D. Tunkelang. Faceted search. In Synthesis Lectures on Information Concepts, Retrieval, and Services, pp. 1–80, 2009.

[59] C. Unger, L. Bühmann, J. Lehmann, A. Ngonga Ngomo, D. Gerber, P. Cimiano. Template-based question answering over RDF data. In Proceedings of the 21st ACM International Conference on World Wide Web, 2012.

[60] C. Varnhagen, M. Gushta, J. Daniels, T. Peters, N. Parmar, D. Law, T. Johnson. How informed is online informed consent? In Ethics & Behavior, vol. 15, no. 1, pp. 37–48, 2005.

[61] J. Waitelonis, H. Sack. Towards exploratory video search using Linked Data. In 11th IEEE International Symposium on Multimedia, pp. 540–545, 2009.

[62] L. Wenige, J. Ruhland. Flexible on-the-fly recommendations from Linked Open Data repositories. In International Conference on Business Information Systems, pp. 43–54, 2016.

[63] L. Wenige. Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations.

[64] L. Wenige, J. Ruhland. Retrieval by recommendation: using LOD technologies to improve digital library search. In International Journal on Digital Libraries, 2017.

[65] F. Zarrinkalam, M. Kahani. A multi-criteria hybrid citation recommendation system based on Linked Data. In 2nd International eConference on Computer and Knowledge Engineering (ICCKE), pp. 283–288, 2012.

[66] C. Ziegler, S. McNee, J. Konstan, G. Lausen. Improving recommendation lists through topic diversification. In Proceedings of the 14th International Conference on World Wide Web, pp. 22–32, 2005.

[67] L. Zou, R. Huang, H. Wang, J. Yu, W. He, D. Zhao. Natural language question answering over RDF: a graph data driven approach. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 313–324, 2014.

  • Introduction
  • Related Work
  • The SKOS Recommender
    • System Overview
    • On-the-Fly Recommendations
    • Preference Querying
    • Prefiltering
    • Postfiltering & Combinations
  • Evaluation of the SKOS Recommender
    • Experimental Setup
    • Results
  • Future Work & Conclusion
  • Appendix A: The SKOSRecommender Query Syntax
  • References

In the case of the movie recommendation scenario, a user could state that she enjoyed Quentin Tarantino movies in the past and would like to receive suggestions for directors who shot similar films. Usually, this kind of question can only be answered by another human being, e.g., in a personal interaction among friends or in an online forum. The fact that the language facilitates these kinds of queries is one of its key advantages.

Listing 7. SKOSRec query to generate post-filtered recommendations based on a preference query in the movie domain (Q5)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX yago: <http://dbpedia.org/class/yago/>

SELECT ?director ?movie
WHERE {
  ?movie dbo:director ?director .
  ?director rdf:type yago:Director110014939 .
}
LIMIT 50
AGG dbr:Quentin_Tarantino ?director SUM
RECOMMEND ?movie TOP 100
PREF [ ?prefMovie
  WHERE { ?prefMovie dbo:director dbr:Quentin_Tarantino . } ]

Table 6 depicts the results of Q5 as retrieved from DBpedia. When comparing the solution sets for Q4 and Q5, it becomes clear that they are entirely different. While Q5 generated a list of directors who have mostly created action and thriller movies with a twist (i.e., Tarantino's unique filmmaking style⁵), Q4 provided a completely different output. It shows US American actor-directors whose movies range from musical films (dbr:Hello_Dolly_(film)) to western movies (dbr:Dances_with_Wolves) or drama films (dbr:Interview_(2007_film)). Thus, the directors in the solution table of Q4 (Table 5) seem rather arbitrary and not particularly helpful concerning the initial user request.

⁵ https://en.wikipedia.org/wiki/Quentin_Tarantino

Table 6. Result set of Q5

director                      movie
dbr:Steven_Spielberg          dbr:Indiana_Jones
dbr:Steven_Spielberg          dbr:Jurrasic_Park_(film)
dbr:Francis_Ford_Coppola      dbr:The_Godfather_Part_II
dbr:Francis_Ford_Coppola      dbr:The_Godfather_Part_III
dbr:Christopher_Nolan         dbr:The_Prestige_(film)
dbr:Christopher_Nolan         dbr:The_Dark_Knight
dbr:Robert_Rodriguez          dbr:Planet_Terror
dbr:Robert_Rodriguez          dbr:Machete_(film)
dbr:Kevin_Smith               dbr:Clerks
dbr:Kevin_Smith               dbr:Clerks_II

We define this kind of aggregation-based request as a so-called roll-up query. This definition was chosen because the final suggestions are based on summarized (i.e., rolled-up) preferences for sublevel entities. Another example of a roll-up request stems from the application scenario of travel search. This recommendation query derives suggestions for city trip destinations from the points of interest (POIs) a user has visited and liked in another city (e.g., London). Listing 8 shows the corresponding SKOSRec query for this scenario.

Listing 8. SKOSRec query to generate post-filtered recommendations (aggregation-based) for a preference profile containing the city of London (Q6)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dul: <http://www.ontologydesignpatterns.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?city
WHERE {
  ?poi ?property ?city .
  ?city rdf:type dbo:City .
  ?property rdfs:subPropertyOf dul:hasLocation .
}
LIMIT 3
AGG dbr:London ?city SUM
RECOMMEND ?poi TOP 100
PREF dbr:Oxford_Street, dbr:Notting_Hill, dbr:South_Bank

The SKOSRec request contains subquery commands that refer to the user's preferred POIs. It specifies how similarity scores of the generated suggestions should be aggregated (i.e., sum-based). The retrieval is based on the 100 POIs that are most similar to the ones in the preference section. Additionally, the subquery part refers to the DBpedia resource that represents London (AGG dbr:London ?city), thus indicating that the list of recommendations should not contain any POIs in this city. Upon processing the depicted query over the DBpedia repository, the engine displays the three cities for which the highest aggregated scores have been detected (see Table 7).

Table 7. Result set of Q6

city
dbr:Oxford
dbr:Abingdon-on-Thames
dbr:Wallingford_Oxford

The motivation for formulating such a query is that, when searching for cities, preferences might be better represented through the sights a user has visited elsewhere than through a simple on-the-fly request that solely contains a preference statement for a city in the user profile section.

Besides the roll-up retrieval patterns presented above, the SKOSRec grammar may facilitate the formulation of additional types of advanced recommendation queries. For instance, cross-domain queries are an interesting additional pattern to be explored for personalized retrieval. Listing 9 shows an example request that generates movie suggestions based on the preference for a music band (i.e., dbr:The_Jackson_5).

Listing 9. SKOSRec query to generate cross-domain recommendations (Q7)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?movie ?participated ?musicalAct
WHERE {
  ?movie rdf:type dbo:Film .
  VALUES ?participated { dbo:basedOn dbo:musicComposer dbo:genre dbo:starring }
  { ?musicalAct ?participated ?movie . }
  UNION
  { ?movie ?participated ?musicalAct . }
}
LIMIT 10
AGG dbr:The_Jackson_5 ?movie MAX
RECOMMEND ?musicalAct TOP 100
PREF dbr:The_Jackson_5
WHERE {
  VALUES ?actType { dbo:MusicalArtist dbo:Band }
  ?musicalAct rdf:type ?actType .
}

The query contains a prefilter condition that triggers retrieval of music artists and bands. Afterwards, the scores of the top 100 suggestions are summarized with maximum-based aggregation. Thus, the highest similarity value among the set of music acts determines the ranking score of the film. Also note that, because of the aggregation-based retrieval procedure, connections between movies and the LOD resource dbr:The_Jackson_5 are excluded from similarity calculation. The scenario exemplifies how semantic relationships can help to generate suggestions for a target domain (i.e., movies) that are based on a profile containing items from a different source domain (i.e., music acts). While regular SPARQL queries perform exact matching of triple statements, the SKOSRec syntax facilitates the integration of similarity- and graph-based retrieval to enrich result lists with other potentially relevant items.

Table 8. Result set of Q7

movie                           musicalAct
dbr:Moonwalker                  dbr:Michael_Jackson
dbr:The_Wiz_(film)              dbr:Ashford_&_Simpson
dbr:Garfield_Gets_a_Life        dbr:The_Temptations
dbr:Pipe_Dreams_(1976_films)    dbr:Gladys_Knight_&_the_Pips
dbr:The_Jacksons                dbr:Jermanine_Jackson

As a point of reference, consider the SPARQL query for the given cross-domain scenario (Listing 10).

Listing 10. SPARQL query to generate cross-domain recommendations (Q8)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?movie ?musicalAct
WHERE {
  ?movie rdf:type dbo:Film .
  VALUES ?participated { dbo:basedOn dbo:musicComposer dbo:genre dbo:starring }
  { dbr:The_Jackson_5 ?participated ?movie . }
  UNION
  { ?movie ?participated dbr:The_Jackson_5 . }
}
LIMIT 5

The query omits the similarity calculation part. Thus, it only retrieves movies linked to the resource dbr:The_Jackson_5. By this means, the SPARQL results have a better chance of being related to the initial user profile. On the other hand, empty result lists can occur when the preferred LOD resource does not have any direct connections to recommendable items. This assumption is backed up by the result lists of the two queries (Tables 8 and 9): the SKOSRec query produces a more comprehensive solution than the regular SPARQL query.


Table 9. Result set of Q8

movie               musicalAct
dbr:The_Jacksons    dbr:The_Jackson_5

The following definitions formalize the post-filtered and aggregation-based retrieval forms. Definition 12 specifies the notion of SKOSRec recommendations that may potentially undergo postprocessing.

Definition 12 (SKOSRec recommendations). SKOSRec recommendations ($R(P_r)$) are the set of suggestions generated by processing the solution mappings ($\Omega$) obtained from recommendation retrieval. They can be a result of simple on-the-fly retrieval, prefiltering, or a combination of these techniques. $P_r$ represents the input profile that contains at least one preference for an item. The cardinality of $R(P_r)$ is the intended number of recommendations ($k$) specified by the user, based on which the engine selects the resources with the highest recommendation scores that are not contained in the profile (Eq. 12).

\[
R(P_r) = \{\, x \mid x \in \{\, q \mid score(P_r, q) > \operatorname{kmax}_{q \in \Omega} score(P_r, q),\; q \notin P_r \,\} \,\} \tag{12}
\]
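Read operationally, Eq. 12 is a top-k selection over scored candidates that skips items already in the profile. The following minimal Python sketch illustrates this reading; the function name, the score dictionary, and the `ex:` identifiers are hypothetical and not taken from the SKOSRec implementation.

```python
from typing import Dict, List, Set


def top_k_recommendations(scores: Dict[str, float], profile: Set[str], k: int) -> List[str]:
    """Return the k highest-scoring resources that are not part of the profile."""
    candidates = [(res, s) for res, s in scores.items() if res not in profile]
    # Sort by descending score; ties are broken by resource name for determinism.
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return [res for res, _ in candidates[:k]]


# Hypothetical similarity scores score(P_r, q) for four candidate resources.
example_scores = {"ex:A": 0.9, "ex:B": 0.4, "ex:C": 0.7, "ex:D": 0.1}
print(top_k_recommendations(example_scores, profile={"ex:A"}, k=2))
```

In this toy run, `ex:A` is skipped because it belongs to the profile, so the two next-best resources are returned.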

The approach of subquerying with recommendation results builds on the idea that SKOSRec suggestions $R(P_r)$ can be joined with any graph pattern $P_{post}$ after the process of similarity calculation (Definition 13). It does not make a difference how the engine obtained these recommendations.

Definition 13 (Postfiltered recommendation mapping). The postfiltered recommendation mapping is obtained by joining solution mappings generated from SKOSRec recommendations with the results of matching a postfilter graph pattern ($\Omega_{post}$) (Eqs. 13 and 14).

\[
\Omega_{postfilter} = \{\, \mu \mid \mu \in [[P_{post}]]_D,\; x \in dom(\mu) \,\} \tag{13}
\]

\[
\Omega_{post} = \{\, \mu_{|x} \mid x \in R(P_r) \,\} \bowtie \Omega_{postfilter} \tag{14}
\]

Like the prefilter, $P_{post}$ can be comprised of triple statements that are eligible in the RecGroupGraphPattern part of the SKOSRec grammar. Aggregation-based queries have to be treated differently than a regular postfilter request. Apart from excluding sublevel items of the profile from the set of similar resources, the engine should also omit any connections between the preferred superordinate item ($a$) and similar sublevel entities. Additionally, the user has to specify the variable in his or her profile (e.g., $y$) that represents the items within the postfilter graph pattern $P_{post}$. By this means, aggregation-based recommendations can be generated (Definition 14).

Definition 14 (Resources for aggregation-based retrieval). The set of resources for aggregation-based retrieval is obtained by retrieving LOD resources that have a connection to a previously generated suggestion, thereby excluding all resources that are linked to the specified superordinate item $a$ in the profile (Eq. 15).

\[
R_{agg}(a) = \{\, \mu(y) \mid \mu \in \Omega_{post},\; y \in dom(\mu),\; a \notin \mu(y) \,\} \tag{15}
\]
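Definitions 13 and 14 can be illustrated with a small Python sketch. It assumes that solution mappings are plain dictionaries from variable names to resources and that the exclusion in Eq. 15 means dropping bindings of `y` that equal the superordinate item `a`; both are simplifying assumptions made for illustration, and all names and `ex:` identifiers are hypothetical.

```python
from typing import Dict, Iterable, List, Set

Mapping = Dict[str, str]  # one solution mapping: variable name -> resource


def postfilter(recommended: Set[str], pattern_solutions: Iterable[Mapping],
               x_var: str = "x") -> List[Mapping]:
    """Keep the postfilter solutions whose x_var is bound to a recommended item."""
    return [mu for mu in pattern_solutions if mu.get(x_var) in recommended]


def aggregation_resources(omega_post: Iterable[Mapping], a: str,
                          y_var: str = "y") -> Set[str]:
    """Collect the bindings of y_var, leaving out the superordinate profile item a."""
    return {mu[y_var] for mu in omega_post if y_var in mu and mu[y_var] != a}


# Hypothetical matches of a pattern such as "?x ex:director ?y".
solutions = [
    {"x": "ex:m1", "y": "ex:dirA"},
    {"x": "ex:m2", "y": "ex:dirB"},
    {"x": "ex:m3", "y": "ex:dirPref"},
    {"x": "ex:m4", "y": "ex:dirA"},
]
recommended_items = {"ex:m1", "ex:m2", "ex:m3"}

omega_post = postfilter(recommended_items, solutions)
print(sorted(aggregation_resources(omega_post, a="ex:dirPref")))
```

Here `ex:m4` is dropped because it was not recommended, and `ex:dirPref` is dropped because it is the preferred item itself.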

For each potential aggregation-based suggestion ($y$), similarity scores of connected entities ($x$) have to be considered for the final ranking. In this context, different approaches to aggregation can be applied. In cases when both similar items and aggregation-based recommendations are entities on comparable levels of granularity, choosing the maximum similarity score might be the best way to represent the relatedness of the resource to the user profile. On the other hand, when postfiltered recommendations are based on similarity scores of their sublevel entities (e.g., as in the movie recommendation example), the scores of related suggestions should be aggregated by the sum or the average of individual scores.

Definition 15 (Aggregation-based ranking score). The ranking score of an LOD resource $y$ that is connected to a set of recommended items is determined by aggregating scores of connected entities, either by the maximum (Eq. 16), the sum (Eq. 17), or the average (Eq. 18) of individual scores.

\[
score_{aggMAX}(y) = \max_{x \in \mu(x,y)} score(P_r, x) \tag{16}
\]

\[
score_{aggSUM}(y) = \sum_{x \in \mu(x,y)} score(P_r, x) \tag{17}
\]

\[
score_{aggAVG}(y) = \frac{\sum_{x \in \mu(x,y)} score(P_r, x)}{|\{\, x \in \mu(x,y) \,\}|} \tag{18}
\]
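The three aggregation modes of Definition 15 can be sketched in a few lines of Python. The sketch assumes that the connections between recommended items $x$ and candidate resources $y$ are available as distinct (x, y) pairs; the function name and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def aggregate_scores(pairs: Iterable[Tuple[str, str]], item_scores: Dict[str, float],
                     mode: str = "SUM") -> List[Tuple[str, float]]:
    """Rank each resource y by the MAX, SUM or AVG of the scores of its connected items x."""
    grouped = defaultdict(list)
    for x, y in pairs:
        grouped[y].append(item_scores[x])
    reducers = {
        "MAX": max,
        "SUM": sum,
        "AVG": lambda values: sum(values) / len(values),
    }
    reducer = reducers[mode]
    ranked = [(y, reducer(values)) for y, values in grouped.items()]
    # Highest aggregated score first; ties are broken by resource name.
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked


# Hypothetical (movie, director) connections and movie similarity scores.
connections = [("ex:m1", "ex:dirA"), ("ex:m4", "ex:dirA"), ("ex:m2", "ex:dirB")]
movie_scores = {"ex:m1": 0.5, "ex:m4": 0.25, "ex:m2": 0.6}

for agg_mode in ("MAX", "SUM", "AVG"):
    print(agg_mode, aggregate_scores(connections, movie_scores, agg_mode))
```

The toy data shows why the choice matters: sum-based aggregation ranks `ex:dirA` first because two of its movies were recommended, whereas maximum- and average-based aggregation favor `ex:dirB`.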

4. Evaluation of the SKOS Recommender

4.1. Experimental Setup

The novel recommendation strategies of the SKOSRec engine were evaluated in web-based experiments for the usage scenarios of travel destination search and multimedia RS. The online approach helped to reach more participants with diverse demographic backgrounds, which would not have been possible in a laboratory experiment. Another positive side effect of the web-based setting was that participants could test and rate recommendations without being directly observed by an experimenter, which might otherwise have biased the results. Hence, the online approach softened common limitations of laboratory studies [19].

The experiments were carried out in the defined usage scenarios with metadata descriptions from DBpedia. The database containing the local mirror of DBpedia ran on a virtual server. Besides the triple store, a web application was hosted on the same machine. The website ran on an Apache Tomcat 7 application server and was implemented with the help of Java servlet technology.⁶,⁷

We conducted the online experiments between April 2016 and January 2017. Upon setting up the web application, a pretest was carried out to control for potential pitfalls, such as misleading navigation. Two test users went step by step through the web interface and provided their feedback. Because of their insights, we slightly modified the interface (e.g., by rephrasing user instructions) to increase the chances of successful completion. In case the SKOSRec engine is applied in a live setting, the design of the user interface will have to be tested with more users. Our experimental series focused on algorithmic performance, which was tested with numerous participants.

Subjects were recruited through the clickworker.com platform.⁸ Participants received a compensation fee of 1.00 € or 1.50 € for a completed survey. A total of 103 participants took part in the study on travel destination search. In the multimedia domains, 154 subjects in total evaluated the SKOSRec engine. For each part of the evaluation, it was ensured that participants had not just clicked through the evaluation screen. Whenever there appeared to be irregularities in the log files (e.g., the default settings of the screen were not changed), these assessments were excluded from further analysis. The survey template started with a consent form. It informed participants about the context and the procedure of the study. After having filled in the consent form, participants were navigated to the first part of the experiment. It collected data on subjects' demographics and their consumption and search habits. After the preparation phase, participants worked on the first test case (TC1) in each usage scenario. In this test case, the performance of the SKOSRec engine was assessed in the simple execution mode of on-the-fly recommendations (see Subsect. 3.2). This was done to gather data on the usefulness of suggestions resulting from a simple content-based approach (i.e., the baseline method), with which more advanced recommendation strategies were compared at a later stage. In cases where users did not state any items, the application showed an error page. While setting up a profile required some effort on the side of the user, it was a necessary step for the experiment. Since we did not have access to the log files of a live application, profile generation had to be simulated. The process was facilitated by a specifically tailored interface (see Fig. 2), where subjects could search for items through an autocomplete field that applied AJAX technology.⁹

⁶ http://tomcat.apache.org
⁷ http://www.oracle.com/technetwork/java/javaee/servlet/index.html
⁸ https://www.clickworker.de

It enabled users to type in keywords (e.g., the name of a music act or the author of a book), for which matches were instantaneously displayed without reloading the site. Hence, participants could quickly find and select their items of interest. Whenever a user entered a query through the AJAX interface, the system matched the query keywords against an Apache Solr/Lucene search index.¹⁰ The index was built with item metadata, which had been extracted from DBpedia prior to setting up the website. The index comprised the names and abstracts of items. Whenever available, information on thumbnail pictures was saved in the index to be ready for display in the AJAX interface. In case the Apache Solr/Lucene search engine found index keywords that matched the user query, metadata information for these items was immediately shown on the screen. This information was displayed to help participants make an informed decision on whether the AJAX result list contained suitable items for profile generation.

⁹ http://api.jquery.com/jquery.ajax
¹⁰ http://lucene.apache.org/solr

Fig. 2. User profile generation, TC1 (music domain)

Aside from the AJAX feature, the appeal of the interface was increased by a progress bar at the top of the webpage (see Fig. 2). Progress bars are commonly utilized tools for web surveys. In conventional paper-based studies, subjects usually see how far along they are, while in a web survey they cannot check their progress immediately. This is especially true for studies that apply a screen-by-screen navigation, which was the chosen approach for the conducted web experiments. By showing how close participants are to completing the experiment, progress bars prevent users from tiring of the questionnaire and withdrawing from the study altogether [14].

Upon profile generation, the SKOSRec engine calculated on-the-fly recommendations for a simple query (similar to Q1, Sect. 3), which were shown to users in a separate screen. Suggestions were displayed with a sufficient amount of information, e.g., thumbnails, labels, or a short description, to help participants assess the quality of the recommendation. In addition to this information, users could also access the respective Wikipedia article by following the hyperlink on the label. Relevance sliders were used to assess the utility of each recommendation. They are often applied in evaluations of IR systems [51, 54]. Scores were handled as points on a 0 to 100 scale of the relevance slider (outer left side: lowest relevance, outer right side: highest relevance; see Fig. 3).

This approach facilitated the calculation of an accurate precision score and the application of tests with higher statistical power than an ordinal 5-star scale would allow. The score, called mean relevance (mrs), was measured with Equation 19.

\[
mrs = \frac{\sum_{i=1}^{k} score_i}{k} \tag{19}
\]
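As a quick illustration of Eq. 19, the following Python sketch averages the slider scores of one recommendation list. The function name and the sample ratings are hypothetical.

```python
from typing import Sequence


def mean_relevance(slider_scores: Sequence[float]) -> float:
    """Average the 0-100 relevance slider scores of one recommendation list (Eq. 19)."""
    if not slider_scores:
        raise ValueError("mrs is undefined for an empty recommendation list")
    return sum(slider_scores) / len(slider_scores)


# Hypothetical ratings for a list of k = 4 recommendations.
sample_scores = [80, 55, 20, 65]
print(mean_relevance(sample_scores))
```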

In addition to the precision score, the web application also determined the size of the recommendation list produced by the engine as an indicator for recall. However, recommendation list size can only be utilized for this purpose when mrs scores reach a reasonable level.

While accurate recommendations are useful because they build trust in a system [57], they only capture a narrow aspect of performance [36]. For instance, an online retailer can profit tremendously from an RS that provides both relevant and unobvious recommendations, since it enables users to explore the product catalog more thoroughly. On the other hand, in the case of an RS only suggesting popular products, it is likely that the user already knows them [3, 23]. A pure accuracy-based evaluation disregards these aspects, which is why other performance indicators, such as novelty, were measured as well. Thus, the evaluation interface also contained a section where subjects had to tick a radio button that indicated whether an item was new. In the backend of the web application, novelty was measured as the ratio of relevant items not familiar to the user ($|nov|$) among the total number of relevant items ($|tp|$) (Eq. 20).

\[
nv = \frac{|nov|}{|tp|} \tag{20}
\]

Fig. 3. Recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain)
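The novelty ratio of Eq. 20 can be sketched as follows, using a relevance threshold of 50 slider points as the cut-off for relevant items. The function name, the handling of lists without relevant items, and the sample ratings are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

RELEVANCE_THRESHOLD = 50  # an item counts as relevant at 50 or more slider points


def novelty(ratings: List[Tuple[float, bool]]) -> Optional[float]:
    """ratings holds (slider score, marked-as-new) pairs for one recommendation list."""
    relevant_flags = [is_new for score, is_new in ratings if score >= RELEVANCE_THRESHOLD]
    if not relevant_flags:
        return None  # Eq. 20 is undefined when no item is relevant
    # True counts as 1, so the sum is the number of relevant items that were new.
    return sum(relevant_flags) / len(relevant_flags)


# Hypothetical ratings: three relevant items, two of which were new to the user.
sample_ratings = [(80, True), (55, False), (20, True), (65, True)]
print(novelty(sample_ratings))
```

Note that the item rated 20 is ignored even though it was new, since only relevant items enter the ratio.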

An item was considered relevant when it achieved an mrs score of 50 or higher. In addition to novelty, the web application also tracked the diversity of recommendation lists, since a topically diversified result set has been found to increase overall user satisfaction [31, 66]. In the RS literature, the most widely explored diversification metrics are the ones that measure the similarity between each item pair in the recommendation list. The more the items are alike, the less diverse is the result set [19]. The study series of this paper applied the metric by Castells et al. [8]. It determines item-to-item similarity values with the cosine similarity measure. Scores are then converted to distance values (Eq. 21) and averaged for the recommendation list (Eq. 22).

\[
d(i_l, i_m) = 1 - \cos(i_l, i_m) \tag{21}
\]

\[
div_C = \frac{2}{k(k-1)} \sum_{l<m} d(i_l, i_m) \tag{22}
\]
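Equations 21 and 22 can be reproduced with a short Python sketch. It assumes that every item is represented by a numeric feature vector of equal length; how the vectors are derived from item annotations is not specified here, and the function names and sample vectors are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equally long vectors (0.0 if either is a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def content_diversity(vectors: List[Sequence[float]]) -> float:
    """Average pairwise cosine distance of a recommendation list (Eqs. 21 and 22)."""
    k = len(vectors)
    if k < 2:
        return 0.0  # no item pairs to compare
    total = sum(1.0 - cosine(u, v) for u, v in combinations(vectors, 2))
    return 2.0 * total / (k * (k - 1))


# Hypothetical feature vectors: the first and third item are identical.
sample_vectors = [[1, 0], [0, 1], [1, 0]]
print(content_diversity(sample_vectors))
```

Two of the three item pairs are orthogonal (distance 1) and one pair is identical (distance 0), so the list reaches two thirds of the maximum diversity.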

The content-based diversity scores ($div_C$) were calculated in the backend of the web application and saved in a log file for each recommendation list that the SKOSRec engine generated at runtime. In addition to the measurement of $div_C$ scores, we asked participants directly about their opinions on recommendation list diversity. Following the RS evaluation framework by Pu et al. [44], two Likert items were posed for this purpose:

– The recommendations of list [...] are diverse.
– The recommendations of list [...] are similar to each other.

Besides the algorithmic performance metrics listed above, the authors gathered assessments from users regarding the overall usefulness of the suggestions. This approach is also in line with the empirically verified evaluation framework by Pu et al. [44], which proposes to measure the psychometric construct of Perceived Usefulness to get a comprehensive understanding of user opinions on recommendation results. Thus, participants were instructed to state their agreement with positive statements regarding the relevance of the suggestions. The statements were adapted to the specificities of each usage scenario. Fig. 4 shows an example collection of Likert questions for the domain of music RS.

After participants had provided the required information, their answers were saved in a log file and they were navigated to test case 2 (TC2), which evaluated the SKOSRec engine's performance for constraint-based queries (similar to Q3, Sect. 3). We tried to find out whether the engine was able to improve user satisfaction through filter conditions. Even though IR systems already successfully apply facet filters on result sets, the experiments still had to show whether LOD-enabled constraints have the same effect on the Perceived Usefulness of recommendations. For the comparisons of non-filtered and filtered suggestions, we applied a within-subjects design, in which participants had to rate two recommendation lists resulting from regular (non-filtered) and filtered requests in TC2. We favored this setup over a between-subjects design, where subjects would have been assigned to only one result list. While the between-subjects design more closely resembles a real-world setting, since it does not require participants to interact with more than one approach at a time, differences among subjects (e.g., regarding demographics or topic expertise) can potentially have a considerable impact on quality judgments [19]. Hence, when applying a between-subjects design, one does not know whether differences in performance are caused by the approaches or by the variability among participants. Therefore, researchers need to conduct between-subjects experiments with a sufficient number of study subjects to level out these differences. Within-subjects studies, on the other hand, require fewer participants while still achieving the same level of statistical power, since between-subject variability is not an issue [30, 31]. Beel et al. have shown that even small variations in an RS test setting could cause substantial differences in performance outcomes [6]. To keep the extent of variations even lower, comparisons between non-filtered and constraint-based recommendations were based on the same profile for each user in TC2. Therefore, the user profile from TC1 served as a starting point for filtered retrieval. The web application showed the recently generated profiles to users and asked them to provide an additional filter condition. Fig. 5 depicts an example web form for a constraint-based recommendation request in the music domain. It comprises the user profile and an additional text field where users stated their constraint.
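The statistical advantage of a within-subjects design can be made concrete with a paired comparison, in which each participant's two ratings are differenced before testing. The Python sketch below computes a paired t statistic from hypothetical mrs values; it is only an illustration of the principle, since this section does not state which statistical tests were applied in the study.

```python
import math
import statistics
from typing import Sequence


def paired_t_statistic(first: Sequence[float], second: Sequence[float]) -> float:
    """t statistic of the per-participant differences between two conditions."""
    if len(first) != len(second):
        raise ValueError("both conditions need one value per participant")
    diffs = [a - b for a, b in zip(first, second)]
    if len(diffs) < 2:
        raise ValueError("at least two participants are required")
    spread = statistics.stdev(diffs)  # sample standard deviation of the differences
    if spread == 0:
        raise ValueError("all differences are identical; t is undefined")
    return statistics.mean(diffs) / (spread / math.sqrt(len(diffs)))


# Hypothetical mrs values of four participants for filtered and non-filtered lists.
filtered_mrs = [70, 65, 80, 55]
unfiltered_mrs = [60, 60, 70, 50]
print(paired_t_statistic(filtered_mrs, unfiltered_mrs))
```

Because only the differences enter the statistic, stable differences between participants (e.g., generally strict or generous raters) cancel out, which is the reason fewer subjects are needed than in a between-subjects comparison.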

Fig. 4. Evaluation screen for recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain)

Afterwards, the engine applied this filter condition on the set of potential recommendation results before similarity calculation started. For each usage scenario, different types of constraints were available. Regarding suitable filter options, the authors sought to achieve a balance between assisting users in finding filter conditions that would adequately represent their information needs and keeping the variability among users as low as possible. Therefore, the specificities of each scenario and the availability of the respective data sources were determined. For instance, for the multimedia domains (movies, music, and books), it was assumed that users would like to filter recommendations according to the specific genre of the item. Fortunately, the DBpedia dataset contains this kind of information for the music domain, where the genre can be retrieved through the property dbo:genre. However, in the movie and book domains, DBpedia does not provide sufficient information. Many LOD resources that represent books were not assigned a genre. In the movie domain, a genre property is missing entirely, and genre-specific information can often only be found in the subject categories describing the movie. Hence, in contrast to the experiment on music recommendations, participants in the domains of books and films did not filter their recommendation lists by genre. On the other hand, the property dc:subject was applied as a recommendation filter in each multimedia domain. Since the subject property often links to many informative features of multimedia items (e.g., release period, geographic information, or content information), the authors decided that this feature represents a powerful filter dimension.

In the experiment on travel destination search, the web interface offered two filters: a subject filter and a location-specific filter. The subject filter enabled users to state their preferences regarding the characteristics of a location (e.g., being a nature reserve). The location-specific filter, on the other hand, gave participants the option to specify where the desired destination should be located.¹¹

The second fundamental research issue of TC2 concerned the question of whether expressive constraints can boost recommendation quality even further. Hence,

¹¹ http://dbpedia.org

apart from executing a constraint-based workflow with a simple filter (i.e., a direct attribute of an item), the SKOSRec engine generated suggestions with the help of an expressive filter as well. This second procedure, while still applying the same user constraint as the simple filter, utilized an expressive graph pattern to include additional LOD resources in the result set. For instance, in case a subject filter was selected, result set expansion was facilitated by exploring the wider semantic space of the DBpedia category graph through a property path declaration in the graph pattern (e.g., skos:broader[2]). In the travel experiment, an additional expressive filter retrieved place-related information (dul:hasLocation) for location-specific subproperty relations that are not materialized in DBpedia. Hence, the expressive graph pattern expanded the filter such that additional properties were matched (e.g., dbo:country, dbo:city or dbo:state) while still applying the specified user constraint on the set of LOD resources. Upon execution of constraint-based recommendation requests, users received two recommendation lists (resulting from simple and expressive filtering, respectively) in a


Fig. 5. Web form for a constraint-based recommendation request (music domain)

separate evaluation screen. Participants evaluated result sets in the same manner as in TC1. For each result set, the screen contained an additional Likert item asking subjects to state their agreement with the statement “The filter has improved the recommendation results of list”. The sequence of suggestions resulting from the different methods was randomly assigned in each user session to avoid order effects.

After study subjects had completed TC2, they were navigated to test case 3 (TC3). In the TC3 section of the travel experiment, participants could choose between three options for rollup query patterns (similar to Q6, Sect. 3). Users were able to obtain recommendations for travel destinations based on POIs they had visited during a stay in another destination. They could select either a city (option 1), a region (option 2) or a country (option 3) as the destination type. Upon entity type selection, the SKOSRec engine generated appropriate suggestions. Fig. 6 shows the web form from the travel experiment that facilitated the formulation of advanced rollup requests in TC3.
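The roll-up idea used in TC3 (similarity scores of sublevel entities, such as POIs, are summed per parent entity and then postfiltered) can be illustrated with a minimal Python sketch. It is not the SKOSRec implementation: the function name, the dictionary-based inputs and the tie-breaking rule are assumptions made purely for illustration.

```python
from collections import defaultdict


def rollup_recommendations(sub_item_scores, parent_of, allowed_parents=None, top_n=10):
    """Aggregate sublevel similarity scores per parent entity by summation.

    sub_item_scores: dict mapping a sublevel item (e.g. a POI) to its similarity score
    parent_of:       dict mapping a sublevel item to its parent entity (e.g. a city)
    allowed_parents: optional set acting as a postfilter on the aggregated results
    (All names and data structures here are hypothetical.)
    """
    totals = defaultdict(float)
    for item, score in sub_item_scores.items():
        parent = parent_of.get(item)
        if parent is None:
            continue  # items without a known parent cannot be rolled up
        totals[parent] += score

    # Postfilter: keep only parents that satisfy the (assumed) user constraint.
    if allowed_parents is not None:
        totals = {p: s for p, s in totals.items() if p in allowed_parents}

    # Rank by aggregated score (descending); ties are broken alphabetically.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]
```

With hypothetical scores for three POIs, two of which belong to the same city, the city with two matching POIs is ranked first because its scores add up.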

When users had selected an entity type and stated their favorite travel destination and three associated POIs,

the application generated a SKOSRec query from these parameters. A simple on-the-fly request was sent to the engine as well to retrieve baseline recommendations, with which the suggestions resulting from the advanced request were compared at a later stage of the experiment. The simple query only contained the user's favorite travel destination and the selected entity type option, thus omitting information on the POIs. During the travel experiment, the engine processed both the simple and the advanced query and produced two recommendation lists. As in the previous parts of the experiment, participants assessed the quality of the result sets, which appeared in random order on the screen.

The multimedia experiments also contained a section on rollup retrieval (TC3). Like the advanced travel queries, the multimedia requests aggregated similarity scores of sublevel entities through summation, which the engine joined with a postfilter. In addition to these features, the pattern contained a preference query part. This section identified a set of preferred items through a single user statement. Participants either specified their favorite director/actor (movie domain), music act


Fig. 6. Travel - User profile generation, TC3

(music domain) or author (book domain) and received recommendations based on the features of the works created by the artist (similar to Q5, Sect. 3). The SKOSRec engine generated recommendations based on the creative works of the favored artist. Participants also received suggestions from a simple on-the-fly query, which computed results according to the artist's characteristics. As in the travel experiment, recommendations from the advanced rollup request were compared with the on-the-fly query without letting participants know which approach they were assessing and through random assignment of result set positions.

After participants of the multimedia experiments had evaluated the results of TC3, they were guided to test case 4 (TC4), which evaluated cross-domain queries. The experiment on travel RS ended with TC3 because no suitable graph patterns could be identified to facilitate these kinds of suggestions for the travel usage scenario. In the multimedia domain, however, the data from DBpedia was sufficient for this retrieval task. The entity type of the study (i.e., movie, music act or book) defined the source domain, based on which the engine generated suggestions for items from another multimedia target domain (similar to Q7, Sect. 3). Fig. 7 depicts the cross-domain web form of the music experiment.

The web form assisted subjects in formulating a request, which then obtained movie suggestions based on the favorite music act of the user. Parameters from users were entered in the placeholders of the query pattern in the backend of the application. We formulated the query templates in such a way that the query engine matched any properties or graph patterns that could potentially connect two items from the specified

target and source domains. For instance, such matches can occur when a user states his/her favorite movie that is written by a particular author. In case the same author has also written some books, they might be of interest to the user as well. Another example stems from the movie domain. Suppose a consumer has declared in his/her profile a preference for a movie that links to the corresponding soundtrack. It might well be the case that the user likes the soundtrack and the music acts that were involved in creating it just as much as s/he likes the movie. The practice of distinguishing homonymous LOD resources through the property dbo:wikiPageDisambiguates is another interesting linking pattern from DBpedia that was exploited for cross-domain requests in TC4, since it connects similar entities (i.e., the book edition of a certain movie).

Cross-domain SPARQL queries (similar to Q8, Sect. 3) were posed as baseline requests in this test case. While the SPARQL query only matched cross-domain relations for a single LOD resource, the SKOSRec request expanded the search space by considering links from more than one item. As in the previous test cases, TC4 ended with an evaluation screen where users assessed the SPARQL-based as well as the SKOSRec suggestions in random order without knowing which recommendation list belonged to which method. Table 10 gives an overview of the test cases and the methods that were applied in the experiments in each usage scenario.

4.2. Results

In total, 257 people took part in the study. The gathered samples from the experiments contained answers


Fig. 7. Music - User profile generation, TC4 (page 9)

Table 10. Test cases in the web experiments

| Test Case | Evaluation Purpose | Query Type | Travel | Movie | Music | Book |
|---|---|---|---|---|---|---|
| TC1 | On-the-fly Recommendations (Baseline) | Q1 | X | X | X | X |
| TC2 | Constraint-based Recommendations | Q3 | X | X | X | X |
| TC3 | Advanced Queries (Rollup) | Q5, Q6 | X | X | X | X |
| TC4 | Advanced Queries (Cross-Domain) | Q7 | x | X | X | X |

from both genders with a slight overrepresentation of male participants (53.7%). The majority of study subjects were between 20 and 40 years of age (65.8%). The evaluation of the demographics section also revealed that the majority of participants already perceived the search in their domain of interest as either “manageable” (38.5%) or even as “easy” or “very easy” (44.0%).

Prior to analyzing the performance results of the novel retrieval approaches of the SKOSRec engine, it was checked whether the respective Likert items for Perceived Usefulness and user-based diversity (divU) reached acceptable reliability levels. For this purpose, Cronbach's α values were determined. Table 11 shows the outcome of this analysis for the Perceived Usefulness construct. Given the acceptable values throughout the domains (Cronbach's α > 0.7), it could be treated as an interval-scaled variable in the statistical tests.

Table 11. Cronbach's α values for the concept of Perceived Usefulness

| Domain | Cronbach's α |
|---|---|
| Travel | 0.7002 |
| Movie | 0.8579 |
| Music | 0.8515 |
| Book | 0.8949 |

Besides usefulness, user-based diversity scores were also measured by multiple Likert items. However,

Cronbach's α values for this concept did not reach acceptable levels. In order to avoid leaving valuable user assessments unused, agreements with the statement “The recommendations of list [] are diverse” were taken into account as an ordinal variable (divU).

Based on these results, subsequent tests could be carried out. The first part of the analysis concerned the performance of the on-the-fly retrieval method (baseline) of the SKOSRec engine in TC1. The outcome of this analysis confirms that the on-the-fly approach represents a viable recommendation strategy. Table 12 shows the mean (M)¹², standard deviation (SD) and number of participants (N) for each performance dimension and metric. It can be seen that users assessed the engine's performance in almost every quality dimension as better than mediocre. Aside from the positive user evaluations with regard to Perceived Usefulness, related quality aspects such as Accuracy, Novelty and Diversity received good assessments as well. For instance, in each domain the mean relevance score (mrs) was higher than 50 score points. This means that participants were often convinced of the relevance of the results. Given these positive user assessments with regard to item relevance, it seems justified to consider the mean of result list sizes as an indicator of recall. The engine was able to generate the required number of recommendations in almost every test case (see Table

¹² In the case of the user-based diversity score (divU), M denotes the median because of the ordinal scale of the variable.


12). The on-the-fly retrieval approach was also tested with regard to the Novelty dimension. Table 12 shows the mean novelty scores (nv) for the baseline recommendation approach. Novelty scores indicate that the majority of items in a result list were deemed to be new in the travel and book RS, whereas in the experiments on movie and music RS only about one third of the recommendation list was unfamiliar to users. In contrast, content-based Diversity scores (divC) were rather high in each domain. Hence, it can be concluded that items in a result set usually have a low rate of matching SKOS annotations. In addition to content-based scores, user assessments of recommendation list Diversity (divU) were also taken into account. The median level of agreement with the Diversity-related Likert item stating that recommendations are diverse was 3 (“neutral”) in the multimedia domains and 4 (“agree”) in the travel domain.

In summary, the baseline approach achieved fairly good performance results in TC1. The findings demonstrate that LOD-enabled recommendations can potentially enhance a user's search experience. They also allow for subsequent tests of advanced retrieval techniques, which can be compared to this standard method.

In the analysis of performance results from TC2 (constraint-based recommendations), particular attention was paid to participants' agreement with the statement “The filter has improved the recommendation results of the list”. Fig. 8 shows the distribution of domain-wise agreement ratios. On average, they were fairly high. Throughout the domains, the vast majority of participants selected either 5 (“strongly agree”), 4 (“agree”) or 3 (“neutral”). Hence, a filter often had a positive impact on results.

The authors gathered responses to the question on improvement for recommendation lists resulting from both the simple and the expressive filtering approaches. Hence, they give clues about the general performance of constraint-based retrieval. However, since this section of the study revealed the experimental condition to participants (i.e., application of a filter), answers to the Likert statement have to be interpreted cautiously. Results may be slightly biased, as participants were able to guess the underlying agenda. However, these limitations do not exist for comparisons between simple and expressive filters. Since the web application randomly assigned the list order, participants did not know which approach they were assessing. This enables us to take a closer look at the differences between the two filtering approaches. It was

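Table 12. Performance results in TC1 (M = Mean, SD = Standard Deviation, N = No. of participants)

| Dimension (Metric) [Max. Score] | Domain | M | SD | N |
|---|---|---|---|---|
| Accuracy (size) [10] | Travel | 10.00 | 0.00 | 103 |
| | Movie | 10.00 | 0.00 | 50 |
| | Music | 10.00 | 0.00 | 53 |
| | Book | 9.80 | 1.40 | 51 |
| Accuracy (mrs) [1] | Travel | 56.49 | 0.00 | 103 |
| | Movie | 62.25 | 19.15 | 50 |
| | Music | 55.70 | 0.10 | 53 |
| | Book | 57.47 | 15.55 | 50 |
| Novelty (nv) [1] | Travel | 0.60 | 0.34 | 101 |
| | Movie | 0.36 | 0.31 | 50 |
| | Music | 0.36 | 0.39 | 52 |
| | Book | 0.61 | 0.29 | 47 |
| Diversity (divU) [5] | Travel | 4 | - | 102 |
| | Movie | 3 | - | 50 |
| | Music | 3 | - | 53 |
| | Book | 3 | - | 50 |
| Diversity (divC) [1] | Travel | 0.80 | 0.16 | 103 |
| | Movie | 0.87 | 0.12 | 50 |
| | Music | 0.86 | 0.10 | 50 |
| | Book | 0.92 | 0.08 | 50 |
| Usefulness [5] | Travel | 3.34 | 0.85 | 103 |
| | Movie | 3.55 | 0.73 | 50 |
| | Music | 3.18 | 0.85 | 53 |
| | Book | 3.43 | 0.81 | 50 |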

hypothesized that advanced filters improve recommendation quality, as they may identify more relevant resources, thereby overcoming problems of data quality or data sparsity in LOD repositories. Table 13 lists the response rates of the two methods. The rate measures the ratio of non-zero result sets among all result sets. Throughout the domains, recommendation lists resulting from expressive filtering achieved higher response rates.

Table 13. Response rates (TC2)

| Domain | Constr.-based (Regular) | Constr.-based (Expanded) | N |
|---|---|---|---|
| Travel | 23% | 57% | 103 |
| Movie | 86% | 88% | 50 |
| Music | 72% | 75% | 53 |
| Book | 73% | 76% | 51 |
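The response rate is defined in the text as the ratio of non-zero result sets among all result sets. A minimal sketch of this calculation follows; the function name and the representation of result sets as Python lists are assumptions for illustration.

```python
def response_rate(result_sets):
    """Share of result sets that contain at least one recommendation.

    result_sets: one list of recommended items per user session
    (a hypothetical representation; empty lists count as non-responses).
    """
    if not result_sets:
        raise ValueError("at least one result set is required")
    non_empty = sum(1 for results in result_sets if len(results) > 0)
    return non_empty / len(result_sets)
```

For example, four sessions of which two returned an empty list yield a response rate of 0.5, i.e. 50%.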


[Figure 8: plot titled “The filter has improved the recommendations of the list”. x-axis: Domain (Digital Library, Travel, Movie, Music, Book); y-axis: Percentage of Agreement (0.00 to 1.00); legend: Degree of Agreement (1 to 5).]

Fig. 8. Participants' agreement with the statement that the filter has improved the result list (domain-wise) (TC2)

In addition to response rates, user assessments and scores in the defined quality dimensions were compared. Table 14 depicts the outcome of this analysis. Significantly higher scores are marked in bold figures. In case statistical tests identified no meaningful variations, the results of the better performing approach are underlined. α-levels were adjusted with the false discovery rate (FDR) in order to decrease the probability of making a type I error due to multiple testing.
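The paper does not state which FDR procedure was applied. As an illustration only, the sketch below assumes the widely used Benjamini-Hochberg step-up procedure, which rejects the hypotheses belonging to the k smallest p-values, where k is the largest rank whose p-value does not exceed (rank / m) · q. The function name and the default level q = 0.05 are assumptions.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return one boolean per p-value: True if the test is significant
    under the Benjamini-Hochberg procedure at FDR level q.
    (Assumed procedure; the source only says 'false discovery rate'.)
    """
    m = len(p_values)
    # Indices of the p-values in ascending order of p.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Step-up: find the largest rank whose p-value passes its threshold.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            max_rank = rank

    # Reject every hypothesis up to and including that rank.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            significant[idx] = True
    return significant
```

Note the step-up behavior: a p-value that misses its own threshold can still be declared significant if a larger p-value further down the ranking passes its threshold.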

According to mean scores, expressive requests generated more comprehensive recommendation lists (size). This finding is in line with the increased response rates shown in Table 13. However, subsequently conducted t-tests confirmed significant differences only for the travel experiment (t(102) = −7.89, p < 0.001). In contrast, precision scores were higher in most of the domains (travel, movie, music) when the SKOSRec engine processed a simple constraint-based query. However, statistical tests did not confirm any systematic differences. The authors applied Wilcoxon tests for mrs scores and the remaining performance metrics because of the small sample sizes resulting from the low response rates of the simple constraint-based approach.

Regarding Novelty, none of the approaches was superior. In the Diversity dimension, expressive filters generated suggestions that were at least as topically diversified as the results from simple constraint-based retrieval. Even though no significant differences were

identified between the two approaches, increased mean scores (divU in the travel domain and divC in each domain) indicate a slight superiority of expanded filter requests in this quality dimension. The same applies to usefulness scores (mean values were higher in 3 out of 4 domains when expressive filtering was applied). However, these statements are speculative at best because statistical tests were not significant.

In summary, it can be concluded that expressive filters improve recall, potentially leading to a slight loss in precision scores (see Table 14). It may also be the case that expanded user constraints diversify as well as increase the usefulness of recommendation lists. However, these claims are unverified. In contrast, it is safe to say that expressive query patterns generate results of at least the same quality as simple patterns while increasing the number of relevant suggestions. The few differences may be explained by the high Jaccard scores (JI) (see Table 15) of the recommendation lists. Throughout the domains, result sets were much alike. Given these similarities and the increases in list sizes for expressive filters, it is supposed that an expanded constraint produces an enhanced version of the recommendation list resulting from simple filtering. Therefore, expanded filters should be the default setting in a constraint-based retrieval context.
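The Jaccard index used to compare two recommendation lists is the size of their intersection divided by the size of their union. A minimal sketch is given below; the function name is hypothetical, and returning 0.0 when both lists are empty is an assumed convention, since the paper does not say how such cases were handled.

```python
def jaccard_index(list_a, list_b):
    """Overlap of two recommendation lists as |A ∩ B| / |A ∪ B|."""
    a, b = set(list_a), set(list_b)
    if not a and not b:
        # Assumed convention: two empty lists are treated as having no overlap.
        return 0.0
    return len(a & b) / len(a | b)
```

Two lists that share two of four distinct items, for instance, obtain a score of 0.5; identical lists obtain 1.0 and disjoint lists 0.0.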

In TC3 (i.e., evaluation of roll-up query patterns), the SKOSRec engine almost always generated non-empty recommendation lists (see Table 16). In the


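Table 14. Performance results in TC2 (M = Mean, SD = Standard Deviation, N = No. of participants)

| Dimension (Metric) [Max. Score] | Domain | Simple M | Simple SD | Expressive M | Expressive SD | N |
|---|---|---|---|---|---|---|
| Accuracy (size) [10] | Travel | 1.17 | 2.70 | 4.52 | 4.62 | 103 |
| | Movie | 6.72 | 4.08 | 7.20 | 3.92 | 50 |
| | Music | 5.96 | 4.64 | 6.36 | 4.51 | 53 |
| | Book | 4.59 | 4.50 | 5.00 | 4.51 | 51 |
| Accuracy (mrs) [1] | Travel | 66.73 | 20.51 | 65.34 | 22.36 | 23 |
| | Movie | 60.95 | 22.98 | 60.29 | 22.79 | 43 |
| | Music | 63.77 | 16.05 | 62.96 | 16.44 | 34 |
| | Book | 56.17 | 18.18 | 56.81 | 16.81 | 37 |
| Novelty (nv) [1] | Travel | 0.48 | 0.41 | 0.61 | 0.49 | 21 |
| | Movie | 0.44 | 0.36 | 0.55 | 0.80 | 42 |
| | Music | 0.55 | 0.39 | 0.52 | 0.37 | 38 |
| | Book | 0.77 | 0.34 | 0.76 | 0.33 | 33 |
| Diversity (divU) [5] | Travel | 3 | - | 3.5 | - | 24 |
| | Movie | 4 | - | 4 | - | 43 |
| | Music | 4 | - | 4 | - | 38 |
| | Book | 3 | - | 3 | - | 37 |
| Diversity (divC) [1] | Travel | 0.68 | 0.18 | 0.72 | 0.21 | 19 |
| | Movie | 0.70 | 0.24 | 0.73 | 0.23 | 41 |
| | Music | 0.86 | 0.13 | 0.87 | 0.12 | 34 |
| | Book | 0.84 | 0.24 | 0.95 | 0.63 | 29 |
| Usefulness [5] | Travel | 3.33 | 0.93 | 3.50 | 0.77 | 24 |
| | Movie | 3.32 | 0.85 | 3.31 | 0.86 | 43 |
| | Music | 3.55 | 0.81 | 3.64 | 0.78 | 38 |
| | Book | 3.39 | 0.83 | 3.62 | 1.21 | 37 |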

Table 15. Jaccard indices (TC2)

| Domain | Mean Jaccard Index (JI) | SD | N |
|---|---|---|---|
| Travel | 0.57 | 0.47 | 25 |
| Movie | 0.71 | 0.38 | 43 |
| Music | 0.82 | 0.35 | 37 |
| Book | 0.92 | 0.26 | 35 |

travel and the book experiment, the response rate was higher for aggregation-based queries, whereas in the movie domain the engine more often provided recommendations for regular requests. However, these differences are only marginal and do not indicate a clear superiority of one method.

Fig. 9 depicts participants' general agreement with Likert statements concerning the Perceived Usefulness of the approaches in TC3. The diagram shows similar results for regular as well as advanced requests. On average, satisfaction scores reached levels that lay slightly above neutral agreement. A subsequent t-test

Table 16. Response rates (TC3)

| Domain | Regular | Aggr.-based | N |
|---|---|---|---|
| Travel | 99% | 100% | 103 |
| Movie | 96% | 92% | 50 |
| Music | 100% | 100% | 53 |
| Book | 96% | 100% | 51 |

confirmed no significant differences between the two methods. In fact, the approach with the highest average score of Perceived Usefulness was different in each experiment. While mean values were higher for regular requests in the studies on travel and music RS, advanced queries produced higher scores in the movie and book experiments.

Content-based Diversity (divC) was the only metric with systematic differences. Here, scores went up for advanced retrieval patterns in the travel (t(98) = −5.50, p < 0.001), the music (t(52) = −4.51, p < 0.001) and the book domain (t(47) = −4.90, p <


[Figure 9: plot titled “Perceived Usefulness (Regular vs. Roll-up Queries)”. x-axis: Domain (Travel, Movie, Music, Book); y-axis: Agreement (ticks 0 to 3); legend: Method (Regular, Roll-up).]

Fig. 9. Participants' overall satisfaction with regular and roll-up recommendation retrieval (TC3)

0.001) (see Table 17). This finding is not surprising, given that the aggregation-based approach explores the wider semantic network of each item in the profile.

Hence, it can be assumed that both approaches produce recommendation lists of similar quality and can provide benefits to potential consumers. Thus, it cannot be clearly stated which approach is superior because neither of them outperformed the other in each domain. While the regular retrieval strategy was better in terms of usefulness in the travel and the music domains, the aggregation-based approach achieved higher scores in the domains of movie and book recommendations.

This conclusion is especially impressive given the low mean Jaccard scores for recommendation lists resulting from regular and aggregation-based retrieval (see Table 18). The mean score never exceeded 0.1

throughout the domains. This indicates a low concordance between result sets. It is remarkable that participants perceived the recommendations as equally good given the high dissimilarity of the lists. Thus, advanced queries have an added value, as they provide additional interesting results. They can help to explore LOD repositories in different ways than a regular retrieval approach. Therefore, it is worthwhile to offer aggregation-based query patterns as an alternative retrieval strategy when the regular approach has not produced any helpful recommendations.

In addition to TC3, user assessments for TC4 (cross-domain recommendations) were also evaluated. However, since the authors conducted experiments for TC4 only in the multimedia domains, the samples were considerably smaller. Nevertheless, it is assumed that the number of completed user sessions is just sufficient to conduct further statistical analyses. Table 19 shows the response rates for the two retrieval methods. Cross-domain patterns had a higher success rate of generating non-zero result sets throughout the domains. Participants almost always received a recommendation for this query type, whereas in case a regular SPARQL query was executed, they often did not receive a single suggestion at all.

We applied non-parametric tests for the performance metrics mrs, nv, divU, divC and Perceived Usefulness due to the low response rates of the SPARQL-based approach, which led to a correspondingly low number of comparable data points. Overall, the FDR-adjusted pairwise Wilcoxon tests detected no significant differences, except for user-based Diversity scores (divU) in the movie (Z = −2.69, p < 0.05) and the book domain (Z = −2.56, p < 0.05). The results for the content-based Diversity (divC) metric seem to point in the same direction because of the high mean values for cross-domain queries in each domain. However, the significance of these results was not verified. The mrs scores indicate a better performance of regular


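Table 17. Performance results in TC3 (M = Mean, SD = Standard Deviation, N = No. of participants)

| Dimension (Metric) [Max. Score] | Domain | Regular M | Regular SD | Aggr.-based M | Aggr.-based SD | N |
|---|---|---|---|---|---|---|
| Accuracy (size) [10] | Travel | 9.90 | 0.99 | 9.42 | 2.07 | 103 |
| | Movie | 2.88 | 0.59 | 2.76 | 0.82 | 50 |
| | Music | 10.00 | 0.00 | 10.00 | 0.00 | 53 |
| | Book | 2.88 | 0.59 | 3.00 | 0.00 | 51 |
| Accuracy (mrs) [1] | Travel | 48.21 | 24.98 | 44.93 | 25.59 | 102 |
| | Movie | 54.58 | 22.88 | 56.69 | 25.42 | 44 |
| | Music | 53.95 | 18.31 | 58.24 | 16.19 | 52 |
| | Book | 51.53 | 23.90 | 49.25 | 22.30 | 49 |
| Novelty (nv) [1] | Travel | 0.43 | 0.39 | 0.38 | 0.40 | 98 |
| | Movie | 0.35 | 0.38 | 0.28 | 0.41 | 41 |
| | Music | 0.49 | 0.36 | 0.46 | 0.39 | 50 |
| | Book | 0.61 | 0.42 | 0.62 | 0.44 | 47 |
| Diversity (divU) [5] | Travel | 3.5 | - | 3 | - | 100 |
| | Movie | 4 | - | 3 | - | 43 |
| | Music | 4 | - | 3 | - | 50 |
| | Book | 3 | - | 3 | - | 48 |
| Diversity (divC) [1] | Travel | 0.79 | 0.17 | 0.89 | 0.13 | 99 |
| | Movie | 0.75 | 0.20 | 0.66 | 0.27 | 44 |
| | Music | 0.83 | 0.17 | 0.93 | 0.07 | 53 |
| | Book | 0.74 | 0.27 | 0.93 | 0.06 | 48 |
| Usefulness [5] | Travel | 3.36 | 0.77 | 3.22 | 0.74 | 101 |
| | Movie | 3.20 | 0.92 | 3.26 | 0.94 | 44 |
| | Music | 3.45 | 0.84 | 3.44 | 0.85 | 51 |
| | Book | 2.95 | 0.87 | 3.03 | 0.96 | 48 |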

Table 18. Jaccard indices (TC3)

| Domain | Mean Jaccard Index (JI) | SD | N |
|---|---|---|---|
| Travel | 0.07 | 0.11 | 101 |
| Movie | 0.03 | 0.07 | 44 |
| Music | 0.02 | 0.05 | 53 |
| Book | 0.02 | 0.06 | 50 |

Table 19. Response rates (TC4)

| Domain | Regular (SPARQL) | SKOSRec | N |
|---|---|---|---|
| Movie | 32% | 96% | 50 |
| Music | 62% | 98% | 53 |
| Book | 53% | 100% | 51 |

SPARQL queries, but the within-subjects test did not confirm this assumption statistically (see Table 20).

However, the most important finding of the analysis was that SKOSRec cross-domain queries generated significantly more suggestions throughout the domains (movie: t(49) = −17.92, p < 0.001; music: t(52) = −25.73, p < 0.001; book: t(50) = −12.75, p < 0.001) (see Table 20). Fig. 10 illustrates this graphically.

Whenever the SPARQL request provided recommendations, which was the case in less than half of the user sessions (see Table 19), the SKOSRec suggestions were assessed to be of almost equal quality as the SPARQL recommendations (see Table 20, Usefulness). Hence, in the remaining test cases, in which SPARQL querying did not produce any results, the SKOSRec approach may still have provided useful suggestions.

Therefore, it is reasonable to assume that the capability of the SKOSRec engine to flexibly switch between processing steps of similarity calculation and pattern matching can facilitate an improved exploration of LOD repositories. Table 21 also shows that the items in the two sets often did not match, as is indicated by low Jaccard indices. This finding further demonstrates that the execution of an additional step


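Table 20. Performance results in TC4 (M = Mean, SD = Standard Deviation, N = No. of participants)

| Dimension (Metric) [Max. Score] | Domain | Regular (SPARQL) M | Regular (SPARQL) SD | Cross-Domain M | Cross-Domain SD | N |
|---|---|---|---|---|---|---|
| Accuracy (size) [10] | Movie | 0.82 | 2.09 | 8.66 | 2.73 | 50 |
| | Music | 0.89 | 1.59 | 9.30 | 2.03 | 53 |
| | Book | 2.90 | 3.98 | 10.00 | 0.00 | 51 |
| Accuracy (mrs) [1] | Movie | 67.41 | 27.62 | 53.65 | 14.71 | 16 |
| | Music | 62.36 | 31.76 | 48.32 | 22.49 | 20 |
| | Book | 66.42 | 21.28 | 55.59 | 16.84 | 27 |
| Novelty (nv) [1] | Movie | 0.59 | 0.49 | 0.41 | 0.34 | 16 |
| | Music | 0.72 | 0.44 | 0.59 | 0.40 | 19 |
| | Book | 0.54 | 0.41 | 0.63 | 0.30 | 27 |
| Diversity (divU) [5] | Movie | 3 | - | 4 | - | 16 |
| | Music | 3 | - | 3 | - | 20 |
| | Book | 3 | - | 4 | - | 27 |
| Diversity (divC) [1] | Movie | (0.64) | (0.35) | (0.99) | (0.02) | (6) |
| | Music | (0.90) | (0.13) | (0.98) | (0.03) | (10) |
| | Book | (0.84) | (0.12) | (0.87) | (0.1) | (8) |
| Usefulness [5] | Movie | 3.57 | 0.83 | 3.35 | 1.57 | 15 |
| | Music | 3.53 | 1.17 | 3.11 | 0.98 | 20 |
| | Book | 3.41 | 1.04 | 3.41 | 0.94 | 27 |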

[Figure 10: plot titled “Recommendation List Size”. x-axis: Domain (Movie, Music, Book); y-axis: No. of items (0.0 to 10.0); legend: Method (Regular, Cross-Domain).]

Fig. 10. Mean recommendation list sizes for SPARQL and SKOSRec cross-domain requests (TC4)


Table 21. Jaccard indices (TC4)

| Domain | Mean Jaccard Index (JI) | SD | N |
|---|---|---|---|
| Movie | 0.03 | 0.07 | 44 |
| Music | 0.01 | 0.02 | 21 |
| Book | 0.16 | 0.27 | 33 |

of similarity calculation can help to retrieve other relevant items.

5. Future Work & Conclusion

This paper has presented the query-based SKOSRec engine, which facilitates new types of recommendation requests for LOD repositories. The evaluations have shown that the SKOSRec system can often generate relevant and useful suggestions when certain query templates are utilized. In many cases, the application of expressive or advanced query patterns (i.e., in the context of constraint-based retrieval or cross-domain requests) helped to significantly improve recall values while still providing the same level of quality in the remaining performance dimensions. Additionally, advanced recommendation requests can be used to generate diversified recommendation lists in case users are not satisfied with the results of regular recommendation requests (i.e., roll-up or cross-domain retrieval vs. regular requests). While the novel recommendation approaches of the SKOSRec engine are not superior in each domain and test case, it is at least safe to say that they considerably extend common retrieval methods and provide an alternative search strategy when conventional methods fail to provide useful results. Additionally, the increase in diversity and recall is in line with findings from previous research on LOD-enabled RS [45].

It is a strength of the developed approaches that they facilitate combinations of graph-based and similarity-based retrieval at different stages of the recommendation workflow. This feature extends general search capabilities for semantic networks. Future research is required to investigate whether, in addition to SKOS, further RDF vocabularies can be used for ad-hoc item-to-item similarity computation to increase the range of potential usage scenarios, since not every LOD dataset contains SKOS annotations [53]. The application of additional properties could also boost the recommendation quality of some query patterns (e.g., the query types used in TC3). In this context, it also needs to be determined whether the representation of the user

profile in the SKOSRec query language can be substituted with free-text expressions to enable search-like functionalities. LOD researchers have already developed engines that can process natural language queries over RDF data [59], [34], [67]. It will have to be investigated how the SKOSRec engine profitably fits into this landscape of existing retrieval tools.

Another open research question concerns the aspect of interfaces to other applications. In its standard configuration, the system is not an end-user retrieval engine, but a backend application with which an administrator who has domain knowledge (e.g., of RDF vocabularies and data models of a particular LOD repository and usage scenario) creates the respective query patterns. Although the websites of the online experiments represent possible implementations of suitable end-user applications for the SKOSRec engine, there are still many other possibilities and extensions to design such a user interface (UI). For instance, the selection of LOD items for the profile was made possible by previously storing the data in an index that could then be accessed and searched before issuing a recommendation query. However, this runs against the idea of ad-hoc retrieval. In this context, future research will have to determine whether it is feasible to generate an on-the-fly mapping from items to the respective LOD resources. In the case of a commercial application, the engine needs to match items with the corresponding entries in the product catalog. Even if it is not yet clear which UI components can make the best possible use of the SKOSRec engine's features, the results from the user experiments can give at least a few suggestions.

The following retrieval options should be the default setting in an interface that contains a LOD-enabled RS component to assist users in finding interesting items. Since the regular approach of on-the-fly retrieval achieved good results throughout the domains, the display of simple recommendations generated from previously stated user preferences represents a suitable starting point for retrieval. On-the-fly queries could be refined by providing filter options. Since the evaluations have shown that expressive constraints often lead to increased recall values, the application of such a filter might be the best option for assisted retrieval. As in the preparations for the experiments, appropriate graph-based query patterns for the usage scenario in question would also have to be determined by a domain expert before setting up the application. The same applies to advanced retrieval patterns, which should be available to the end user to refine the query when both the simple as well as the constraint-based


approach have failed to provide relevant items. This suggestion is made because roll-up queries were competitive with regular recommendations in the web-based experiments. Therefore, they represent a viable alternative retrieval strategy. For some domains (e.g., for multimedia retrieval), it is also possible to use an optional selection field to enable the formulation of cross-domain requests. Evaluations for this query type suggest that users might receive interesting results (i.e., diversified and comprehensive recommendation lists) from such a request.

In summary, the contributions of the paper are as follows:

– Development and implementation of novel retrieval strategies in the SKOSRec system, which facilitate personalized requests and leverage the potential of knowledge graphs in order to address the research gap in state-of-the-art recommendation methods with regard to customized queries.

– Realization of web-based user experiments in four usage scenarios with the following outcomes:

  • Successful demonstration that knowledge graph data is applicable in the context of live recommendation scenarios.

  • Creation of suitable query patterns that can be applied to represent individual information needs in different application contexts. The set of query patterns can be extended with the help of the SKOSRec query language to meet the requirements of additional usage scenarios.

  • Proof that the novel retrieval strategies can be successfully applied across various application contexts and recommendation tasks and that they outperform conventional content-based retrieval approaches, especially in the performance dimensions of diversity and recall.

The prototypical implementation and the evaluation of the SKOSRec engine in different usage scenarios have demonstrated that the freely available knowledge sources on the LOD cloud can be successfully applied to advance state-of-the-art methods in IR and recommender systems. The proposed personalized and customized retrieval strategies can leverage the potential of knowledge graphs by better matching information needs of users with the help of semantic search options.

Appendix A. The SKOSRecommender Query Syntax

SKOSRecQuery: A SKOSRec query can be comprised of up to six elements, of which the parts SimProjection and ItemPart are obligatory. Hence, users need to state at least their preferences and how many recommendations they would like to receive. Advanced retrieval patterns can be formulated as well. For instance, similar resource retrieval can be combined with prefiltering (RecWhereClause) and/or postfiltering (SelectPart) of LOD resources. As in SPARQL, the query can start with a prologue that abbreviates IRIs with namespace declarations [21].

SelectPart: The SelectPart enables subquerying with recommendation results. The surrounding query is a SELECT query with a slightly simplified WHERE clause (i.e., RecWhereClause) with which postfilter conditions can be formulated. In this section, users have the option to formulate aggregation-based retrieval requests (Aggregation).

Aggregation: This query part handles aggregation-based postfiltering. Users specify the IRI (IRIref) of the resource for which similarity scores of sublevel entities are summarized. Additionally, it is stated how the data should be aggregated (i.e., SUM, MAX or AVG) as well as which variable (Var) represents the aggregation-based recommendations in the postfilter section. The compiler makes sure that this variable is actually contained in the RecWhereClause of the SelectPart.

SimProjection: This part of a SKOSRec query refers to the list of generated suggestions. It defines the recommendation variable to which all similar LOD resources are mapped. In case pre- and postfilter conditions are set in the other sections, the compiler checks whether the variable (Var) appears in these sections as well. Otherwise, the required join operations cannot be executed. It is specified how many recommendations should be displayed (Integer) and whether the request will be issued against a default repository or as a cross-repository request (ServiceIntegration).

ServiceIntegration: This section indicates that a user intends to receive recommendations from a different SPARQL endpoint than the default endpoint that is stated in the standard configuration of the SKOSRec engine. It specifies the target SPARQL endpoint (IRIref) from which a user wants to receive recommendations. Upon extraction of SKOS annotations from the default source endpoint, a subsequent request is sent to the specified target endpoint.

ItemPart: This section of a SKOSRec query represents a single preference in the user profile. However, recommendation queries can contain several preference statements. A preference is either expressed as a LOD resource (IRIref) or as a variable which is bound to a preference query (VarPart). In the latter case, the resources are obtained by matching the graph pattern in the VarPart with the triple statements in the repository. The ItemPart statement can additionally be supplemented with a concept-to-concept similarity threshold (Sim) that triggers concept expansion with the approach of flexible similarity detection, which is extensively described in one of our previous works [64].

VarPart: The construct initiates preference querying. Individual likings are expressed as a graph pattern in a RecWhereClause. The variable (Var) is a placeholder for the LOD resources that will be used to set up the user profile. The compiler has to make sure that the specified variable occurs in the RecWhereClause as well. Otherwise, the extraction of LOD resources cannot be carried out correctly.

Sim: The Sim part of a SKOSRec query denotes the threshold of concept-to-concept scores for flexible similarity detection. Even though knowledge-based similarity values usually range between 0 and 1, the specification allows a broader range of digits in the Decimal datatype to enable application of other similarity metrics in case they are needed.

Relation: This section specifies how concept-to-concept similarity scores should be determined. Users can state whether scores should be “larger”, “larger than” or “equal to” the threshold value.

RecWhereClause: This clause can occur in three different sections of a SKOSRec query, i.e., either in the prefilter (ItemPart), the postfilter (SelectPart) or the preference filter section (VarPart). The RecWhereClause closely resembles the “WhereClause” of the SPARQL 1.1 specification with all its subsequent parts [21]. Hence, it enables different combinations of graph pattern matching. However, the RecWhereClause of the SKOSRec language allows fewer pattern matching expressions than the “WhereClause” of the regular SPARQL syntax specification [21]. For instance, it is neither permitted to formulate subqueries nor to direct filter requests to more than one repository (see RecGraphPatternNotTriples for further explanations). This modification has been made to simplify similarity calculation. After parsing this section, the compiler makes sure that recommendation, preference or postfilter variables occur at least once in the RecWhereClause to guarantee that subsequent steps of the recommendation workflow can be applied on existing LOD resources. In case the RecWhereClause comprises a RecMinusGraphPattern, it also has to be checked that the variable is not only contained in the MINUS part of the group.

RecGroupGraphPattern: A RecGroupGraphPattern can contain one to many basic graph patterns, which can be differently combined according to RecGroupGraphPatternSub.

RecGroupGraphPatternSub: This section defines the graph patterns that are applied to identify suitable LOD resources. User filters can be either expressed as basic graph patterns (TriplesBlock) or as combinations of them (RecGraphPatternNotTriples). The sequence of different kinds of patterns can be flexibly specified.

RecGraphPatternNotTriples: This query part closely resembles the similarly named “GraphPatternNotTriples” of the SPARQL 1.1 specification [21]. The only difference to SPARQL is that the RecGraphPatternNotTriples section is a little more restricted in terms of admissible graph patterns than the “GraphPatternNotTriples” section of the regular SPARQL specification. For instance, it does not allow retrieving LOD resources from anywhere other than the default graph that is specified in the configuration of the SKOSRec engine, since filter patterns have to be applied to datasets that contain SKOS annotations. Hence, these datasets need to be specified before runtime execution to ensure that similarity calculation can be carried out correctly. If a user were able to formulate filter conditions that referred to different RDF graphs and endpoints, this would not be possible.

RecGroupOrUnionGraphPattern: This query section marks an alternative graph pattern in the style of the “GroupOrUnionGraphPattern” of the SPARQL syntax specification [21]. However, it neither allows SPARQL-like subqueries nor federated queries for filtered resources, because this would complicate similarity calculation.

RecOptionalGraphPattern: In this part of a SKOSRecQuery, users are enabled to specify additional graph patterns, which might extend the query solution but do not necessarily have to match the data.

RecMinusGraphPattern: This section handles exclusion of LOD resources for cases when users want to omit certain triple statements from the solution.

References

[1] G. Adomavicius, A. Tuzhilin, Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. In: IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.

[2] G. Adomavicius, A. Tuzhilin, R. Zheng, REQUEST: A query language for customizing recommendations. In: Information Systems Research, vol. 22, no. 1, pp. 99–117, 2011.

[3] C. Aggarwal, Recommender systems. Springer, 2016.

[4] M. Arenas, B. Cuenca Grau, E. Kharlamov, S. Marciuska, D. Zheleznyakov, Faceted search over ontology-enhanced RDF data. In: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 939–948, 2014.

[5] V. Ayala, M. Przyjaciel-Zablocki, T. Hornung, A. Schätzle, G. Lausen, Extending SPARQL for recommendations. In: Semantic Web Information Management, 2004.

[6] J. Beel, C. Breitinger, S. Langer, A. Lommatzsch, B. Gipp, Towards reproducibility in recommender systems research. In: User Modeling and User-adapted Interaction, vol. 26, no. 1, pp. 69–101, 2016.

[7] C. Boutilier, R. Brafman, C. Domshlak, H. Hoos, D. Poole, CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements. In: Journal of Artificial Intelligence Research (JAIR), vol. 21, pp. 135–191, 2004.

[8] P. Castells, S. Vargas, J. Wang, Novelty and diversity metrics for recommender systems: choice, discovery and relevance, 2011.

[9] G. Cheng, W. Ge, Y. Qu, Falcons: searching and browsing entities on the Semantic Web. In: Proceedings of the 17th ACM International Conference on World Wide Web, pp. 1101–1102, 2008.

[10] J. Davies, R. Weeks, QuizRDF: Search technology for the Semantic Web. In: Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004.

[11] DBpedia, URL: https://wiki.dbpedia.org, 2017.

[12] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, Exploiting the web of data in model-based recommender systems. In: Proceedings of the 6th ACM Conference on Recommender Systems (RecSys), pp. 253–256, 2012.

[13] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, M. Zanker, Linked Open Data to support content-based recommender systems. In: Proceedings of the 8th International Conference on Semantic Systems, 2012.

[14] D. Dillman, D. Robert, D. Bowker, Principles for constructing web surveys. In: Joint Meetings of the American Statistical Association, 1998.

[15] DCMI metadata terms, URL: http://dublincore.org/documents/dcmi-terms, 2004.

[16] I. Fernandez-Tobias, I. Cantador, M. Kaminskas, F. Ricci, A generic semantic-based framework for cross-domain recommendation. In: Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pp. 25–32, 2011.

[17] A. Freitas, J. Oliveira, S. O'Riain, E. Curry, J. Da Silva, C. Pereira, Querying Linked Data using semantic relatedness: a vocabulary independent approach. In: Natural Language Processing and Information Systems, pp. 40–51, 2011.

[18] T. Grainger, T. Potter, Y. Seeley, Solr in Action. Cherry Hill: Manning, 2014.

[19] A. Gunawardana, G. Shani, Evaluating recommender systems. In: Recommender Systems Handbook, Springer, pp. 265–308, 2015.

[20] R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Bürgle, H. Düwiger, U. Scheel, Faceted Wikipedia search. In: International Conference on Business Information Systems, pp. 1–11, 2010.

[21] S. Harris, A. Seaborne, E. Prud'hommeaux, SPARQL 1.1 query language. In: W3C Recommendation, 2013.

[22] B. Heitmann, C. Hayes, Using Linked Data to build open, collaborative recommender systems. In: AAAI Spring Symposium: Linked Data Meets Artificial Intelligence, pp. 76–81, 2010.

[23] J. Herlocker, J. Konstan, L. Terveen, J. Riedl, Evaluating collaborative filtering recommender systems. In: ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.

[24] Y. Hu, Z. Wang, W. Wu, J. Guo, M. Zhang, Recommendation for movies and stars using YAGO and IMDB. In: 12th International Asia-Pacific Web Conference (APWEB), pp. 123–129, 2010.

[25] D. Huynh, D. Karger, Parallax and companion: Set-based browsing for the data web. In: Proceedings of the 18th ACM International Conference on World Wide Web, 2009.

[26] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich, Recommender systems: an introduction. Cambridge University Press, 2010.

[27] Y. Kabutoya, R. Sumi, T. Iwata, T. Uchiyama, T. Uchiyama. In: IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 625–630, 2012.

[28] M. Kaminskas, I. Fernandez-Tobias, F. Ricci, I. Cantador, Knowledge-based music retrieval for places of interest. In: Proceedings of the 2nd International ACM Workshop on Music Information Retrieval with User-centered and Multimodal Strategies, pp. 19–24, 2012.

[29] H. Khrouf, R. Troncy, Hybrid event recommendation using Linked Data and user diversity. In: Proceedings of the 7th ACM Conference on Recommender Systems, pp. 185–192, 2010.

[30] P. Kirk, Experimental design. John Wiley & Sons, 1982.

[31] B. Knijnenburg, M. Willemsen, Z. Gantner, H. Soncu, C. Newell, Explaining the user experience of recommender systems. In: User Modeling and User-Adapted Interaction, vol. 22, no. 4-5, pp. 441–504, 2012.

[32] G. Koutrika, B. Bercovitz, H. Garcia-Molina, FlexRecs: expressing and combining flexible recommendations. In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 2009.


[33] M. Lee, W. Kim, Semantic association search and rank method based on spreading activation for the Semantic Web. In: IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1523–1527, 2009.

[34] J. Lehmann, L. Bühmann, AutoSPARQL: Let users query your knowledge base. In: 8th Extended Semantic Web Conference (ESWC), 2011.

[35] N. Marie, F. Gandon, M. Ribiere, F. Rodio, Discovery hub: on-the-fly Linked Data exploratory search. In: Proceedings of the 9th International Conference on Semantic Systems, pp. 17–24, 2013.

[36] S. McNee, J. Riedl, J. Konstan, Being accurate is not enough: how accuracy metrics have hurt recommender systems. In: CHI'06 Extended Abstracts on Human Factors in Computing Systems, 2006.

[37] R. Meymandpour, J. Davis, Recommendations using Linked Data. In: Proceedings of the 5th PhD Workshop on Information and Knowledge, 2012.

[38] A. Miles, S. Bechhofer, SKOS Simple Knowledge Organization System Reference. In: W3C Recommendation, 2009.

[39] B. Mobasher, X. Jin, Y. Zhou, Semantically enhanced collaborative filtering on the web. In: Web Mining: From Web to Semantic Web, pp. 57–76, 2004.

[40] A. Musetti, A. Nuzzolese, F. Draicchio, V. Presutti, E. Blomqvist, A. Gangemi, P. Ciancarini, Aemoo: Exploratory search based on knowledge patterns over the Semantic Web. In: Semantic Web Challenge, 2012.

[41] T. Neumann, G. Weikum, The RDF-3X engine for scalable management of RDF data. In: The VLDB Journal, vol. 19, no. 1, pp. 91–113, 2010.

[42] J. Pérez, M. Arenas, C. Gutierrez, Semantics and Complexity of SPARQL. In: 5th International Semantic Web Conference, pp. 30–43, 2006.

[43] S. Policarpio, S. Brunk, G. Tummarello, Implementation of a SPARQL integrated recommendation engine for Linked Data with hybrid capabilities. In: AImWD, 2012.

[44] P. Pu, L. Chen, R. Hu, A user-centric evaluation framework for recommender systems. In: Proceedings of the 5th ACM Conference on Recommender Systems, 2011.

[45] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, E. Di Sciascio, Sound and music recommendation with knowledge graphs. In: ACM Transactions on Intelligent Systems and Technology (TIST), vol. 8, no. 2, 21, 2017.

[46] E. Oren, R. Delbru, S. Decker, Extending faceted navigation for RDF data. In: 5th International Semantic Web Conference, pp. 559–572, 2006.

[47] G. Pirro, Explaining and suggesting relatedness in knowledge graphs. In: International Semantic Web Conference, pp. 622–639, 2015.

[48] G. Rizzo, R. Troncy, NERD: A framework for evaluating named entity recognition tools in the Web of data. In: 10th International Semantic Web Conference (ISWC'11), Demo Session, Bonn, Germany, 2011.

[49] J. Rosati, T. Di Noia, T. Lukasiewicz, R. De Leone, A. Maurino, Preference queries with Ceteris Paribus semantics for Linked Data. In: OTM Confederated International Conferences, pp. 423–442, 2015.

[50] T. Ruotsalo, K. Haav, A. Stoyanov, S. Roche, E. Fani, R. Deliai, E. Mäkelä, T. Kauppinen, E. Hyvönen, SMARTMUSEUM: A mobile recommender system for the Web of Data. In: Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 50–67, 2013.

[51] I. Ruthven, M. Lalmas, K. van Rijsbergen, Incorporating user search behavior into relevance feedback. In: Journal of the Association for Information Science and Technology, vol. 54, no. 6, pp. 529–549, 2003.

[52] B. Sarwar, G. Karypis, J. Konstan, J. Riedl, Item-based collaborative filtering recommendation algorithms. In: Proceedings of the 10th International Conference on World Wide Web, 2001.

[53] M. Schmachtenberg, C. Bizer, H. Paulheim, Adoption of the Linked Data best practices in different topical domains. In: International Semantic Web Conference (ISWC), pp. 245–260, 2014.

[54] H. Schütze, C. Manning, P. Raghavan, Introduction to information retrieval. Cambridge University Press, 2008.

[55] SparqlImplementations, URL: https://www.w3.org/wiki/SparqlImplementations, 2016.

[56] M. Stankovic, W. Breitfuss, P. Laublet, Discovering relevant topics using DBPedia. In: Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 219–222, 2011.

[57] K. Swearingen, R. Sinha, Beyond algorithms: An HCI perspective on recommender systems. In: ACM SIGIR 2001 Workshop on Recommender Systems, vol. 13, no. 5-6, 2001.

[58] D. Tunkelang, Faceted search. In: Synthesis Lectures on Information Concepts, Retrieval, and Services, pp. 1–80, 2009.

[59] C. Unger, L. Bühmann, J. Lehmann, A. Ngonga Ngomo, D. Gerber, P. Cimiano, Template-based question answering over RDF data. In: Proceedings of the 21st ACM International Conference on World Wide Web, 2012.

[60] C. Varnhagen, M. Gushta, J. Daniels, T. Peters, N. Parmar, D. Law, T. Johnson, How informed is online informed consent? In: Ethics & Behavior, vol. 15, no. 1, pp. 37–48, 2005.

[61] J. Waitelonis, H. Sack, Towards exploratory video search using Linked Data. In: 11th IEEE International Symposium on Multimedia, pp. 540–545, 2009.

[62] L. Wenige, J. Ruhland, Flexible on-the-fly recommendations from Linked Open Data repositories. In: International Conference on Business Information Systems, pp. 43–54, 2016.

[63] L. Wenige, Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations.

[64] L. Wenige, J. Ruhland, Retrieval by recommendation: using LOD technologies to improve digital library search. In: International Journal on Digital Libraries, 2017.

[65] F. Zarrinkalam, M. Kahani, A multi-criteria hybrid citation recommendation system based on Linked Data. In: 2nd International EConference on Computer and Knowledge Engineering (ICCKE), pp. 283–288, 2012.

[66] C. Ziegler, S. McNee, J. Konstan, G. Lausen, Improving recommendation lists through topic diversification. In: Proceedings of the 14th International Conference on World Wide Web, pp. 22–32, 2005.

[67] L. Zou, R. Huang, H. Wang, J. Yu, W. He, D. Zhao, Natural language question answering over RDF: a graph data driven approach. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 313–324, 2014.


picted query over the DBpedia repository, the engine displays the three cities for which the highest aggregated scores have been detected (see Listing 7).

Table 7. Result set of Q6

city
dbr:Oxford
dbr:Abingdon-on-Thames
dbr:Wallingford_Oxford

The motivation for formulating such a query is that, when searching for cities, preferences might be better represented through the sights a user has visited elsewhere rather than through a simple on-the-fly request solely containing a preference statement for a city in the user profile section.

Besides the roll-up retrieval patterns presented above, the SKOSRec grammar may facilitate the formulation of additional types of advanced recommendation queries. For instance, cross-domain queries are an interesting additional pattern to be explored for personalized retrieval. Listing 9 shows an example request that generates movie suggestions based on the preference for a music band (i.e., dbr:The_Jackson_5).

Listing 9. SKOSRec query to generate cross-domain recommendations (Q7)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?movie ?participated ?musicalAct
WHERE {
  ?movie rdf:type dbo:Film .
  VALUES ?participated { dbo:basedOn dbo:musicComposer dbo:genre dbo:starring }
  { ?musicalAct ?participated ?movie . }
  UNION
  { ?movie ?participated ?musicalAct . }
} LIMIT 10
AGG dbr:The_Jackson_5 ?movie MAX
RECOMMEND ?musicalAct TOP 100
PREF dbr:The_Jackson_5
WHERE {
  VALUES ?actType { dbo:MusicalArtist dbo:Band }
  ?musicalAct rdf:type ?actType .
}

The query contains a prefilter condition that triggers retrieval of music artists and bands. Afterwards, the scores of the top 100 suggestions are summarized with maximum-based aggregation. Thus, the highest similarity value among the set of music acts determines the ranking score of the film. Also note that, because of the aggregation-based retrieval procedure, connections between movies and the LOD resource dbr:The_Jackson_5 are excluded from similarity calculation. The scenario exemplifies how semantic relationships can help to generate suggestions for a target domain (i.e., movies) that are based on a profile containing items from a different source domain (i.e., music acts). While regular SPARQL queries perform exact matching of triple statements, the SKOSRec syntax facilitates the integration of similarity- and graph-based retrieval to enrich result lists with other potentially relevant items.

Table 8. Result set of Q7

movie                          musicalAct
dbr:Moonwalker                 dbr:Michael_Jackson
dbr:The_Wiz_(film)             dbr:Ashford_&_Simpson
dbr:Garfield_Gets_a_Life       dbr:The_Temptations
dbr:Pipe_Dreams_(1976_films)   dbr:Gladys_Knight_&_the_Pips
dbr:The_Jacksons               dbr:Jermanine_Jackson

As a point of reference, consider the SPARQL query for the given cross-domain scenario (Listing 10).

Listing 10. SPARQL query to generate cross-domain recommendations (Q8)

PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?movie ?musicalAct
WHERE {
  ?movie rdf:type dbo:Film .
  VALUES ?participated { dbo:basedOn dbo:musicComposer dbo:genre dbo:starring }
  { dbr:The_Jackson_5 ?participated ?movie . }
  UNION
  { ?movie ?participated dbr:The_Jackson_5 . }
} LIMIT 5

The query omits the similarity calculation part. Thus, it only retrieves movies linked to the resource dbr:The_Jackson_5. By this means, the SPARQL results have a better chance of being related to the initial user profile. On the other hand, empty result lists can occur when the preferred LOD resource does not have any direct connections to recommendable items. This assumption is backed up by the result lists of the two queries (Tables 8 and 9). The SKOSRec query produces a more comprehensive solution than a regular SPARQL query.

Table 9. Result set of Q8

movie              musicalAct
dbr:The_Jacksons   dbr:The_Jackson_5

The following definitions formalize the postfiltered and aggregation-based retrieval forms. Definition 12 specifies the notion of SKOSRec recommendations that may potentially undergo postprocessing.

Definition 12 (SKOSRec recommendations). SKOSRec recommendations (R(Pr)) are the set of suggestions generated by processing the solution mappings (Ω) obtained from recommendation retrieval. They can be a result of simple on-the-fly retrieval, prefiltering, or a combination of these techniques. Pr represents the input profile that contains at least one preference for an item. The cardinality of R(Pr) is the intended number of recommendations (k) that is specified by the user, based on which the engine selects the resources with the highest recommendation scores that are not contained in the profile (Eq. 12).

\[
R(P_r) = \{\, x \mid x \in \{\, q \mid \mathit{score}(P_r, q) > \operatorname{kmax}_{q \in \Omega} \mathit{score}(P_r, q) \,\},\ q \notin P_r \,\} \tag{12}
\]
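The selection step of Definition 12 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SKOSRec implementation: the function name `top_k_recommendations`, the dictionary of precomputed scores and the `ex:` identifiers are hypothetical, and ties are broken alphabetically only to keep the output deterministic.

```python
from typing import Dict, Iterable, List


def top_k_recommendations(scores: Dict[str, float],
                          profile: Iterable[str],
                          k: int) -> List[str]:
    """Return the k highest-scoring resources that are not in the profile."""
    profile_set = set(profile)
    # Drop every candidate that is already part of the user profile.
    candidates = [(iri, s) for iri, s in scores.items() if iri not in profile_set]
    # Sort by descending score; ties are broken by IRI (an assumption).
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return [iri for iri, _ in candidates[:k]]


# Hypothetical similarity scores score(Pr, q) for four candidate resources.
example_scores = {"ex:A": 0.9, "ex:B": 0.4, "ex:C": 0.7, "ex:D": 0.7}
print(top_k_recommendations(example_scores, profile=["ex:A"], k=2))
```

In this sketch the profile item `ex:A` is removed before ranking, so it can never be recommended even though it has the highest score.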

The approach of subquerying with recommendation results builds on the idea that SKOSRec suggestions R(Pr) can be joined with any graph pattern Ppost after the process of similarity calculation (Definition 13). It does not make a difference how the engine obtained these recommendations.

Definition 13 (Postfiltered recommendation mapping). The postfiltered recommendation mapping (Ωpost) is obtained by joining solution mappings generated from SKOSRec recommendations with the results of matching a postfilter graph pattern (Eqs. 13 and 14).

\[
\Omega_{postfilter} = \{\, \mu \mid \mu \in [\![P_{post}]\!]_D,\ x \in dom(\mu) \,\} \tag{13}
\]

\[
\Omega_{post} = \{\, \mu_{|x} \mid x \in R(P_r) \,\} \bowtie \Omega_{postfilter} \tag{14}
\]
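The join in Definition 13 can be illustrated with a small Python sketch. It assumes that a solution mapping is represented as a dictionary from variable names to IRIs and that the recommendation variable is called `x`; the function name `postfilter_join` and the `ex:` identifiers are hypothetical. Because each recommendation mapping binds only the recommendation variable, the join reduces to keeping those postfilter mappings whose binding for that variable is a recommended resource.

```python
from typing import Dict, List

# A solution mapping binds variable names to IRIs (hypothetical representation).
Mapping = Dict[str, str]


def postfilter_join(recommendations: List[str],
                    postfilter_mappings: List[Mapping],
                    rec_var: str = "x") -> List[Mapping]:
    """Join recommendation bindings {rec_var -> r} with postfilter mappings."""
    recommended = set(recommendations)
    joined = []
    for mu in postfilter_mappings:
        # Eq. 13: the recommendation variable must be bound in the mapping.
        if rec_var not in mu:
            continue
        # Eq. 14: mappings are compatible only if they agree on rec_var.
        if mu[rec_var] in recommended:
            joined.append(dict(mu))
    return joined


recs = ["ex:act1", "ex:act2"]
mappings = [
    {"x": "ex:act1", "movie": "ex:movie1"},  # recommended act, kept
    {"x": "ex:act3", "movie": "ex:movie2"},  # act was not recommended
    {"movie": "ex:movie3"},                  # recommendation variable unbound
]
print(postfilter_join(recs, mappings))
```

Only the first mapping survives the join, since it is the only one that binds `x` to a recommended resource.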

Like the prefilter, Ppost can be comprised of triple statements that are eligible in the RecGroupGraphPattern part of the SKOSRec grammar. Aggregation-based queries have to be treated differently than a regular postfilter request. Apart from excluding sublevel items of the profile from the set of similar resources, the engine should also omit any connections between the preferred superordinate item (a) and similar sublevel entities. Additionally, the user has to specify the variable in the profile (e.g., y) that represents the items within the postfilter graph pattern Ppost. By this means, aggregation-based recommendations can be generated (Definition 14).

Definition 14 (Resources for aggregation-based retrieval). The set of resources for aggregation-based retrieval is obtained by retrieving LOD resources that have a connection to a previously generated suggestion, thereby excluding all resources that are linked to the specified superordinate item a in the profile (Eq. 15).

\[
R_{agg}(a) = \{\, \mu(y) \mid \mu \in \Omega_{post},\ y \in dom(\mu),\ a \notin \mu(y) \,\} \tag{15}
\]

For each potential aggregation-based suggestion (y), similarity scores of connected entities (x) have to be considered for the final ranking. In this context, different approaches to aggregation can be applied. In cases when both similar items and aggregation-based recommendations are entities on comparable levels of granularity, choosing the maximum similarity score might be the best way to represent the relatedness of the resource to the user profile. On the other hand, when postfiltered recommendations are based on similarity scores of their sublevel entities (e.g., as in the movie recommendation example), the scores of related suggestions should be aggregated by the sum or the average of individual scores.

Definition 15 (Aggregation-based ranking score). The ranking score of an LOD resource y that is connected to a set of recommended items is determined by aggregating scores of connected entities, either by the maximum (Eq. 16), the sum (Eq. 17) or the average (Eq. 18) of individual scores.

\[
\mathit{score}_{aggMAX}(y) = \max_{x \in \mu(x,y)} \mathit{score}(P_r, x) \tag{16}
\]

\[
\mathit{score}_{aggSUM}(y) = \sum_{x \in \mu(x,y)} \mathit{score}(P_r, x) \tag{17}
\]

\[
\mathit{score}_{aggAVG}(y) = \frac{\sum_{x \in \mu(x,y)} \mathit{score}(P_r, x)}{|\{x \in \mu(x,y)\}|} \tag{18}
\]
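The three aggregation variants of Definition 15 can be sketched as follows. This is an illustrative Python example under stated assumptions rather than the engine's code: the function name `aggregate_scores`, the list of (x, y) pairs standing in for the postfiltered solution mappings and the `ex:` identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def aggregate_scores(pairs: Iterable[Tuple[str, str]],
                     scores: Dict[str, float],
                     mode: str = "MAX") -> Dict[str, float]:
    """Aggregate the scores of recommended items x per connected resource y."""
    aggregators = {
        "MAX": max,                               # Eq. 16
        "SUM": sum,                               # Eq. 17
        "AVG": lambda vals: sum(vals) / len(vals),  # Eq. 18
    }
    if mode not in aggregators:
        raise ValueError(f"unknown aggregation mode: {mode}")
    # Collect each connected item only once per target resource.
    per_target = defaultdict(dict)
    for x, y in pairs:
        if x in scores:
            per_target[y][x] = scores[x]
    return {y: aggregators[mode](list(by_item.values()))
            for y, by_item in per_target.items()}


# Hypothetical data: two acts are linked to movie1, one act to movie2.
links = [("ex:act1", "ex:movie1"), ("ex:act2", "ex:movie1"), ("ex:act3", "ex:movie2")]
act_scores = {"ex:act1": 0.75, "ex:act2": 0.25, "ex:act3": 0.5}
for mode in ("MAX", "SUM", "AVG"):
    print(mode, aggregate_scores(links, act_scores, mode))
```

With these values, maximum-based aggregation ranks `ex:movie1` first, sum-based aggregation widens the gap, and average-based aggregation places both movies on the same score, which shows how the choice of aggregator changes the ranking.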

4. Evaluation of the SKOS Recommender

4.1. Experimental Setup

The novel recommendation strategies of the SKOSRec engine were evaluated in the context of web-based experiments for the usage scenarios of travel destination search and multimedia RS. The online approach helped to reach out to more participants with diverse demographic backgrounds, which would not have been possible in a laboratory experiment. Another positive side-effect of the web-based setting was that participants could test and rate recommendations without being directly observed by an experimenter, which might have caused biased results otherwise. Hence, the online approach softened common limitations of laboratory studies [19].

The experiments were carried out in the defined usage scenarios with metadata descriptions from DBpedia. The database containing the local mirror of DBpedia ran on a virtual server. Besides the triple store, a web application was hosted on the same machine. The website ran on an Apache Tomcat 7 application server and was implemented with the help of Java servlet technology (see footnotes 6 and 7).

We conducted the online experiments between April 2016 and January 2017. Upon setting up the web application, a pretest was carried out to control for potential pitfalls, such as misleading navigation. Two test users ran step by step through the web interface and provided their feedback. Because of their insights, we slightly modified the interface (e.g., through rephrasing user instructions) to increase the chances of successful completion. In case the SKOSRec engine is applied in a live setting, the design of the user interface will have to be tested with more users. Our experimental series focused on algorithmic performance, which was tested with numerous participants.

Subjects were recruited through the clickworker.com platform (see footnote 8). Participants received a compensation fee of 1.00 € or 1.50 € for a completed survey. 103 participants took part in the study on travel destination search. In the multimedia domains, 154 subjects in total evaluated the SKOSRec engine. For each part of the evaluation, it was ensured that participants had not just run through the evaluation screen. Whenever there appeared to be irregularities in the log files (e.g., the default settings of the screen were not changed), these assessments were excluded from further analysis.

The survey template started with a consent form. It informed participants about the context and the procedure of the study. After having filled in the consent form, participants were navigated to the first part of the experiment. It collected data on subjects' demographics and their consumption and search habits. After the preparation phase, participants worked on the first test case (TC1) in each usage scenario. In this test case, the performance of the SKOSRec engine was assessed in the simple execution mode of on-the-fly recommendations (see Subsect. 3.2). It was done to gather data on the usefulness of suggestions resulting from a simple content-based approach (i.e., baseline method), with which more advanced recommendation strategies were compared at a later stage. In cases where users did not state any items, the application showed an error page. While setting up a profile required some effort on the side of the user, it was a necessary step for the experiment. Since we did not have access to the log files of a live application, profile generation had to be simulated. The process was facilitated by a specifically tailored interface (see Fig. 2), where subjects could search for items through an autocomplete field that applied AJAX technology (see footnote 9).

6 http://tomcat.apache.org/
7 http://www.oracle.com/technetwork/java/javaee/servlet/index.html
8 https://www.clickworker.de/

It enabled users to type in keywords (eg the nameof a music act or the author of a book) for whichmatchings were instantaneously displayed withoutreloading the site Hence participants could quicklyfind and select their items of interest Whenever auser entered a query through the AJAX interfacethe system matched the query keywords against anApache SolrLucene search index10 The index wasbuilt with item metadata which had been extractedfrom DBpedia prior to setting up the website Theindex comprised the names and abstracts of itemsWhenever available information on thumbnail pic-tures were saved in the index to be ready for display inthe AJAX interface In case the Apache SolrLucene

⁹ http://api.jquery.com/jquery.ajax/
¹⁰ http://lucene.apache.org/solr/


Fig. 2. User profile generation, TC1 (music domain)

search engine found index keywords that matched the user query, metadata information for these items was immediately shown on the screen. This information was displayed to help participants make an informed decision on whether the AJAX result list contained suitable items for profile generation.

Aside from the AJAX feature, the appeal of the interface was increased by a progress bar at the top of the webpage (see Fig. 2). Progress bars are commonly utilized tools for web surveys. In conventional paper-based studies, subjects usually see how far ahead they are, while in a web survey they cannot check their progress immediately. This is especially true for studies that apply a screen-by-screen navigation, which was the chosen approach for the conducted web experiments. Progress bars prevent users from tiring of the questionnaire and withdrawing from the study altogether by showing how close participants are to completing the experiment [14].

Upon profile generation, the SKOSRec engine calculated on-the-fly recommendations for a simple query (similar to Q1, Sect. 3), which were shown to users in a separate screen. Suggestions were displayed with a sufficient amount of information, e.g., thumbnails, labels or a short description, to help participants assess the quality of the recommendation. In addition to this information, users could also access the respective Wikipedia article by following the hyperlink on the label. Relevance sliders were used to assess the utility of each recommendation. They are often applied in evaluations of IR systems [51, 54]. Scores were handled as points on a 0 to 100 scale of the relevance slider (outer left side: lowest relevance, outer right side: highest relevance, see Fig. 3).

This approach facilitated the calculation of an accurate precision score and the application of tests with higher statistical power than an ordinal 5-star scale. The score, called mean relevance (mrs), was measured with Equation 19.

$$\mathit{mrs} = \frac{\sum_{i=1}^{k} \mathit{score}_i}{k} \qquad (19)$$
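Equation 19 can be illustrated with a minimal Python sketch. The function name `mean_relevance_score`, the range check and the example ratings are assumptions made for illustration; they are not taken from the SKOSRec web application.

```python
from statistics import mean


def mean_relevance_score(slider_scores):
    """Average the 0-100 slider ratings of one recommendation list (Eq. 19).

    slider_scores holds one rating per recommended item, so its length is k.
    """
    if not slider_scores:
        raise ValueError("an empty recommendation list has no mrs")
    for score in slider_scores:
        # The relevance slider in the study ranged from 0 to 100.
        if not 0 <= score <= 100:
            raise ValueError(f"slider score out of range: {score}")
    return mean(slider_scores)


# Invented ratings for a list of k = 5 recommendations.
example_ratings = [80, 55, 20, 95, 50]
example_mrs = mean_relevance_score(example_ratings)
```

For the invented ratings above, the sum is 300 and k is 5, so the sketch yields an mrs of 60.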

In addition to the precision score, the web application also determined the size of the recommendation list produced by the engine as an indicator for recall. However, recommendation list size can only be utilized for this purpose when mrs scores reach a reasonable level.

While accurate recommendations are useful because they build trust in a system [57], they only capture a narrow aspect of performance [36]. For instance, an online retailer can profit tremendously from an RS that provides both relevant and unobvious recommendations, since it enables users to explore the product catalog more thoroughly. On the other hand, in the case of an RS only suggesting popular products, it is likely that the user already knows them [3, 23]. A pure accuracy-based evaluation disregards these aspects, which is why other performance indicators, such as novelty, were measured as well. Thus, the evaluation interface also contained a section where subjects had to tick a radio button that indicated whether an item was new. In the backend of the web application, novelty was measured as the ratio of relevant items not


Fig. 3. Recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain)

familiar to the user (|nov|) among the total number of relevant items (|tp|) (Eq. 20).

$$nv = \frac{|nov|}{|tp|} \qquad (20)$$
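The novelty ratio of Equation 20 can be sketched as follows. The relevance threshold of 50 follows the text; the function name `novelty`, the pair-based input format and the choice to return `None` when a list contains no relevant item are assumptions for illustration only.

```python
def novelty(ratings, threshold=50):
    """Share of relevant items that were new to the user (Eq. 20).

    ratings is a list of (relevance_score, is_new) pairs for one
    recommendation list. Returns None when no item is relevant, because
    the ratio |nov| / |tp| is undefined in that case (an assumption).
    """
    # |tp|: items whose relevance score reaches the threshold.
    relevant_flags = [is_new for score, is_new in ratings if score >= threshold]
    if not relevant_flags:
        return None
    # |nov|: relevant items that the user marked as new.
    return sum(relevant_flags) / len(relevant_flags)


# Invented ratings: four relevant items, two of them new to the user.
example = [(80, True), (55, False), (20, True), (95, True), (50, False)]
example_nv = novelty(example)
```

In this invented example, four items are relevant and two of them are new, which gives nv = 0.5. The third item is new but not relevant, so it does not enter the ratio.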

An item was considered relevant when it achieved a relevance score of 50 or higher. In addition to accuracy and novelty, the web application also tracked the diversity of recommendation lists, since a topically diversified result set has been found to increase overall user satisfaction [31, 66]. In the RS literature, the most widely explored diversification metrics are the ones that measure the similarity between each item pair in the recommendation list. The more the items are alike, the less diverse is the result set [19]. The study series of this paper applied the metric by Castells et al. [8]. It determines item-to-item similarity values with the cosine similarity measure. Scores are then converted to distance values (Eq. 21) and averaged for the recommendation list (Eq. 22).

$$d(i_l, i_m) = 1 - \cos(i_l, i_m) \qquad (21)$$


$$div_C = \frac{2}{k(k-1)} \sum_{l<m} d(i_l, i_m) \qquad (22)$$
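Equations 21 and 22 can be sketched in a few lines of Python. How items are represented as vectors is an assumption here: each item is a dictionary that maps a feature (for instance a SKOS concept) to a weight. The names `cosine` and `div_c` are hypothetical.

```python
from itertools import combinations
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two sparse vectors given as {feature: weight}."""
    dot = sum(weight * b.get(feature, 0.0) for feature, weight in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: items without features share nothing
    return dot / (norm_a * norm_b)


def div_c(items):
    """Average pairwise cosine distance of a recommendation list (Eq. 22)."""
    k = len(items)
    if k < 2:
        raise ValueError("diversity needs at least two items")
    # Sum d(i_l, i_m) = 1 - cos(i_l, i_m) over all pairs with l < m (Eq. 21).
    total = sum(1.0 - cosine(a, b) for a, b in combinations(items, 2))
    return 2.0 * total / (k * (k - 1))


# Invented list: two items with identical annotations and one unrelated item.
example_items = [{"jazz": 1.0}, {"jazz": 1.0}, {"opera": 1.0}]
example_div = div_c(example_items)
```

The invented list has three pairs with distances 0, 1 and 1, so the sketch returns 2/3. A list of items with identical annotations would return 0, and a list without any shared annotations would return 1.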

The content-based diversity scores (divC) were calculated in the backend of the web application and saved in a log file for each generated recommendation list of the SKOSRec engine during runtime execution. In addition to the measurement of divC scores, we asked participants about their opinions on recommendation list diversity directly. Following the RS evaluation framework by Pu et al. [44], two Likert items were posed for this purpose:

– The recommendations of list [] are diverse.
– The recommendations of list [] are similar to each other.

Besides the above-listed algorithmic performance metrics, the authors gathered assessments from users regarding the overall usefulness of the suggestions. This approach is also in line with the empirically verified evaluation framework by Pu et al. [44], which proposes to measure the psychometric construct of Perceived Usefulness to get a comprehensive understanding of user opinions on recommendation results. Thus, participants were instructed to state their agreement to positive statements regarding the relevance of the suggestions. The statements were adapted to the specificities of each usage scenario. Fig. 4 shows an example collection of Likert questions for the domain of music RS.

After participants had provided the required information, their answers were saved in a log file and they were navigated to test case 2 (TC2), which evaluated the SKOSRec engine's performance for constraint-based queries (similar to Q3, Sect. 3). We tried to find out if the engine was able to improve user satisfaction through filter conditions. Even though IR systems already successfully apply facet filters on result sets, the experiments still had to prove whether LOD-enabled constraints have the same effect on the Perceived Usefulness of recommendations. For the comparisons of non-filtered and filtered suggestions, we applied a within-subjects design, in which participants had to rate two recommendation lists resulting from regular (non-filtered) and filtered requests in TC2. We favored this setup over a between-subjects design, where subjects would have been assigned to only one result list. While the between-subjects design more closely resembles a real-world setting, since it does not require

participants to interact with more than one approach at a time, differences among subjects (e.g., regarding demographics or topic expertise) can potentially have a considerable impact on quality judgments [19]. Hence, when applying a between-subjects design, one does not know whether differences in performance are caused by the approaches or by the variability among participants. Therefore, researchers need to conduct between-subjects experiments with a sufficient number of study subjects to level out these differences. Within-subjects studies, on the other hand, require fewer participants while still achieving the same level of statistical power, since between-subject variability is not an issue [30, 31]. Beel et al. have shown that even small variations in an RS test setting could cause substantial differences in performance outcomes [6]. To keep the extent of variations even lower, comparisons between non-filter and constraint-based recommendations were based on the same profile for each user in TC2. Therefore, the user profile from TC1 served as a starting point for filtered retrieval. The web application showed the recently generated profiles to users and asked them to provide an additional filter condition. Fig. 5 depicts an example web form for a constraint-based recommendation request in the music domain. It comprises the user profile and an additional text field where users stated their constraint.
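In a within-subjects design, every participant rates both lists, so a comparison can work on per-participant differences and between-subject variability cancels out. The following minimal sketch computes a paired t statistic. The function name `paired_t` and the scores are invented for illustration and do not reproduce the analysis scripts of the study.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(scores_a, scores_b):
    """Paired t statistic and degrees of freedom for two rating series.

    scores_a[i] and scores_b[i] must come from the same participant.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both series need one score per participant")
    # Per-participant differences remove stable differences among subjects.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two participants are required")
    spread = stdev(diffs)
    if spread == 0:
        raise ValueError("all differences are identical, t is undefined")
    t_value = mean(diffs) / (spread / sqrt(n))
    return t_value, n - 1


# Invented scores of three participants for two recommendation lists.
t_value, dof = paired_t([3, 5, 7], [1, 2, 3])
```

With the invented scores, the differences are 2, 3 and 4, which gives a mean of 3, a standard deviation of 1 and therefore t(2) ≈ 5.20.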

Afterwards, the engine applied this filter condition on the set of potential recommendation results before similarity calculation started. For each usage scenario, different types of constraints were available. Regarding suitable filter options, the authors sought to achieve a balance between assisting users in finding filter conditions that would adequately represent their information needs and keeping the variability among users as low as possible. Therefore, the specificities of each scenario and the availability of the respective data sources were determined. For instance, for the multimedia domains (movie, music and books), it was assumed that users would like to filter recommendations according to the specific genre of the item. Fortunately, the DBpedia dataset contains this kind of information for the music domain. In this domain, the genre can be retrieved through the property dbo:genre. However, in the movie and book domains, DBpedia does not provide sufficient information. Many LOD resources which represent books were not assigned a genre. In the movie domain, a genre property is missing entirely and genre-specific information can often only be found in the subject categories describing the movie. Hence, in contrast to the experiment on


Fig. 4. Evaluation screen for recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain)

music recommendations, participants in the domains of books and films did not filter their recommendation lists by genre. On the other hand, the property dc:subject was applied as recommendation filter in each multimedia domain. Since the subject property often links to many informative features of multimedia items (e.g., release period, geographic information or content information), the authors decided that this feature represents a powerful filter dimension.

In the experiment on travel destination search, the web interface offered two filters: a subject and a location-specific filter. The subject filter enabled users to state their preferences regarding the characteristics of a location (e.g., being a nature reserve). The location-specific filter, on the other hand, gave participants the option to specify where the desired destination should be located.¹¹

The second fundamental research issue of TC2 concerned the question whether expressive constraints can boost recommendation quality even further. Hence,

¹¹ http://dbpedia.org

apart from executing a constraint-based workflow with a simple filter (i.e., a direct attribute of an item), the SKOSRec engine generated suggestions with the help of an expressive filter as well. This second procedure, while still applying the same user constraint as the simple filter, utilized an expressive graph pattern to include additional LOD resources in the result set. For instance, in case a subject filter was selected, result set expansion was facilitated by exploring the wider semantic space of the DBpedia category graph through a property path declaration in the graph pattern (e.g., skos:broader[2]). In the travel experiment, an additional expressive filter retrieved place-related information (dul:hasLocation) for location-specific subproperty relations that are not materialized in DBpedia. Hence, the expressive graph pattern expanded the filter such that additional properties were matched (e.g., dbo:country, dbo:city or dbo:state) while still applying the specified user constraint on the set of LOD resources. Upon execution of constraint-based recommendation requests, users received two recommendation lists (resulting from simple and expressive filtering, respectively) in a


Fig. 5. Web form for a constraint-based recommendation request (music domain)

separate evaluation screen. Participants evaluated result sets in the same manner as in TC1. For each result set, the screen contained an additional Likert item asking subjects to give their agreement to the statement “The filter has improved the recommendation results of list”. The sequence of suggestions resulting from the different methods was randomly assigned in each user session to avoid order effects.

After study subjects had completed TC2, they were navigated to test case 3 (TC3). In the TC3 section of the travel experiment, participants could choose between three options for rollup query patterns (similar to Q6, Sect. 3). Users were able to obtain recommendations for travel destinations based on POIs they had visited during the stay in another destination. They could either select a city (option 1), a region (option 2) or a country (option 3) as destination type. Upon entity type selection, the SKOSRec engine generated appropriate suggestions. Fig. 6 shows the web form from the travel experiment that facilitated the formulation of advanced rollup requests in TC3.

When users had selected an entity type and stated their favorite travel destination and three associated POIs,

the application generated a SKOSRec query from these parameters. A simple on-the-fly request was sent to the engine as well to retrieve baseline recommendations, with which the suggestions resulting from the advanced request were compared at a later stage of the experiment. The simple query only contained the user's favorite travel destination and the selected entity type option, thus omitting information on the POIs. During the travel experiment, the engine processed both the simple and the advanced query and produced two recommendation lists. As in the previous parts of the experiment, participants assessed the quality of the result sets, which appeared in random order on the screen.

The multimedia experiments also contained a section on rollup retrieval (TC3). Like the advanced travel queries, the multimedia requests aggregated similarity scores of sublevel entities through summation, which the engine joined with a postfilter. In addition to these features, the pattern contained a preference query part. This section identified a set of preferred items through a single user statement. Participants either specified their favorite director/actor (movie domain), music act


Fig. 6. Travel - User profile generation, TC3

(music domain) or author (book domain) and received recommendations based on the features of the works created by the artist (similar to Q5, Sect. 3). The SKOSRec engine generated recommendations based on the creative works of the favored artist. Participants also received suggestions from a simple on-the-fly query, which computed results according to the artist's characteristics. As in the travel experiment, recommendations from the advanced rollup request were compared with the on-the-fly query without letting participants know which approach they were assessing and through random assignment of result set positions.

After participants of the multimedia experiments had evaluated the results of TC3, they were guided to test case 4 (TC4), which evaluated cross-domain queries. The experiment on travel RS ended with TC3 because no suitable graph patterns could be identified to facilitate these kinds of suggestions for the travel usage scenario. In the multimedia domain, however, the data from DBpedia was sufficient for this retrieval task. The entity type of the study (i.e., movie, music act or book) defined the source domain, based on which the engine generated suggestions for items from another multimedia target domain (similar to Q7, Sect. 3). Fig. 7 depicts the cross-domain web form of the music experiment.

The web form assisted subjects in formulating a request, which then obtained movie suggestions based on the favorite music act of the user. Parameters from users were entered in the placeholders of the query pattern in the backend of the application. We formulated the query templates in such a way that the query engine matched any properties or graph patterns that could potentially connect two items from the specified

target and source domains. For instance, such matchings can occur when a user states his/her favorite movie that is written by a particular author. In case the same author has also written some books, they might be of interest to the user as well. Another example stems from the movie domain. Suppose, in his/her profile, a consumer has declared a preference for a movie that links to the corresponding soundtrack. It might well be the case that the user likes the soundtrack and the music acts that were involved in creating it just as much as s/he likes the movie. The practice to distinguish homonymous LOD resources through the property dbo:wikiPageDisambiguates is another interesting linking pattern from DBpedia that was exploited for cross-domain requests in TC4, since it connects similar entities (i.e., the book edition of a certain movie).

Cross-domain SPARQL queries (similar to Q8, Sect. 3) were posed as baseline requests in this test case. While the SPARQL query only matched cross-domain relations for a single LOD resource, the SKOSRec request expanded the search space by considering links from more than one item. As in the previous test cases, TC4 ended with an evaluation screen where users assessed the SPARQL-based as well as the SKOSRec suggestions in random order without knowing which recommendation list belonged to which method. Table 10 gives an overview of the test cases and the methods that were applied in the experiments in each usage scenario.

4.2. Results

In total, 257 people took part in the study. The gathered samples from the experiments contained answers


Fig. 7. Music - User profile generation, TC4 (page 9)

Table 10. Test cases in the web experiments

Test Case   Evaluation Purpose                          Query Type   Travel   Movie   Music   Book
TC1         On-the-fly Recommendations (Baseline)       Q1           X        X       X       X
TC2         Constraint-based Recommendations            Q3           X        X       X       X
TC3         Advanced Queries (Rollup)                   Q5, Q6       X        X       X       X
TC4         Advanced Queries (Cross-Domain)             Q7           -        X       X       X

from both genders with a slight overrepresentation of male participants (53.7%). The majority of study subjects were between 20 and 40 years of age (65.8%). The evaluation of the demographics section also revealed that the majority of participants already perceive the search in their domain of interest as either “manageable” (38.5%) or even as “easy” or “very easy” (44.0%).

Prior to analyzing the performance results of the novel retrieval approaches of the SKOSRec engine, it was checked whether the respective Likert items for Perceived Usefulness and user-based diversity (divU) reached acceptable reliability levels. For this purpose, Cronbach's α values were determined. Table 11 shows the outcome of this analysis for the Perceived Usefulness construct. Given the acceptable values throughout the domains (Cronbach's α > 0.7), it could be treated as an interval-scaled variable in the statistical tests.

Table 11. Cronbach's α values for the concept of Perceived Usefulness

Domain   Cronbach's α
Travel   0.7002
Movie    0.8579
Music    0.8515
Book     0.8949
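The reliability check can be illustrated with the standard formula for Cronbach's α, α = k/(k−1) · (1 − Σ s²_item / s²_total), where k is the number of Likert items. The sketch below is a generic implementation with invented answers. The function name `cronbach_alpha` is hypothetical, and the code is not the procedure that produced the values in Table 11.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a matrix of Likert answers.

    responses holds one row per participant and one column per item.
    """
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two participants")
    # Sample variance of every single item across participants.
    item_variances = [variance([row[j] for row in responses]) for j in range(k)]
    # Sample variance of the participants' summed scale scores.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("summed scores do not vary, alpha is undefined")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Invented answers of four participants to three usefulness items.
answers = [[4, 5, 4], [2, 2, 3], [5, 5, 5], [3, 4, 3]]
alpha = cronbach_alpha(answers)
```

Values above 0.7 are the acceptance level named in the text. Items that always move together yield an α of 1, while unrelated items pull the value towards 0.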

Besides usefulness, user-based diversity scores were also measured by multiple Likert items. However, Cronbach's α values for this concept did not reach acceptable levels. In order to avoid that valuable user assessments remained unused, agreements to the statement “The recommendations of list [] are diverse” were taken into account as an ordinal variable (divU). Based on these results, subsequent tests could be carried out.

The first part of the analysis concerned the performance of the on-the-fly retrieval method (baseline) of the SKOSRec engine in TC1. The outcome of this analysis confirms that the on-the-fly approach represents a viable recommendation strategy. Table 12 shows the mean (M),¹² standard deviation (SD) and number of participants (N) for each performance dimension and metric. It can be seen that users assessed the engine's performance in almost every quality dimension to be above a mediocre level. Aside from the positive user evaluations with regard to Perceived Usefulness, related quality aspects such as Accuracy, Novelty and Diversity received good assessments as well. For instance, the mean relevance score (mrs) was higher than 50 score points in each domain. It means that participants were often convinced of the relevance of the results. Given these positive user assessments with regard to item relevance, it seems to be justified to consider the mean of result list sizes as an indicator of recall. The engine was able to generate the required number of recommendations in almost each test case (see Table

¹² In case of the user-based diversity score (divU), M denotes the median because of the ordinal scale of the variable.


12). The on-the-fly retrieval approach was also tested with regard to the Novelty dimension. Table 12 shows the mean novelty scores (nv) for the baseline recommendation approach. Novelty scores indicate that the majority of items in a result list were deemed to be new in the travel and book RS, whereas in the experiments on movie and music RS only one third of the recommendation list was unfamiliar to users. In contrast to that, content-based Diversity scores (divC) were rather high in each domain. Hence, it can be concluded that items in a result set usually have a low rate of matching SKOS annotations. In addition to content-based scores, user assessments of recommendation list Diversity (divU) were also taken into account. The median level of agreement to the Diversity-related Likert item stating that recommendations are diverse lay at 3 (“neutral”) in the multimedia domains and at 4 (“agree”) in the travel domain.

In summary, the baseline approach achieved fairly good performance results in TC1. The findings demonstrate that LOD-enabled recommendations can potentially enhance a user's search experience. They also allow for subsequent tests of advanced retrieval techniques, which can be compared to this standard method.

In the analysis of performance results from TC2 (constraint-based recommendations), particular attention was paid to participants' agreements with the statement “The filter has improved the recommendation results of the list”. Fig. 8 shows the distribution of domain-wise agreement ratios. On average, they were fairly high. Throughout the domains, the vast majority of participants either selected 5 (“strongly agree”), 4 (“agree”) or 3 (“neutral”). Hence, a filter often had a positive impact on results.

The authors gathered responses to the question on improvement for recommendation lists resulting from both the simple and the expressive filtering approaches. Hence, they give clues about the general performance of constraint-based retrieval. However, since this section of the study revealed the experimental condition to participants (i.e., application of a filter), answers to the Likert statement have to be interpreted cautiously. Results may be slightly biased, as participants were able to guess the underlying agenda. However, these limitations do not exist for comparisons between simple and expressive filters. Since the web application randomly assigned the list order, participants did not know which approach they were assessing. This enables us to take a closer look at the differences between the two filtering approaches. It was

Table 12. Performance results in TC1 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]   Domain   M       SD      N
Accuracy (size) [10]             Travel   10.00   0.00    103
                                 Movie    10.00   0.00    50
                                 Music    10.00   0.00    53
                                 Book     9.80    1.40    51
Accuracy (mrs) [1]               Travel   56.49   0.00    103
                                 Movie    62.25   19.15   50
                                 Music    55.70   0.10    53
                                 Book     57.47   15.55   50
Novelty (nv) [1]                 Travel   0.60    0.34    101
                                 Movie    0.36    0.31    50
                                 Music    0.36    0.39    52
                                 Book     0.61    0.29    47
Diversity (divU) [5]             Travel   4       -       102
                                 Movie    3       -       50
                                 Music    3       -       53
                                 Book     3       -       50
Diversity (divC) [1]             Travel   0.80    0.16    103
                                 Movie    0.87    0.12    50
                                 Music    0.86    0.10    50
                                 Book     0.92    0.08    50
Usefulness [5]                   Travel   3.34    0.85    103
                                 Movie    3.55    0.73    50
                                 Music    3.18    0.85    53
                                 Book     3.43    0.81    50

hypothesized that advanced filters improve recommendation quality, as they may identify more relevant resources, thereby overcoming problems of data quality or data sparsity in LOD repositories. Table 13 lists response rates of the two methods. The rate measures the ratio of non-zero result sets among all result sets. Throughout the domains, recommendation lists resulting from expressive filtering achieved higher response rates.

Table 13. Response Rates (TC2)

Domain   Constr.-based (Regular) [%]   Constr.-based (Expanded) [%]   N
Travel   23                            57                             103
Movie    86                            88                             50
Music    72                            75                             53
Book     73                            76                             51


[Figure: chart titled “The filter has improved the recommendations of the list”; x-axis: Domain (Digital Library, Travel, Movie, Music, Book); y-axis: Percentage of Agreement (0.00 to 1.00); legend: Degree of Agreement (1 to 5)]

Fig. 8. Participants' agreement with the statement that the filter has improved the result list (domain-wise) (TC2)

In addition to response rates, user assessments and scores in the defined quality dimensions were compared. Table 14 depicts the outcome of this analysis. Significantly higher scores are marked in bold figures. In case statistical tests identified no meaningful variations, the results of the better performing approach are underlined. α-levels were adjusted with the false discovery rate (FDR) in order to decrease the probability of making a type I error due to multiple testing.
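The text does not name the concrete FDR procedure. A common choice is the Benjamini-Hochberg step-up method, which is assumed in the following sketch. Instead of adjusting the α-level, it returns adjusted p-values that can be compared with an unadjusted α, which leads to the same decisions. The function name `bh_adjust` and the p-values are invented for illustration.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so that every adjusted value is
    # the minimum of p * m / rank over its own rank and all larger ranks.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Invented p-values of four tests within one family of comparisons.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = bh_adjust(raw)
significant = [p < 0.05 for p in adjusted]
```

For the invented values, the adjusted p-values are approximately 0.02, 0.04, 0.04 and 0.02, so all four tests would remain significant at α = 0.05.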

According to mean scores, expressive requests generated more comprehensive recommendation lists (size). This finding is in line with the increased response rates shown in Table 13. However, subsequently conducted t-tests confirmed significant differences only for the travel experiment (t(102) = -7.89, p < 0.001). In contrast, precision scores were higher in most of the domains (travel, movie, music) when the SKOSRec engine processed a simple constraint-based query. However, statistical tests did not confirm any systematic differences. The authors applied Wilcoxon tests for mrs scores and the remaining performance metrics because of the small sample sizes resulting from the low response rates of the simple constraint-based approach.

Regarding Novelty, none of the approaches was superior. In the Diversity dimension, expressive filters generated suggestions that were at least as topically diversified as the results from simple constraint-based retrieval. Even though no significant differences were

identified between the two approaches, increased mean scores (divU in the travel domain and divC in each domain) indicate a slight superiority of expanded filter requests in this quality dimension. The same applies to usefulness scores (mean values were higher in 3 out of 4 domains when expressive filtering was applied). However, these statements are speculative at best because statistical tests were not significant.

In summary, it can be concluded that expressive filters improve recall, potentially leading to a slight loss in precision scores (see Table 14). It may also be the case that expanded user constraints diversify as well as increase the usefulness of recommendation lists. However, these claims are unverified. In contrast, it is safe to say that expressive query patterns generate results of at least the same quality as simple patterns while increasing the number of relevant suggestions. The few differences may be explained by the high Jaccard scores (JI) (see Table 15) of recommendation lists. Throughout the domains, result sets were much alike. Given these similarities and the increases in list sizes for expressive filters, it is supposed that an expanded constraint produces an enhanced version of the recommendation list resulting from simple filtering. Therefore, expanded filters should be the default setting in a constraint-based retrieval context.
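The Jaccard index compares two recommendation lists by dividing the number of shared items by the number of distinct items in both lists. A minimal sketch follows. The function name `jaccard`, the treatment of two empty lists as undefined and the item identifiers are assumptions for illustration.

```python
def jaccard(list_a, list_b):
    """Jaccard index of two recommendation lists, ignoring item order."""
    items_a, items_b = set(list_a), set(list_b)
    if not items_a and not items_b:
        return None  # assumption: two empty lists cannot be compared
    return len(items_a & items_b) / len(items_a | items_b)


# Invented result sets of a simple and an expanded filter request.
simple_list = ["dbr:Item_A", "dbr:Item_B", "dbr:Item_C"]
expanded_list = ["dbr:Item_A", "dbr:Item_B", "dbr:Item_C", "dbr:Item_D"]
overlap = jaccard(simple_list, expanded_list)
```

In the invented example, the expanded list contains every item of the simple list plus one more, so the index is 3/4 = 0.75. This mirrors the interpretation in the text that an expanded constraint yields an enlarged version of the simple result list.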

In TC3 (i.e., evaluation of roll-up query patterns), the SKOSRec engine almost always generated non-empty recommendation lists (see Table 16). In the


Table 14. Performance results in TC2 (M = Mean, SD = Standard Deviation, N = No. of participants)

                                          Simple            Expressive
Dimension (Metric) [Max Score]   Domain   M       SD        M       SD        N
Accuracy (size) [10]             Travel   1.17    2.70      4.52    4.62      103
                                 Movie    6.72    4.08      7.20    3.92      50
                                 Music    5.96    4.64      6.36    4.51      53
                                 Book     4.59    4.50      5.00    4.51      51
Accuracy (mrs) [1]               Travel   66.73   20.51     65.34   22.36     23
                                 Movie    60.95   22.98     60.29   22.79     43
                                 Music    63.77   16.05     62.96   16.44     34
                                 Book     56.17   18.18     56.81   16.81     37
Novelty (nv) [1]                 Travel   0.48    0.41      0.61    0.49      21
                                 Movie    0.44    0.36      0.55    0.80      42
                                 Music    0.55    0.39      0.52    0.37      38
                                 Book     0.77    0.34      0.76    0.33      33
Diversity (divU) [5]             Travel   3       -         3.5     -         24
                                 Movie    4       -         4       -         43
                                 Music    4       -         4       -         38
                                 Book     3       -         3       -         37
Diversity (divC) [1]             Travel   0.68    0.18      0.72    0.21      19
                                 Movie    0.70    0.24      0.73    0.23      41
                                 Music    0.86    0.13      0.87    0.12      34
                                 Book     0.84    0.24      0.95    0.63      29
Usefulness [5]                   Travel   3.33    0.93      3.50    0.77      24
                                 Movie    3.32    0.85      3.31    0.86      43
                                 Music    3.55    0.81      3.64    0.78      38
                                 Book     3.39    0.83      3.62    1.21      37

Table 15. Jaccard Indices (TC2)

Domain   Mean Jaccard Index (JI)   SD     N
Travel   0.57                      0.47   25
Movie    0.71                      0.38   43
Music    0.82                      0.35   37
Book     0.92                      0.26   35

travel and the book experiment, the response rate was higher for aggregation-based queries, whereas in the movie domain the engine more often provided recommendations for regular requests. However, these differences are only marginal and do not indicate a clear superiority of one method.

Fig. 9 depicts participants' general agreement to Likert statements concerning the Perceived Usefulness of the approaches in TC3. The diagram shows similar results for regular as well as advanced requests. On average, satisfaction scores reached levels which lay slightly above neutral agreement. A subsequent t-test

Table 16. Response Rates (TC3)

Domain   Regular [%]   Aggr.-based [%]   N
Travel   99            100               103
Movie    96            92                50
Music    100           100               53
Book     96            100               51

confirmed no significant differences between the two methods. In fact, the approach with the highest average score of Perceived Usefulness was different in each experiment. While mean values were higher for regular requests in the studies on travel and music RS, advanced queries produced higher scores in the movie and book experiments.

Content-based Diversity (divC) was the only metric with systematic differences. Here, scores went up for advanced retrieval patterns in the travel (t(98) = -5.50, p < 0.001), the music (t(52) = -4.51, p < 0.001) and the book domain (t(47) = -4.90, p <


[Figure: chart titled “Perceived Usefulness (Regular vs. Roll-up Queries)”; x-axis: Domain (Travel, Movie, Music, Book); y-axis: Agreement (0 to 3); legend: Method (Regular, Roll-up)]

Fig. 9. Participants' overall satisfaction with regular and roll-up recommendation retrieval (TC3)

0.001) (see Table 17). This finding is not surprising given that the aggregation-based approach explores the wider semantic network of each item in the profile.

Hence, it can be assumed that both approaches produce recommendation lists of similar quality and can provide benefits to potential consumers. It cannot be clearly stated which approach is superior because neither of them outperformed the other in each domain. While the regular retrieval strategy was better in terms of usefulness in the travel and the music domains, the aggregation-based approach achieved higher scores in the domains of movie and book recommendations.

This conclusion is especially notable given the low mean Jaccard scores for recommendation lists resulting from regular and aggregation-based retrieval (see Table 18). The mean score never exceeded 0.1

throughout the domains. It indicates a low concordance between result sets. It is remarkable that participants perceived the recommendations as equally good given the high dissimilarity of the lists. Thus, advanced queries have an added value, as they provide additional interesting results. They can help to explore LOD repositories in different ways than a regular retrieval approach. Therefore, it is worthwhile to offer aggregation-based query patterns as an alternative retrieval strategy when the regular approach has not produced any helpful recommendations.

In addition to TC3, user assessments for TC4 (cross-domain recommendations) were also evaluated. However, since the authors conducted experiments for TC4 only in the multimedia domains, the samples were considerably smaller. Nevertheless, it is assumed that the number of completed user sessions is just sufficient to conduct further statistical analyses. Table 19 shows the response rates for the two retrieval methods. Cross-domain patterns had a higher success rate of generating non-zero result sets throughout the domains. Participants almost always received a recommendation for this query type, whereas they often did not receive a single suggestion when a regular SPARQL query was executed.

We applied non-parametric tests for the performance metrics mrs, nv, divU, divC and Perceived Usefulness, because the low response rates of the SPARQL-based approach led to a low number of comparable data points. Overall, the FDR-adjusted pairwise Wilcoxon tests detected no significant differences, except for user-based Diversity scores (divU) in the movie (Z = −2.69, p < 0.05) and the book domain (Z = −2.56, p < 0.05). The results for the content-based Diversity (divC) metric seem to point in the same direction because of the high mean values for cross-domain queries in each domain, but the significance of these results was not verified. The mrs scores indicate a better performance of regular


Table 17. Performance results in TC3 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]   Domain   Regular           Aggr.-based       N
                                          M       SD        M       SD
Accuracy (size) [10]             Travel    9.90    0.99      9.42    2.07     103
                                 Movie     2.88    0.59      2.76    0.82      50
                                 Music    10.00    0.00     10.00    0.00      53
                                 Book      2.88    0.59      3.00    0.00      51
Accuracy (mrs) [1]               Travel   48.21   24.98     44.93   25.59     102
                                 Movie    54.58   22.88     56.69   25.42      44
                                 Music    53.95   18.31     58.24   16.19      52
                                 Book     51.53   23.90     49.25   22.30      49
Novelty (nv) [1]                 Travel    0.43    0.39      0.38    0.40      98
                                 Movie     0.35    0.38      0.28    0.41      41
                                 Music     0.49    0.36      0.46    0.39      50
                                 Book      0.61    0.42      0.62    0.44      47
Diversity (divU) [5]             Travel    3.5     -         3       -        100
                                 Movie     4       -         3       -         43
                                 Music     4       -         3       -         50
                                 Book      3       -         3       -         48
Diversity (divC) [1]             Travel    0.79    0.17      0.89    0.13      99
                                 Movie     0.75    0.20      0.66    0.27      44
                                 Music     0.83    0.17      0.93    0.07      53
                                 Book      0.74    0.27      0.93    0.06      48
Usefulness [5]                   Travel    3.36    0.77      3.22    0.74     101
                                 Movie     3.20    0.92      3.26    0.94      44
                                 Music     3.45    0.84      3.44    0.85      51
                                 Book      2.95    0.87      3.03    0.96      48

Table 18. Jaccard Indices (TC3)

Domain   Mean Jaccard Index (JI)   SD     N
Travel   0.07                      0.11   101
Movie    0.03                      0.07    44
Music    0.02                      0.05    53
Book     0.02                      0.06    50

Table 19. Response rates (TC4)

Domain   Regular (SPARQL)   SKOSRec   N
Movie    32                  96       50
Music    62                  98       53
Book     53                 100       51

SPARQL queries, but the within-subjects test did not confirm this assumption statistically (see Table 20).
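The text does not state which FDR procedure was applied to the pairwise Wilcoxon tests. Assuming the widely used Benjamini-Hochberg step-up method, the adjustment of a family of p-values can be sketched as follows. The function name and the example p-values are hypothetical and do not reproduce the experimental data.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values from five pairwise tests.
raw = [0.001, 0.008, 0.039, 0.041, 0.6]
adjusted = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adjusted]
```

In this invented example, the third and fourth tests are below 0.05 before adjustment but not afterwards, which is the kind of effect an FDR correction has when many metrics and domains are compared at once.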

However, the most important finding of the analysis was that SKOSRec cross-domain queries generated significantly more suggestions throughout the domains (movie: t(49) = −17.92, p < 0.001; music: t(52) = −25.73, p < 0.001; book: t(50) = −12.75, p < 0.001) (see Table 20). Fig. 10 illustrates this graphically.
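The reported t values stem from within-subjects comparisons of recommendation list sizes. A minimal sketch of such a paired t statistic is given below, assuming the textbook formula t = mean(d) / (sd(d) / sqrt(n)) with n − 1 degrees of freedom, where d are the per-participant differences. The list sizes are invented for illustration and are not the experimental data.

```python
import math
import statistics


def paired_t_statistic(sample_a, sample_b):
    """Return (t, degrees of freedom) for two paired samples."""
    if len(sample_a) != len(sample_b):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(sample_a, sample_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1


# Invented list sizes per participant: regular SPARQL vs. cross-domain query.
regular_sizes = [0, 0, 3, 0, 1]
cross_domain_sizes = [10, 8, 10, 9, 10]

t_value, df = paired_t_statistic(regular_sizes, cross_domain_sizes)
```

Because the differences are computed as regular minus cross-domain, a negative t value corresponds to larger cross-domain lists, which matches the sign of the values reported above. The sketch does not handle the degenerate case in which all differences are identical (zero standard deviation).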

Whenever the SPARQL request provided recommendations, which was the case in less than half of the user sessions (see Table 19), the SKOSRec suggestions were assessed to be of almost equal quality as the SPARQL recommendations (see Table 20, Usefulness). Hence, in the remaining cases (more than half of the test cases), in which SPARQL querying did not produce any results, the SKOSRec approach may still have provided useful suggestions.

Therefore, it is reasonable to assume that the capability of the SKOSRec engine to flexibly switch between processing steps of similarity calculation and pattern matching can facilitate an improved exploration of LOD repositories. Table 21 also shows that the items in the two sets often did not match, as is indicated by low Jaccard indices. This finding further demonstrates that the execution of an additional step


Table 20. Performance results in TC4 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]   Domain   Regular (SPARQL)    Cross-Domain        N
                                          M        SD         M        SD
Accuracy (size) [10]             Movie     0.82     2.09       8.66     2.73      50
                                 Music     0.89     1.59       9.30     2.03      53
                                 Book      2.90     3.98      10.00     0.00      51
Accuracy (mrs) [1]               Movie    67.41    27.62      53.65    14.71      16
                                 Music    62.36    31.76      48.32    22.49      20
                                 Book     66.42    21.28      55.59    16.84      27
Novelty (nv) [1]                 Movie     0.59     0.49       0.41     0.34      16
                                 Music     0.72     0.44       0.59     0.40      19
                                 Book      0.54     0.41       0.63     0.30      27
Diversity (divU) [5]             Movie     3        -          4        -         16
                                 Music     3        -          3        -         20
                                 Book      3        -          4        -         27
Diversity (divC) [1]             Movie    (0.64)   (0.35)     (0.99)   (0.02)     (6)
                                 Music    (0.90)   (0.13)     (0.98)   (0.03)    (10)
                                 Book     (0.84)   (0.12)     (0.87)   (0.1)      (8)
Usefulness [5]                   Movie     3.57     0.83       3.35     1.57      15
                                 Music     3.53     1.17       3.11     0.98      20
                                 Book      3.41     1.04       3.41     0.94      27

[Figure: bar chart "Recommendation List Size"; x-axis: Domain (Movie, Music, Book); y-axis: No. of items (0.0 to 10.0); Method: Regular vs. Cross-Domain]

Fig. 10. Mean recommendation list sizes for SPARQL and SKOSRec cross-domain requests (TC4)


Table 21. Jaccard Indices (TC4)

Domain   Mean Jaccard Index (JI)   SD     N
Movie    0.03                      0.07   44
Music    0.01                      0.02   21
Book     0.16                      0.27   33

of similarity calculation can help to retrieve other relevant items.

5. Future Work & Conclusion

This paper has presented the query-based SKOSRec engine, which facilitates new types of recommendation requests for LOD repositories. The evaluations have shown that the SKOSRec system can often generate relevant and useful suggestions when certain query templates are utilized. In many cases, the application of expressive or advanced query patterns (i.e., in the context of constraint-based retrieval or cross-domain requests) helped to significantly improve recall values while still providing the same level of quality in the remaining performance dimensions. Additionally, advanced recommendation requests can be used to generate diversified recommendation lists in case users are not satisfied with the results of regular recommendation requests (i.e., roll-up or cross-domain retrieval vs. regular requests). While not being superior in each domain and test case, it is at least safe to say that the novel recommendation approaches of the SKOSRec engine considerably extend common retrieval methods and provide an alternative search strategy when conventional methods fail to provide useful results. Additionally, the increase in diversity and recall is in line with findings from previous research on LOD-enabled RS [45].

It is a strength of the developed approaches that they facilitate combinations of graph-based and similarity-based retrieval at different stages of the recommendation workflow. This feature extends general search capabilities for semantic networks. Future research needs to investigate whether, in addition to SKOS, further RDF vocabularies can be used for ad-hoc item-to-item similarity computation to increase the range of potential usage scenarios, since not every LOD dataset contains SKOS annotations [53]. The application of additional properties could also boost the recommendation quality of some query patterns (e.g., the query types used in TC3). In this context, it also needs to be determined whether the representation of the user profile in the SKOSRec query language can be substituted with free-text expressions to enable search-like functionalities. LOD researchers have already developed engines that can process natural language queries over RDF data [59], [34], [67]. It will have to be investigated how the SKOSRec engine profitably fits into this landscape of existing retrieval tools.

Another open research question concerns the aspect of interfaces to other applications. In its standard configuration, the system is not an end-user retrieval engine but a backend application with which an administrator who has domain knowledge (e.g., of RDF vocabularies and data models of a particular LOD repository and usage scenario) creates the respective query patterns. Although the websites of the online experiments represent possible implementations of suitable end-user applications for the SKOSRec engine, there are still many other possibilities and extensions to design such a user interface (UI). For instance, the selection of LOD items for the profile was made possible by previously storing the data in an index that could then be accessed and searched before issuing a recommendation query. However, this runs against the idea of ad-hoc retrieval. In this context, future research will have to determine whether it is feasible to generate an on-the-fly mapping from items to the respective LOD resources. In the case of a commercial application, the engine needs to match items with the corresponding entries in the product catalog. Even if it is not yet clear which UI components can make the best possible use of the SKOSRec engine's features, the results from the user experiments can give at least a few suggestions.

The following retrieval options should be the default setting in an interface that contains a LOD-enabled RS component to assist users in finding interesting items. Since the regular approach of on-the-fly retrieval achieved good results throughout the domains, the display of simple recommendations generated from previously stated user preferences represents a suitable starting point for retrieval. On-the-fly queries could be refined by providing filter options. Since the evaluations have shown that expressive constraints often lead to increased recall values, the application of such a filter might be the best option for assisted retrieval. As in the preparations for the experiments, appropriate graph-based query patterns for the usage scenario in question would also have to be determined by a domain expert before setting up the application. The same applies to advanced retrieval patterns, which should be available to the end user to refine the query when both the simple and the constraint-based approach have failed to provide relevant items. This suggestion is made because roll-up queries were competitive with regular recommendations in the web-based experiments. Therefore, they represent a viable alternative retrieval strategy. For some domains (e.g., for multimedia retrieval), it is also possible to use an optional selection field to enable the formulation of cross-domain requests. Evaluations for this query type suggest that users might receive interesting results (i.e., diversified and comprehensive recommendation lists) from such a request.

In summary, the contributions of the paper are as follows:

– Development and implementation of novel retrieval strategies in the SKOSRec system, which facilitate personalized requests and leverage the potential of knowledge graphs in order to address the research gap in state-of-the-art recommendation methods with regard to customized queries.

– Realization of web-based user experiments in four usage scenarios with the following outcomes:

  • Successful demonstration that knowledge graph data is applicable in the context of live recommendation scenarios.

  • Creation of suitable query patterns that can be applied to represent individual information needs in different application contexts. The set of query patterns can be extended with the help of the SKOSRec query language to meet the requirements of additional usage scenarios.

  • Proof that the novel retrieval strategies can be successfully applied across various application contexts and recommendation tasks and that they outperform conventional content-based retrieval approaches, especially in the performance dimensions of diversity and recall.

The prototypical implementation and the evaluation of the SKOSRec engine in different usage scenarios have demonstrated that the freely available knowledge sources on the LOD cloud can be successfully applied to advance state-of-the-art methods in IR and recommender systems. The proposed personalized and customized retrieval strategies can leverage the potential of knowledge graphs by better matching information needs of users with the help of semantic search options.

Appendix A. The SKOSRecommender Query Syntax

SKOSRecQuery: A SKOSRec query can be comprised of up to six elements, of which the parts SimProjection and ItemPart are obligatory. Hence, users need to state at least their preferences and how many recommendations they would like to receive. Advanced retrieval patterns can be formulated as well. For instance, similar resource retrieval can be combined with prefiltering (RecWhereClause) and/or postfiltering (SelectPart) of LOD resources. As in SPARQL, the query can start with a prologue that abbreviates IRIs with namespace declarations [21].

SelectPart: The SelectPart enables subquerying with recommendation results. The surrounding query is a SELECT query with a slightly simplified WHERE clause (i.e., RecWhereClause) with which postfilter conditions can be formulated. In this section, users have the option to formulate aggregation-based retrieval requests (Aggregation).

Aggregation: This query part handles aggregation-based postfiltering. Users specify the resource (IRIref) for which similarity scores of sublevel entities are summarized. Additionally, it is stated how the data should be aggregated (i.e., SUM, MAX or AVG) as well as which variable (Var) represents the aggregation-based recommendations in the postfilter section. The compiler makes sure that this variable is actually contained in the RecWhereClause of the SelectPart.

SimProjection: This part of a SKOSRec query refers to the list of generated suggestions. It defines the recommendation variable to which all similar LOD resources are mapped. In case pre- and postfilter conditions are set in the other sections, the compiler checks whether the variable (Var) appears in these sections as well. Otherwise, the required join operations cannot be executed. It is specified how many recommendations should be displayed (Integer) and whether the request will be issued against a default repository or as a cross-repository request (ServiceIntegration).

ServiceIntegration: This section indicates that a user intends to receive recommendations from a different SPARQL endpoint than the default endpoint that is stated in the standard configuration of the SKOSRec engine. It specifies the target SPARQL endpoint (IRIref) from which a user wants to receive recommendations. Upon extraction of SKOS annotations from the default source endpoint, a subsequent request is sent to the specified target endpoint.

ItemPart: This section of a SKOSRec query represents a single preference in the user profile. However, recommendation queries can contain several preference statements. A preference is either expressed as a LOD resource (IRIref) or as a variable which is bound to a preference query (VarPart). In the latter case, the resources are obtained by matching the graph pattern in the VarPart with the triple statements in the repository. The ItemPart statement can additionally be supplemented with a concept-to-concept similarity threshold (Sim) that triggers concept expansion with the approach of flexible similarity detection, which is extensively described in one of our previous works [64].

VarPart: The construct initiates preference querying. Individual likings are expressed as a graph pattern in a RecWhereClause. The variable (Var) is a placeholder for the LOD resources that will be used to set up the user profile. The compiler has to make sure that the specified variable occurs in the RecWhereClause as well. Otherwise, the extraction of LOD resources cannot be carried out correctly.

Sim: The Sim part of a SKOSRec query denotes the threshold of concept-to-concept scores for flexible similarity detection. Even though knowledge-based similarity values usually range between 0 and 1, the specification allows a broader range of digits in the Decimal datatype to enable application of other similarity metrics in case they are needed.

Relation: This section specifies how concept-to-concept similarity scores should be determined. Users can state whether scores should be "larger", "larger than" or "equal to" the threshold value.

RecWhereClause: This clause can occur in three different sections of a SKOSRec query, i.e., either in the prefilter (ItemPart), the postfilter (SelectPart) or the preference filter section (VarPart). The RecWhereClause closely resembles the "WhereClause" of the SPARQL 1.1 specification with all its subsequent parts [21]. Hence, it enables different combinations of graph pattern matching. However, the RecWhereClause of the SKOSRec language allows fewer pattern matching expressions than the "WhereClause" of the regular SPARQL syntax specification [21]. For instance, it is neither permitted to formulate subqueries nor to direct filter requests to more than one repository (see RecGraphPatternNotTriples for further explanations). This modification has been made to simplify similarity calculation. After parsing this section, the compiler makes sure that recommendation, preference or postfilter variables occur at least once in the RecWhereClause to guarantee that subsequent steps of the recommendation workflow can be applied on existing LOD resources. In case the RecWhereClause comprises a RecMinusGraphPattern, it also has to be checked that the variable is not only contained in the MINUS part of the group.

RecGroupGraphPattern: A RecGroupGraphPattern can contain one to many basic graph patterns, which can be differently combined according to RecGroupGraphPatternSub.

RecGroupGraphPatternSub: This section defines the graph patterns that are applied to identify suitable LOD resources. User filters can be either expressed as basic graph patterns (TriplesBlock) or as combinations of them (RecGraphPatternNotTriples). The sequence of different kinds of patterns can be flexibly specified.

RecGraphPatternNotTriples: This query part closely resembles the similarly named "GraphPatternNotTriples" of the SPARQL 1.1 specification [21]. The only difference to SPARQL is that the RecGraphPatternNotTriples section is a little more restricted in terms of admissible graph patterns than the "GraphPatternNotTriples" section of the regular SPARQL specification. For instance, it does not allow retrieving LOD resources from graphs other than the default graph that is specified in the configuration of the SKOSRec engine, since filter patterns have to be applied to datasets that contain SKOS annotations. Hence, these datasets need to be specified before runtime execution to ensure that similarity calculation can be carried out correctly. If a user were able to formulate filter conditions that referred to different RDF graphs and, accordingly, endpoints, this would not be possible.

RecGroupOrUnionGraphPattern: This query section marks an alternative graph pattern in the style of the "GroupOrUnionGraphPattern" of the SPARQL syntax specification [21]. However, it allows neither SPARQL-like subqueries nor federated queries for filtered resources, because this would complicate similarity calculation.

RecOptionalGraphPattern: In this part of a SKOSRecQuery, users can specify additional graph patterns, which might extend the query solution but do not necessarily have to match the data.

RecMinusGraphPattern: This section handles exclusion of LOD resources for cases when users want to omit certain triple statements from the solution.
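Several of the entries above mention a compiler check that a variable occurs in the RecWhereClause and not only inside a MINUS part. The following minimal Python sketch illustrates the idea of such a check. It assumes a hypothetical, strongly simplified representation of a clause as a list of (kind, triples) groups; this is not the engine's parser or data model, and the kinds "BASIC" and "MINUS" as well as the function name are made up for the example.

```python
def variable_is_bound(groups, variable):
    """Check that `variable` occurs in at least one non-MINUS group of triples."""
    for kind, triples in groups:
        if kind == "MINUS":
            # Occurrences inside MINUS only exclude solutions, so they do not count.
            continue
        for triple in triples:
            if variable in triple:
                return True
    return False


# Hypothetical clause: ?x appears in a basic pattern, ?z only inside MINUS.
clause = [
    ("BASIC", [("?x", "dct:subject", "?c")]),
    ("MINUS", [("?z", "rdf:type", "dbo:Film")]),
]

x_ok = variable_is_bound(clause, "?x")
z_ok = variable_is_bound(clause, "?z")
```

Under these assumptions, a query that uses ?x as its recommendation variable would pass the check, whereas one that uses ?z would be rejected before any similarity calculation is attempted.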

References

[1] G. Adomavicius, A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. In: IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.

[2] G. Adomavicius, A. Tuzhilin, R. Zheng. REQUEST: A query language for customizing recommendations. In: Information Systems Research, vol. 22, no. 1, pp. 99–117, 2011.

[3] C. Aggarwal. Recommender systems. Springer, 2016.

[4] M. Arenas, B. Cuenca Grau, E. Kharlamov, S. Marciuska, D. Zheleznyakov. Faceted search over ontology-enhanced RDF data. In: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 939–948, 2014.

[5] V. Ayala, M. Przyjaciel-Zablocki, T. Hornung, A. Schätzle, G. Lausen. Extending SPARQL for recommendations. In: Semantic Web Information Management, 2004.

[6] J. Beel, C. Breitinger, S. Langer, A. Lommatzsch, B. Gipp. Towards reproducibility in recommender systems research. In: User Modeling and User-adapted Interaction, vol. 26, no. 1, pp. 69–101, 2016.

[7] C. Boutilier, R. Brafman, C. Domshlak, H. Hoos, D. Poole. CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements. In: Journal of Artificial Intelligence Research (JAIR), vol. 21, pp. 135–191, 2004.

[8] P. Castells, S. Vargas, J. Wang. Novelty and diversity metrics for recommender systems: choice, discovery and relevance. 2011.

[9] G. Cheng, W. Ge, Y. Qu. Falcons: searching and browsing entities on the Semantic Web. In: Proceedings of the 17th ACM International Conference on World Wide Web, pp. 1101–1102, 2008.

[10] J. Davies, R. Weeks. QuizRDF: Search technology for the Semantic Web. In: Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004.

[11] DBpedia. URL: https://wiki.dbpedia.org, 2017.

[12] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito. Exploiting the web of data in model-based recommender systems. In: Proceedings of the 6th ACM Conference on Recommender Systems (RecSys), pp. 253–256, 2012.

[13] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, M. Zanker. Linked Open Data to support content-based recommender systems. In: Proceedings of the 8th International Conference on Semantic Systems, 2012.

[14] D. Dillman, D. Robert, D. Bowker. Principles for constructing web surveys. In: Joint Meetings of the American Statistical Association, 1998.

[15] DCMI metadata terms. URL: http://dublincore.org/documents/dcmi-terms, 2004.

[16] I. Fernandez-Tobias, I. Cantador, M. Kaminskas, F. Ricci. A generic semantic-based framework for cross-domain recommendation. In: Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pp. 25–32, 2011.

[17] A. Freitas, J. Oliveira, S. O'Riain, E. Curry, J. Da Silva, C. Pereira. Querying Linked Data using semantic relatedness: a vocabulary independent approach. In: Natural Language Processing and Information Systems, pp. 40–51, 2011.

[18] T. Grainger, T. Potter, Y. Seeley. Solr in Action. Cherry Hill: Manning, 2014.

[19] A. Gunawardana, G. Shani. Evaluating recommender systems. In: Recommender Systems Handbook, Springer, pp. 265–308, 2015.

[20] R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Bürgle, H. Düwiger, U. Scheel. Faceted Wikipedia search. In: International Conference on Business Information Systems, pp. 1–11, 2010.

[21] S. Harris, A. Seaborne, E. Prud'hommeaux. SPARQL 1.1 query language. In: W3C recommendation, 2013.

[22] B. Heitmann, C. Hayes. Using Linked Data to build open, collaborative recommender systems. In: AAAI Spring Symposium: Linked Data Meets Artificial Intelligence, pp. 76–81, 2010.

[23] J. Herlocker, J. Konstan, L. Terveen, J. Riedl. Evaluating collaborative filtering recommender systems. In: ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.

[24] Y. Hu, Z. Wang, W. Wu, J. Guo, M. Zhang. Recommendation for movies and stars using YAGO and IMDB. In: 12th International Asia-Pacific Web Conference (APWEB), pp. 123–129, 2010.

[25] D. Huynh, D. Karger. Parallax and companion: Set-based browsing for the data web. In: Proceedings of the 18th ACM International Conference on World Wide Web, 2009.

[26] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich. Recommender systems: an introduction. Cambridge University Press, 2010.

[27] Y. Kabutoya, R. Sumi, T. Iwata, T. Uchiyama, T. Uchiyama. In: IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 625–630, 2012.

[28] M. Kaminskas, I. Fernandez-Tobias, F. Ricci, I. Cantador. Knowledge-based music retrieval for places of interest. In: Proceedings of the 2nd International ACM Workshop on Music Information Retrieval with User-centered and Multimodal Strategies, pp. 19–24, 2012.

[29] H. Khrouf, R. Troncy. Hybrid event recommendation using Linked Data and user diversity. In: Proceedings of the 7th ACM Conference on Recommender Systems, pp. 185–192, 2010.

[30] P. Kirk. Experimental design. John Wiley & Sons, 1982.

[31] B. Knijnenburg, M. Willemsen, Z. Gantner, H. Soncu, C. Newell. Explaining the user experience of recommender systems. In: User Modeling and User-Adapted Interaction, vol. 22, no. 4-5, pp. 441–504, 2012.

[32] G. Koutrika, B. Bercovitz, H. Garcia-Molina. FlexRecs: expressing and combining flexible recommendations. In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 2009.


[33] M. Lee, W. Kim. Semantic association search and rank method based on spreading activation for the Semantic Web. In: IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1523–1527, 2009.

[34] J. Lehmann, L. Bühmann. AutoSPARQL: Let users query your knowledge base. In: 8th Extended Semantic Web Conference (ESWC), 2011.

[35] N. Marie, F. Gandon, M. Ribiere, F. Rodio. Discovery hub: on-the-fly Linked Data exploratory search. In: Proceedings of the 9th International Conference on Semantic Systems, pp. 17–24, 2013.

[36] S. McNee, J. Riedl, J. Konstan. Being accurate is not enough: how accuracy metrics have hurt recommender systems. In: CHI'06 Extended Abstracts on Human Factors in Computing Systems, 2006.

[37] R. Meymandpour, J. Davis. Recommendations using Linked Data. In: Proceedings of the 5th PhD Workshop on Information and Knowledge, 2012.

[38] A. Miles, S. Bechhofer. SKOS Simple Knowledge Organization System Reference. In: W3C Recommendation, 2009.

[39] B. Mobasher, X. Jin, Y. Zhou. Semantically enhanced collaborative filtering on the web. In: Web Mining: From Web to Semantic Web, pp. 57–76, 2004.

[40] A. Musetti, A. Nuzzolese, F. Draicchio, V. Presutti, E. Blomqvist, A. Gangemi, P. Ciancarini. Aemoo: Exploratory search based on knowledge patterns over the Semantic Web. In: Semantic Web Challenge, 2012.

[41] T. Neumann, G. Weikum. The RDF-3X engine for scalable management of RDF data. In: The VLDB Journal, vol. 19, no. 1, pp. 91–113, 2010.

[42] J. Pérez, M. Arenas, C. Gutierrez. Semantics and Complexity of SPARQL. In: 5th International Semantic Web Conference, pp. 30–43, 2006.

[43] S. Policarpio, S. Brunk, G. Tummarello. Implementation of a SPARQL integrated recommendation engine for Linked Data with hybrid capabilities. In: AImWD, 2012.

[44] P. Pu, L. Chen, R. Hu. A user-centric evaluation framework for recommender systems. In: Proceedings of the 5th ACM Conference on Recommender Systems, 2011.

[45] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, E. Di Sciascio. Sound and music recommendation with knowledge graphs. In: ACM Transactions on Intelligent Systems and Technology (TIST), vol. 8, no. 2, art. 21, 2017.

[46] E. Oren, R. Delbru, S. Decker. Extending faceted navigation for RDF data. In: 5th International Semantic Web Conference, pp. 559–572, 2006.

[47] G. Pirro. Explaining and suggesting relatedness in knowledge graphs. In: International Semantic Web Conference, pp. 622–639, 2015.

[48] G. Rizzo, R. Troncy. NERD: A framework for evaluating named entity recognition tools in the Web of data. In: 10th International Semantic Web Conference (ISWC'11), Demo Session, Bonn, Germany, 2011.

[49] J. Rosati, T. Di Noia, T. Lukasiewicz, R. De Leone, A. Maurino. Preference queries with Ceteris Paribus semantics for Linked Data. In: OTM Confederated International Conferences, pp. 423–442, 2015.

[50] T. Ruotsalo, K. Haav, A. Stoyanov, S. Roche, E. Fani, R. Deliai, E. Mäkelä, T. Kauppinen, E. Hyvönen. SMARTMUSEUM: A mobile recommender system for the Web of Data. In: Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 50–67, 2013.

[51] I. Ruthven, M. Lalmas, K. van Rijsbergen. Incorporating user search behavior into relevance feedback. In: Journal of the Association for Information Science and Technology, vol. 54, no. 6, pp. 529–549, 2003.

[52] B. Sarwar, G. Karypis, J. Konstan, J. Riedl. Item-based collaborative filtering recommendation algorithms. In: Proceedings of the 10th International Conference on World Wide Web, 2001.

[53] M. Schmachtenberg, C. Bizer, H. Paulheim. Adoption of the Linked Data best practices in different topical domains. In: International Semantic Web Conference (ISWC), pp. 245–260, 2014.

[54] H. Schütze, C. Manning, P. Raghavan. Introduction to information retrieval. Cambridge University Press, 2008.

[55] SparqlImplementations. URL: https://www.w3.org/wiki/SparqlImplementations, 2016.

[56] M. Stankovic, W. Breitfuss, P. Laublet. Discovering relevant topics using DBPedia. In: Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 219–222, 2011.

[57] K. Swearingen, R. Sinha. Beyond algorithms: An HCI perspective on recommender systems. In: ACM SIGIR 2001 Workshop on Recommender Systems, vol. 13, no. 5-6, 2001.

[58] D. Tunkelang. Faceted search. In: Synthesis Lectures on Information Concepts, Retrieval, and Services, pp. 1–80, 2009.

[59] C. Unger, L. Bühmann, J. Lehmann, A. Ngonga Ngomo, D. Gerber, P. Cimiano. Template-based question answering over RDF data. In: Proceedings of the 21st ACM International Conference on World Wide Web, 2012.

[60] C. Varnhagen, M. Gushta, J. Daniels, T. Peters, N. Parmar, D. Law, T. Johnson. How informed is online informed consent? In: Ethics & Behavior, vol. 15, no. 1, pp. 37–48, 2005.

[61] J. Waitelonis, H. Sack. Towards exploratory video search using Linked Data. In: 11th IEEE International Symposium on Multimedia, pp. 540–545, 2009.

[62] L. Wenige, J. Ruhland. Flexible on-the-fly recommendations from Linked Open Data repositories. In: International Conference on Business Information Systems, pp. 43–54, 2016.

[63] L. Wenige. Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations.

[64] L. Wenige, J. Ruhland. Retrieval by recommendation: using LOD technologies to improve digital library search. In: International Journal on Digital Libraries, 2017.

[65] F. Zarrinkalam, M. Kahani. A multi-criteria hybrid citation recommendation system based on Linked Data. In: 2nd International EConference on Computer and Knowledge Engineering (ICCKE), pp. 283–288, 2012.

[66] C. Ziegler, S. McNee, J. Konstan, G. Lausen. Improving recommendation lists through topic diversification. In: Proceedings of the 14th International Conference on World Wide Web, pp. 22–32, 2005.

[67] L. Zou, R. Huang, H. Wang, J. Yu, W. He, D. Zhao. Natural language question answering over RDF: a graph data driven approach. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 313–324, 2014.


Table 9. Result set of Q8

movie              musicalAct
dbr:The_Jacksons   dbr:The_Jackson_5

The following definitions formalize the postfiltered and aggregation-based retrieval forms. Definition 12 specifies the notion of SKOSRec recommendations that may potentially undergo postprocessing.

Definition 12 (SKOSRec recommendations). SKOSRec recommendations (R(Pr)) are the set of suggestions generated by processing the solution mappings (Ω) obtained from recommendation retrieval. They can be the result of simple on-the-fly retrieval, of prefiltering, or of a combination of these techniques. Pr represents the input profile that contains at least one preference for an item. The cardinality of R(Pr) is the intended number of recommendations (k) specified by the user, based on which the engine selects the resources with the highest recommendation scores that are not contained in the profile (Eq. 12).

$$R(P_r) = \{\, x \mid x \in \{\, q \mid \mathrm{score}(P_r, q) > \operatorname{kmax}_{q \in \Omega} \mathrm{score}(P_r, q),\ q \notin P_r \,\} \,\} \tag{12}$$
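Definition 12 can be read as a top-k selection over scored candidates that excludes profile items. The following minimal Python sketch assumes that recommendation scores are already available as a mapping from resource to score. The function name, the resource identifiers, and the scores are hypothetical, and the alphabetical tie-breaking is an assumption made only to keep the output deterministic.

```python
def top_k_recommendations(scores, profile, k):
    """Return the k highest-scoring resources that are not in the profile."""
    candidates = [(res, s) for res, s in scores.items() if res not in profile]
    # Sort by descending score; ties are broken alphabetically (an assumption).
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return [res for res, _ in candidates[:k]]


# Hypothetical scores score(Pr, q) for candidate resources q.
scores = {
    "dbr:Movie_A": 0.91,
    "dbr:Movie_B": 0.40,
    "dbr:Movie_C": 0.75,
    "dbr:Movie_D": 0.62,
}
profile = {"dbr:Movie_A"}  # items for which the user already stated a preference

recommendations = top_k_recommendations(scores, profile, k=2)
```

Here the highest-scoring resource is skipped because it is part of the profile, so the list contains the next two candidates.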

The approach of subquerying with recommendation results builds on the idea that SKOSRec suggestions R(Pr) can be joined with any graph pattern Ppost subsequent to the process of similarity calculation (Definition 13). It does not make a difference how the engine obtained these recommendations.

Definition 13 (Postfiltered recommendation mapping). The postfiltered recommendation mapping (Ωpost) is obtained by joining solution mappings generated from SKOSRec recommendations with the results of matching a postfilter graph pattern (Eqs. 13 and 14).

$$\Omega_{postfilter} = \{\, \mu \mid \mu \in [[P_{post}]]_D,\ x \in \mathrm{dom}(\mu) \,\} \tag{13}$$

$$\Omega_{post} = \{\, \mu_{|x} \mid x \in R(P_r) \,\} \bowtie \Omega_{postfilter} \tag{14}$$
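Equations 13 and 14 amount to a join on the recommendation variable. The sketch below assumes that solution mappings are represented as Python dictionaries from variable names to resources. The variable names, the example data, and the helper name are hypothetical and only illustrate the join semantics.

```python
def postfilter_join(recommendations, postfilter_mappings, rec_var="x"):
    """Keep postfilter mappings whose recommendation variable is bound to a recommended resource."""
    recommended = set(recommendations)
    return [
        mu for mu in postfilter_mappings
        if rec_var in mu and mu[rec_var] in recommended
    ]


# R(Pr): hypothetical recommended resources.
recommended_items = ["dbr:Movie_C", "dbr:Movie_D"]

# Hypothetical result of matching the postfilter pattern against the data.
postfilter_mappings = [
    {"x": "dbr:Movie_C", "y": "dbr:Director_1"},
    {"x": "dbr:Movie_E", "y": "dbr:Director_2"},  # not recommended, dropped
    {"y": "dbr:Director_3"},                      # x is unbound, dropped
]

joined = postfilter_join(recommended_items, postfilter_mappings)
```

Only the first mapping survives the join, since it is the only one that binds the recommendation variable to a resource from the recommendation set.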

Like the prefilter, Ppost can be comprised of triple statements that are eligible in the RecGroupGraphPattern part of the SKOSRec grammar. Aggregation-based queries have to be treated differently than a regular postfilter request. Apart from excluding sublevel items of the profile from the set of similar resources, the engine should also omit any connections between the preferred superordinate item (a) and similar sublevel entities. Additionally, the user has to specify the variable (e.g., y) that represents the items within the postfilter graph pattern Ppost. By this means, aggregation-based recommendations can be generated (Definition 14).

Definition 14 (Resources for aggregation-based retrieval). The set of resources for aggregation-based retrieval is obtained by retrieving LOD resources that have a connection to a previously generated suggestion, thereby excluding all resources that are linked to the specified superordinate item $a$ in the profile (Eq. 15).

$$R_{agg}(a) = \{\, \mu(y) \mid \mu \in \Omega_{post},\ y \in dom(\mu),\ a \notin \mu(y) \,\} \tag{15}$$

For each potential aggregation-based suggestion ($y$), similarity scores of connected entities ($x$) have to be considered for the final ranking. In this context, different approaches to aggregation can be applied. In cases when both similar items and aggregation-based recommendations are entities on comparable levels of granularity, choosing the maximum similarity score might be the best way to represent the relatedness of the resource to the user profile. On the other hand, when postfiltered recommendations are based on similarity scores of their sublevel entities (e.g., as in the movie recommendation example), the scores of related suggestions should be aggregated by the sum or the average of individual scores.

Definition 15 (Aggregation-based ranking score). The ranking score of an LOD resource $y$ that is connected to a set of recommended items is determined by aggregating scores of connected entities, either by the maximum (Eq. 16), the sum (Eq. 17) or the average (Eq. 18) of individual scores.

$$score_{aggMAX}(y) = \max_{x \in \mu(x,y)} score(P_r, x) \tag{16}$$

$$score_{aggSUM}(y) = \sum_{x \in \mu(x,y)} score(P_r, x) \tag{17}$$


$$score_{aggAVG}(y) = \frac{\sum_{x \in \mu(x,y)} score(P_r, x)}{|\{x \in \mu(x,y)\}|} \tag{18}$$
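The three aggregation variants of Definition 15 can be sketched as follows. The representation of the postfiltered mappings as `(x, y)` pairs, the function name `aggregate_scores` and the sample identifiers are assumptions chosen for illustration only.

```python
from collections import defaultdict


def aggregate_scores(pairs, scores, mode="sum"):
    """Aggregate similarity scores of sublevel items x per connected resource y.

    pairs  -- iterable of (x, y) tuples taken from the postfiltered mappings
    scores -- dict mapping each similar item x to score(Pr, x)
    mode   -- "max" (Eq. 16), "sum" (Eq. 17) or "avg" (Eq. 18)
    """
    aggregators = {
        "max": max,
        "sum": sum,
        "avg": lambda values: sum(values) / len(values),
    }
    if mode not in aggregators:
        raise ValueError("mode must be 'max', 'sum' or 'avg'")

    grouped = defaultdict(list)
    # set() ensures that a repeated (x, y) pair contributes only once.
    for x, y in set(pairs):
        grouped[y].append(scores[x])
    return {y: aggregators[mode](values) for y, values in grouped.items()}


# Hypothetical example: three similar movies linked to two directors.
example_pairs = [("m1", "d1"), ("m2", "d1"), ("m3", "d2")]
example_scores = {"m1": 0.5, "m2": 0.25, "m3": 0.75}
by_sum = aggregate_scores(example_pairs, example_scores, "sum")
by_max = aggregate_scores(example_pairs, example_scores, "max")
by_avg = aggregate_scores(example_pairs, example_scores, "avg")
```

In this made-up example both directors tie under summation (0.75 each), whereas the maximum and the average favor `d2`, which shows how the choice of aggregation function changes the final ranking.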

4. Evaluation of the SKOS Recommender

4.1. Experimental Setup

The novel recommendation strategies of the SKOSRec engine were evaluated in web-based experiments for the usage scenarios of travel destination search and multimedia RS. The online approach helped to reach more participants with diverse demographic backgrounds than would have been possible in a laboratory experiment. Another positive side effect of the web-based setting was that participants could test and rate recommendations without being directly observed by an experimenter, which might otherwise have biased the results. Hence, the online approach softened common limitations of laboratory studies [19].

The experiments were carried out in the defined usage scenarios with metadata descriptions from DBpedia. The database containing the local mirror of DBpedia ran on a virtual server. Besides the triple store, a web application was hosted on the same machine. The website ran on an Apache Tomcat 7 application server and was implemented with Java servlet technology.⁶,⁷

We conducted the online experiments between April 2016 and January 2017. Upon setting up the web application, a pretest was carried out to check for potential pitfalls such as misleading navigation. Two test users went through the web interface step by step and provided their feedback. Based on their insights, we slightly modified the interface (e.g., by rephrasing user instructions) to increase the chances of successful completion. If the SKOSRec engine is applied in a live setting, the design of the user interface will have to be tested with more users. Our experimental series focused on algorithmic performance, which was tested with numerous participants.

Subjects were recruited through the clickworker.com platform.⁸ Participants received a compensation fee

⁶ http://tomcat.apache.org
⁷ http://www.oracle.com/technetwork/java/javaee/servlet/index.html
⁸ https://www.clickworker.de

of 1.00 € or 1.50 € for a completed survey. A total of 103 participants took part in the study on travel destination search. In the multimedia domains, 154 subjects in total evaluated the SKOSRec engine. For each part of the evaluation, it was ensured that participants had not merely clicked through the evaluation screen. Whenever there appeared to be irregularities in the log files (e.g., the default settings of the screen were not changed), these assessments were excluded from further analysis.

The survey template started with a consent form. It informed participants about the context and the procedure of the study. After having filled in the consent form, participants were navigated to the first part of the experiment. It collected data on subjects' demographics and their consumption and search habits. After the preparation phase, participants worked on the first test case (TC1) in each usage scenario. In this test case, the performance of the SKOSRec engine was assessed in the simple execution mode of on-the-fly recommendations (see Subsect. 3.2). This was done to gather data on the usefulness of suggestions resulting from a simple content-based approach (i.e., the baseline method), with which more advanced recommendation strategies were compared at a later stage. In cases where users did not state any items, the application showed an error page. While setting up a profile required some effort on the side of the user, it was a necessary step for the experiment. Since we did not have access to the log files of a live application, profile generation had to be simulated. The process was facilitated by a specifically tailored interface (see Fig. 2), where subjects could search for items through an autocomplete field that applied AJAX technology.⁹

It enabled users to type in keywords (e.g., the name of a music act or the author of a book), for which matches were instantaneously displayed without reloading the site. Hence, participants could quickly find and select their items of interest. Whenever a user entered a query through the AJAX interface, the system matched the query keywords against an Apache Solr/Lucene search index.¹⁰ The index was built with item metadata, which had been extracted from DBpedia prior to setting up the website. The index comprised the names and abstracts of items. Whenever available, information on thumbnail pictures was saved in the index to be ready for display in the AJAX interface. In case the Apache Solr/Lucene

⁹ http://api.jquery.com/jquery.ajax
¹⁰ http://lucene.apache.org/solr


Fig. 2. User profile generation, TC1 (music domain)

search engine found index keywords that matched the user query, metadata for these items was immediately shown on the screen. This information was displayed to help participants make an informed decision on whether the AJAX result list contained suitable items for profile generation.

Aside from the AJAX feature, the appeal of the interface was increased by a progress bar at the top of the webpage (see Fig. 2). Progress bars are commonly utilized tools for web surveys. In conventional paper-based studies, subjects usually see how far along they are, while in a web survey they cannot check their progress immediately. This is especially true for studies that apply screen-by-screen navigation, which was the chosen approach for the conducted web experiments. By showing how close participants are to completing the experiment, progress bars prevent users from tiring of the questionnaire and withdrawing from the study altogether [14].

Upon profile generation, the SKOSRec engine calculated on-the-fly recommendations for a simple query (similar to Q1, Sect. 3), which were shown to users on a separate screen. Suggestions were displayed with a sufficient amount of information (e.g., thumbnails, labels or a short description) to help participants assess the quality of the recommendation. In addition to this information, users could also access the respective Wikipedia article by following the hyperlink on the label. Relevance sliders were used to assess the utility of each recommendation. They are often applied in evaluations of IR systems [51, 54]. Scores were handled as points on a 0 to 100 scale of the relevance slider (outer left side: lowest relevance, outer right side: highest relevance, see Fig. 3).

This approach facilitated the calculation of an accurate precision score and the application of tests with higher statistical power than an ordinal 5-star scale would allow. The score, called mean relevance (mrs), was measured with Equation 19.

$$mrs = \frac{\sum_{i=1}^{k} score_i}{k} \tag{19}$$
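As a small worked illustration of Eq. 19, the following Python sketch averages the slider ratings of one recommendation list. The function name `mean_relevance` and the sample ratings are made up for this example.

```python
def mean_relevance(slider_scores):
    """Mean relevance (mrs) of one recommendation list, following Eq. 19.

    slider_scores -- relevance ratings on the 0 to 100 slider scale,
                     one rating per recommended item
    """
    if not slider_scores:
        # An empty list has no defined mean; how the study logged this
        # case is not stated, so raising an error is an assumption.
        raise ValueError("mrs is undefined for an empty recommendation list")
    return sum(slider_scores) / len(slider_scores)


# Three hypothetical ratings for a list with k = 3 recommendations.
example_mrs = mean_relevance([80, 40, 60])
```

For the three hypothetical ratings, the sum is 180 and k is 3, which gives an mrs of 60.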

In addition to the precision score, the web application also determined the size of the recommendation list produced by the engine as an indicator of recall. However, recommendation list size can only be utilized for this purpose when mrs scores reach a reasonable level.

While accurate recommendations are useful because they build trust in a system [57], they only capture a narrow aspect of performance [36]. For instance, an online retailer can profit tremendously from an RS that provides both relevant and unobvious recommendations, since it enables users to explore the product catalog more thoroughly. On the other hand, if an RS only suggests popular products, it is likely that the user already knows them [3, 23]. A purely accuracy-based evaluation disregards these aspects, which is why other performance indicators such as novelty were measured as well. Thus, the evaluation interface also contained a section where subjects had to tick a radio button that indicated whether an item was new. In the backend of the web application, novelty was measured as the ratio of relevant items not


Fig. 3. Recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain)

familiar to the user (|nov|) among the total number of relevant items (|tp|) (Eq. 20).

$$nv = \frac{|nov|}{|tp|} \tag{20}$$
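Eq. 20 can be illustrated with a short Python sketch that takes the relevance threshold of 50 slider points used in the study. The function name `novelty`, the input format and the decision to return `None` when no item is relevant are assumptions for illustration, since the handling of that case is not described.

```python
def novelty(ratings, threshold=50):
    """Share of relevant items that were new to the user (Eq. 20).

    ratings   -- list of (relevance_score, is_new) tuples, one per item
    threshold -- minimum slider score for an item to count as relevant
    """
    # Collect the "is new" flags of all relevant items (the set tp).
    relevant_flags = [is_new for score, is_new in ratings if score >= threshold]
    if not relevant_flags:
        # No relevant item: the ratio |nov| / |tp| is undefined.
        return None
    # True counts as 1, so the sum is the number of novel relevant items.
    return sum(relevant_flags) / len(relevant_flags)


# Hypothetical ratings: three relevant items, two of them unknown to the user.
example_nv = novelty([(80, True), (60, False), (30, True), (50, True)])
```

In this made-up list the item rated 30 is not relevant and is ignored, so two of the three relevant items are novel.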

An item was considered relevant when it achieved a relevance score of 50 or higher. In addition to accuracy and novelty, the web application also tracked the diversity of recommendation lists, since a topically diversified result set has been found to increase overall user satisfaction [31, 66]. In the RS literature, the most widely explored diversification metrics are the ones that measure the similarity between each item pair in the recommendation list. The more alike the items are, the less diverse is the result set [19]. The study series of this paper applied the metric by Castells et al. [8]. It determines item-to-item similarity values with the cosine similarity measure. Scores are then converted to distance values (Eq. 21) and averaged for the recommendation list (Eq. 22).

$$d(i_l, i_m) = 1 - \cos(i_l, i_m) \tag{21}$$


$$div_C = \frac{2}{k(k-1)} \sum_{l<m} d(i_l, i_m) \tag{22}$$
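Eqs. 21 and 22 can be sketched in Python as follows. The sketch assumes that each item is represented by the set of its SKOS annotations, i.e. a binary feature vector, for which the cosine reduces to the number of shared annotations divided by the geometric mean of the set sizes. This representation and the function names are assumptions; the feature weighting actually used is not specified here.

```python
from itertools import combinations
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two items given as sets of annotations."""
    if not a or not b:
        return 0.0  # items without annotations share nothing (assumption)
    return len(a & b) / sqrt(len(a) * len(b))


def content_diversity(items):
    """Average pairwise distance of a recommendation list (Eqs. 21 and 22)."""
    k = len(items)
    if k < 2:
        raise ValueError("divC needs at least two items")
    # Sum the distances d = 1 - cos over all unordered pairs with l < m.
    total = sum(1.0 - cosine(a, b) for a, b in combinations(items, 2))
    return 2.0 * total / (k * (k - 1))


# Hypothetical list: two items with identical annotations, one unrelated.
example_div = content_diversity([{"c1", "c2"}, {"c1", "c2"}, {"c3"}])
```

The identical pair contributes a distance of 0 and the two remaining pairs a distance of 1 each, so the made-up list reaches a diversity of 2/3.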

The content-based diversity scores (divC) were calculated in the backend of the web application and saved in a log file for each recommendation list that the SKOSRec engine generated at runtime. In addition to the measurement of divC scores, we asked participants directly about their opinions on recommendation list diversity. Following the RS evaluation framework by Pu et al. [44], two Likert items were posed for this purpose:

– The recommendations of list [] are diverse.
– The recommendations of list [] are similar to each other.

Besides the algorithmic performance metrics listed above, the authors gathered assessments from users regarding the overall usefulness of the suggestions. This approach is also in line with the empirically verified evaluation framework by Pu et al. [44], which proposes to measure the psychometric construct of Perceived Usefulness to get a comprehensive understanding of user opinions on recommendation results. Thus, participants were instructed to state their agreement with positive statements regarding the relevance of the suggestions. The statements were adapted to the specificities of each usage scenario. Fig. 4 shows an example collection of Likert questions for the domain of music RS.

After participants had provided the required information, their answers were saved in a log file and they were navigated to test case 2 (TC2), which evaluated the SKOSRec engine's performance for constraint-based queries (similar to Q3, Sect. 3). We tried to find out whether the engine was able to improve user satisfaction through filter conditions. Even though IR systems already successfully apply facet filters to result sets, the experiments still had to show whether LOD-enabled constraints have the same effect on the Perceived Usefulness of recommendations. For the comparison of non-filtered and filtered suggestions, we applied a within-subjects design in which participants had to rate two recommendation lists resulting from regular (non-filtered) and filtered requests in TC2. We favored this setup over a between-subjects design, where subjects would have been assigned to only one result list. While the between-subjects design more closely resembles a real-world setting, since it does not require

participants to interact with more than one approach at a time, differences among subjects (e.g., regarding demographics or topic expertise) can potentially have a considerable impact on quality judgments [19]. Hence, when applying a between-subjects design, one does not know whether differences in performance are caused by the approaches or by the variability among participants. Therefore, researchers need to conduct between-subjects experiments with a sufficient number of study subjects to level out these differences. Within-subjects studies, on the other hand, require fewer participants while still achieving the same level of statistical power, since between-subject variability is not an issue [30, 31]. Beel et al. have shown that even small variations in an RS test setting can cause substantial differences in performance outcomes [6]. To keep the extent of variations even lower, comparisons between non-filtered and constraint-based recommendations were based on the same profile for each user in TC2. Therefore, the user profile from TC1 served as a starting point for filtered retrieval. The web application showed the recently generated profiles to users and asked them to provide an additional filter condition. Fig. 5 depicts an example web form for a constraint-based recommendation request in the music domain. It comprises the user profile and an additional text field where users stated their constraint.

Afterwards, the engine applied this filter condition to the set of potential recommendation results before similarity calculation started. For each usage scenario, different types of constraints were available. Regarding suitable filter options, the authors sought to achieve a balance between assisting users in finding filter conditions that would adequately represent their information needs and keeping the variability among users as low as possible. Therefore, the specificities of each scenario and the availability of the respective data sources were determined. For instance, for the multimedia domains (movie, music and books) it was assumed that users would like to filter recommendations according to the specific genre of the item. Fortunately, the DBpedia dataset contains this kind of information for the music domain. In this domain, the genre can be retrieved through the property dbo:genre. However, in the movie and book domains, DBpedia does not provide sufficient information. Many LOD resources that represent books were not assigned a genre. In the movie domain, a genre property is missing entirely, and genre-specific information can often only be found in the subject categories describing the movie. Hence, in contrast to the experiment on


Fig. 4. Evaluation screen for recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain)

music recommendations, participants in the domains of books and films did not filter their recommendation lists by genre. On the other hand, the property dc:subject was applied as a recommendation filter in each multimedia domain. Since the subject property often links to many informative features of multimedia items (e.g., release period, geographic information or content information), the authors decided that this feature represents a powerful filter dimension.

In the experiment on travel destination search, the web interface offered two filters: a subject filter and a location-specific filter. The subject filter enabled users to state their preferences regarding the characteristics of a location (e.g., being a nature reserve). The location-specific filter, on the other hand, gave participants the option to specify where the desired destination should be located.¹¹

The second fundamental research issue of TC2 concerned the question of whether expressive constraints can boost recommendation quality even further. Hence,

¹¹ http://dbpedia.org

apart from executing a constraint-based workflow with a simple filter (i.e., a direct attribute of an item), the SKOSRec engine generated suggestions with the help of an expressive filter as well. This second procedure, while still applying the same user constraint as the simple filter, utilized an expressive graph pattern to include additional LOD resources in the result set. For instance, in case a subject filter was selected, result set expansion was facilitated by exploring the wider semantic space of the DBpedia category graph through a property path declaration in the graph pattern (e.g., skos:broader[2]). In the travel experiment, an additional expressive filter retrieved place-related information (dul:hasLocation) for location-specific subproperty relations that are not materialized in DBpedia. Hence, the expressive graph pattern expanded the filter such that additional properties were matched (e.g., dbo:country, dbo:city or dbo:state), while still applying the specified user constraint to the set of LOD resources. Upon execution of constraint-based recommendation requests, users received two recommendation lists (resulting from simple and expressive filtering, respectively) in a


Fig. 5. Web form for a constraint-based recommendation request (music domain)

separate evaluation screen. Participants evaluated result sets in the same manner as in TC1. For each result set, the screen contained an additional Likert item asking subjects to state their agreement with the statement “The filter has improved the recommendation results of list”. The sequence of suggestions resulting from the different methods was randomly assigned in each user session to avoid order effects.

After study subjects had completed TC2, they were navigated to test case 3 (TC3). In the TC3 section of the travel experiment, participants could choose between three options for rollup query patterns (similar to Q6, Sect. 3). Users were able to obtain recommendations for travel destinations based on POIs they had visited during a stay in another destination. They could select either a city (option 1), a region (option 2) or a country (option 3) as destination type. Upon entity type selection, the SKOSRec engine generated appropriate suggestions. Fig. 6 shows the web form from the travel experiment that facilitated the formulation of advanced rollup requests in TC3.

When users had selected an entity type and stated their favorite travel destination and three associated POIs, the application generated a SKOSRec query from these parameters. A simple on-the-fly request was sent to the engine as well to retrieve baseline recommendations, with which the suggestions resulting from the advanced request were compared at a later stage of the experiment. The simple query only contained the user's favorite travel destination and the selected entity type option, thus omitting information on the POIs. During the travel experiment, the engine processed both the simple and the advanced query and produced two recommendation lists. As in the previous parts of the experiment, participants assessed the quality of the result sets, which appeared in random order on the screen.

The multimedia experiments also contained a section on rollup retrieval (TC3). Like the advanced travel queries, the multimedia requests aggregated similarity scores of sublevel entities through summation, which the engine joined with a postfilter. In addition to these features, the pattern contained a preference query part. This section identified a set of preferred items through a single user statement. Participants either specified their favorite director/actor (movie domain), music act


Fig. 6. Travel - User profile generation, TC3

(music domain) or author (book domain) and received recommendations that the SKOSRec engine derived from the features of the creative works of the favored artist (similar to Q5, Sect. 3). Participants also received suggestions from a simple on-the-fly query, which computed results according to the artist's characteristics. As in the travel experiment, recommendations from the advanced rollup request were compared with those of the on-the-fly query without letting participants know which approach they were assessing and with random assignment of result set positions.

After participants of the multimedia experiments had evaluated the results of TC3, they were guided to test case 4 (TC4), which evaluated cross-domain queries. The experiment on travel RS ended with TC3 because no suitable graph patterns could be identified to facilitate these kinds of suggestions for the travel usage scenario. In the multimedia domain, however, the data from DBpedia was sufficient for this retrieval task. The entity type of the study (i.e., movie, music act or book) defined the source domain, based on which the engine generated suggestions for items from another multimedia target domain (similar to Q7, Sect. 3). Fig. 7 depicts the cross-domain web form of the music experiment.

The web form assisted subjects in formulating a request, which then obtained movie suggestions based on the favorite music act of the user. Parameters from users were entered into the placeholders of the query pattern in the backend of the application. We formulated the query templates in such a way that the query engine matched any properties or graph patterns that could potentially connect two items from the specified target and source domains. For instance, such matches can occur when a user states a favorite movie that was written by a particular author. In case the same author has also written some books, they might be of interest to the user as well. Another example stems from the movie domain. Suppose a consumer has declared a preference in his/her profile for a movie that links to the corresponding soundtrack. It might well be the case that the user likes the soundtrack and the music acts that were involved in creating it just as much as s/he likes the movie. The practice of distinguishing homonymous LOD resources through the property dbo:wikiPageDisambiguates is another interesting linking pattern from DBpedia that was exploited for cross-domain requests in TC4, since it connects similar entities (i.e., the book edition of a certain movie).

Cross-domain SPARQL queries (similar to Q8, Sect. 3) were posed as baseline requests in this test case. While the SPARQL query only matched cross-domain relations for a single LOD resource, the SKOSRec request expanded the search space by considering links from more than one item. As in the previous test cases, TC4 ended with an evaluation screen where users assessed the SPARQL-based as well as the SKOSRec suggestions in random order without knowing which recommendation list belonged to which method. Table 10 gives an overview of the test cases and the methods that were applied in the experiments in each usage scenario.

4.2. Results

In total, 257 people took part in the study. The gathered samples from the experiments contained answers


Fig. 7. Music - User profile generation, TC4 (page 9)

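Table 10. Test cases in the web experiments (X = conducted, - = not conducted)

| Test Case | Evaluation Purpose | Query Type | Travel | Movie | Music | Book |
|---|---|---|---|---|---|---|
| TC1 | On-the-fly Recommendations (Baseline) | Q1 | X | X | X | X |
| TC2 | Constraint-based Recommendations | Q3 | X | X | X | X |
| TC3 | Advanced Queries (Rollup) | Q5, Q6 | X | X | X | X |
| TC4 | Advanced Queries (Cross-Domain) | Q7 | - | X | X | X |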

from both genders, with a slight overrepresentation of male participants (53.7%). The majority of study subjects were between 20 and 40 years of age (65.8%). The evaluation of the demographics section also revealed that most participants already perceive the search in their domain of interest as either “manageable” (38.5%) or even as “easy” or “very easy” (44.0%).

Prior to analyzing the performance results of the novel retrieval approaches of the SKOSRec engine, it was checked whether the respective Likert items for Perceived Usefulness and user-based diversity (divU) reached acceptable reliability levels. For this purpose, Cronbach's α values were determined. Table 11 shows the outcome of this analysis for the Perceived Usefulness construct. Given the acceptable values throughout the domains (Cronbach's α > 0.7), it could be treated as an interval-scaled variable in the statistical tests.

Table 11. Cronbach's α values for the concept of Perceived Usefulness

| Domain | Cronbach's α |
|---|---|
| Travel | 0.7002 |
| Movie | 0.8579 |
| Music | 0.8515 |
| Book | 0.8949 |
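For readers who want to reproduce this kind of reliability check, the following Python sketch computes Cronbach's α with the standard formula α = k/(k−1) · (1 − Σ s_i² / s_t²), where k is the number of Likert items, s_i² the variance of item i and s_t² the variance of the participants' total scores. The function name and the sample responses are made up; they are not data from the study.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a block of Likert items.

    responses -- list of rows, one per participant; each row holds the
                 participant's answers to the k items of the construct
    """
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two participants")
    columns = list(zip(*responses))  # one tuple of answers per item
    item_variance_sum = sum(variance(col) for col in columns)
    # Fails with ZeroDivisionError if all total scores are identical.
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Made-up answers of three participants to two perfectly consistent items.
example_alpha = cronbach_alpha([[1, 1], [2, 2], [3, 3]])
```

Because both hypothetical items always receive the same answer, the sketch yields the maximum value of 1.0; real questionnaire data would typically give lower values such as those in Table 11.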

Besides usefulness, user-based diversity scores were also measured by multiple Likert items. However, Cronbach's α values for this concept did not reach acceptable levels. To avoid leaving valuable user assessments unused, agreements with the statement “The recommendations of list [] are diverse” were taken into account as an ordinal variable (divU).

Based on these results, subsequent tests could be carried out. The first part of the analysis concerned the performance of the on-the-fly retrieval method (baseline) of the SKOSRec engine in TC1. The outcome of this analysis confirms that the on-the-fly approach represents a viable recommendation strategy. Table 12 shows the mean (M),¹² standard deviation (SD) and number of participants (N) for each performance dimension and metric. It can be seen that users assessed the engine's performance in almost every quality dimension to be better than mediocre. Aside from the positive user evaluations with regard to Perceived Usefulness, related quality aspects such as Accuracy, Novelty and Diversity received good assessments as well. For instance, the mean relevance score (mrs) in each domain was higher than 50 score points. This means that participants were often convinced of the relevance of the results. Given these positive user assessments with regard to item relevance, it seems justified to consider the mean of result list sizes as an indicator of recall. The engine was able to generate the required number of recommendations in almost every test case (see Table

¹² In case of the user-based diversity score (divU), M denotes the median because of the ordinal scale of the variable.


12). The on-the-fly retrieval approach was also tested with regard to the Novelty dimension. Table 12 shows the mean novelty scores (nv) for the baseline recommendation approach. Novelty scores indicate that the majority of items in a result list were deemed to be new in the travel and book RS, whereas in the experiments on movie and music RS only about one third of the recommendation list was unfamiliar to users. In contrast, content-based Diversity scores (divC) were rather high in each domain. Hence, it can be concluded that items in a result set usually have a low rate of matching SKOS annotations. In addition to content-based scores, user assessments of recommendation list Diversity (divU) were also taken into account. The median level of agreement with the Diversity-related Likert item stating that recommendations are diverse lay at 3 (“neutral”) in the multimedia domains and at 4 (“agree”) in the travel domain.

In summary, the baseline approach achieved fairly good performance results in TC1. The findings demonstrate that LOD-enabled recommendations can potentially enhance a user's search experience. They also allow for subsequent tests of advanced retrieval techniques, which can be compared to this standard method.

In the analysis of performance results from TC2 (constraint-based recommendations), particular attention was paid to participants' agreement with the statement “The filter has improved the recommendation results of the list”. Fig. 8 shows the distribution of domain-wise agreement ratios. On average, they were fairly high. Throughout the domains, the vast majority of participants selected either 5 (“strongly agree”), 4 (“agree”) or 3 (“neutral”). Hence, a filter often had a positive impact on results.

The authors gathered responses to the question on improvement for recommendation lists resulting from both the simple and the expressive filtering approaches. Hence, they give clues about the general performance of constraint-based retrieval. However, since this section of the study revealed the experimental condition to participants (i.e., application of a filter), answers to the Likert statement have to be interpreted cautiously. Results may be slightly biased, as participants were able to guess the underlying agenda. However, these limitations do not exist for comparisons between simple and expressive filters. Since the web application randomly assigned the list order, participants did not know which approach they were assessing. This enables us to take a closer look at the differences between the two filtering approaches. It was

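Table 12. Performance results in TC1 (M = Mean, SD = Standard Deviation, N = No. of participants)

| Dimension (Metric) [Max Score] | Domain | M | SD | N |
|---|---|---|---|---|
| Accuracy (size) [10] | Travel | 10.00 | 0.00 | 103 |
| | Movie | 10.00 | 0.00 | 50 |
| | Music | 10.00 | 0.00 | 53 |
| | Book | 9.80 | 1.40 | 51 |
| Accuracy (mrs) [1] | Travel | 56.49 | 0.00 | 103 |
| | Movie | 62.25 | 19.15 | 50 |
| | Music | 55.70 | 0.10 | 53 |
| | Book | 57.47 | 15.55 | 50 |
| Novelty (nv) [1] | Travel | 0.60 | 0.34 | 101 |
| | Movie | 0.36 | 0.31 | 50 |
| | Music | 0.36 | 0.39 | 52 |
| | Book | 0.61 | 0.29 | 47 |
| Diversity (divU) [5] | Travel | 4 | - | 102 |
| | Movie | 3 | - | 50 |
| | Music | 3 | - | 53 |
| | Book | 3 | - | 50 |
| Diversity (divC) [1] | Travel | 0.80 | 0.16 | 103 |
| | Movie | 0.87 | 0.12 | 50 |
| | Music | 0.86 | 0.10 | 50 |
| | Book | 0.92 | 0.08 | 50 |
| Usefulness [5] | Travel | 3.34 | 0.85 | 103 |
| | Movie | 3.55 | 0.73 | 50 |
| | Music | 3.18 | 0.85 | 53 |
| | Book | 3.43 | 0.81 | 50 |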

hypothesized that advanced filters improve recommendation quality, as they may identify more relevant resources, thereby overcoming problems of data quality or data sparsity in LOD repositories. Table 13 lists response rates of the two methods. The rate measures the ratio of non-empty result sets among all result sets. Throughout the domains, recommendation lists resulting from expressive filtering achieved higher response rates.

Table 13. Response rates (TC2)

| Domain | Constr.-based (Regular) | Constr.-based (Expanded) | N |
|---|---|---|---|
| Travel | 23% | 57% | 103 |
| Movie | 86% | 88% | 50 |
| Music | 72% | 75% | 53 |
| Book | 73% | 76% | 51 |


[Figure: stacked bar chart titled “The filter has improved the recommendations of the list”. X-axis: Domain (Digital Library, Travel, Movie, Music, Book); y-axis: Percentage of Agreement (0.00 to 1.00); legend: Degree of Agreement (1 to 5).]

Fig. 8. Participants' agreement with the statement that the filter has improved the result list (domain-wise) (TC2)

In addition to response rates, user assessments and scores in the defined quality dimensions were compared. Table 14 depicts the outcome of this analysis. Significantly higher scores are marked in bold. In case statistical tests identified no meaningful variations, the results of the better performing approach are underlined. α-levels were adjusted with the false discovery rate (FDR) in order to decrease the probability of making a type I error due to multiple testing.
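The text does not state which FDR procedure was applied. As an illustration only, the following Python sketch implements the widely used Benjamini-Hochberg step-up procedure; the function name, the significance level of 0.05 and the p-values are assumptions and do not stem from the study.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Flag hypotheses rejected under the Benjamini-Hochberg procedure.

    p_values -- list of unadjusted p-values, one per statistical test
    q        -- desired false discovery rate
    Returns a list of booleans in the original order of p_values.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: remember the largest rank whose p-value lies
        # below its rank-specific threshold rank / m * q.
        if p_values[idx] <= rank / m * q:
            cutoff = rank
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values from four pairwise comparisons.
example_flags = benjamini_hochberg([0.001, 0.04, 0.03, 0.2])
```

In this made-up example only the first test remains significant: 0.03 and 0.04 would pass an unadjusted 0.05 level, but they exceed their rank-specific thresholds of 0.025 and 0.0375.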

According to mean scores, expressive requests generated more comprehensive recommendation lists (size). This finding is in line with the increased response rates shown in Table 13. However, subsequently conducted t-tests confirmed significant differences only for the travel experiment (t(102) = −7.89, p < 0.001). In contrast, precision scores were higher in most of the domains (travel, movie, music) when the SKOSRec engine processed a simple constraint-based query. However, statistical tests did not confirm any systematic differences. The authors applied Wilcoxon tests for mrs scores and the remaining performance metrics because of the small sample sizes resulting from the low response rates of the simple constraint-based approach.

Regarding Novelty, none of the approaches was superior. In the Diversity dimension, expressive filters generated suggestions that were at least as topically diversified as the results from simple constraint-based retrieval. Even though no significant differences were

identified between the two approaches increased meanscores (divU in the travel domain and divC in each do-main) indicate a slight superiority of expanded filterrequests in this quality dimension The same appliesto usefulness scores (mean values were higher in 3 outof 4 domains when expressive filtering was applied)However these statements are speculative at best be-cause statistical tests were not significantIn summary it can be concluded that expressive fil-ters improve recall potentially leading to a slight lossin precision scores (see Table 14) It may also be thecase that expanded user constraints diversify as wellas increase the usefulness of recommendation listsHowever these claims are unverified In contrast itis safe to say that expressive query patterns generateresults of at least the same quality as simple patternswhile increasing the number of relevant suggestionsThe few differences may be explained by the high Jac-card scores (JI) (see Table 15) of recommendation listsThroughout the domains result sets were much alikeGiven these similarities and the increases in list sizesfor expressive filters it is supposed that an expandedconstraint produces an enhanced version of the recom-mendation list resulting from simple filtering There-fore expanded filters should be the default setting in aconstraint-based retrieval context
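The Jaccard index used to compare two recommendation lists is the size of their intersection divided by the size of their union. A minimal sketch follows; the function name and item identifiers are hypothetical, and raising an error for two empty lists is an assumption made here, since the text does not say how such pairs were treated.

```python
def jaccard_index(list_a, list_b):
    """Overlap of two recommendation lists: |A ∩ B| / |A ∪ B|."""
    set_a, set_b = set(list_a), set(list_b)
    union = set_a | set_b
    if not union:
        # Assumption: the index is treated as undefined when both lists are empty.
        raise ValueError("Jaccard index is undefined for two empty lists")
    return len(set_a & set_b) / len(union)


# Hypothetical result lists of a simple and an expanded filter request.
simple = ["item1", "item2", "item3"]
expanded = ["item2", "item3", "item4"]
print(jaccard_index(simple, expanded))  # 2 shared items out of 4 distinct items
```

Values close to 1 indicate nearly identical lists, values close to 0 indicate that the two requests returned largely different items.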

In TC3 (i.e., evaluation of roll-up query patterns), the SKOSRec engine almost always generated non-empty recommendation lists (see Table 16). In the

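Table 14. Performance results in TC2 (M = Mean, SD = Standard Deviation, N = No. of participants)

| Dimension (Metric) [Max Score] | Domain | Simple M | Simple SD | Expressive M | Expressive SD | N |
|---|---|---|---|---|---|---|
| Accuracy (size) [10] | Travel | 1.17 | 2.70 | 4.52 | 4.62 | 103 |
| | Movie | 6.72 | 4.08 | 7.20 | 3.92 | 50 |
| | Music | 5.96 | 4.64 | 6.36 | 4.51 | 53 |
| | Book | 4.59 | 4.50 | 5.00 | 4.51 | 51 |
| Accuracy (mrs) [1] | Travel | 66.73 | 20.51 | 65.34 | 22.36 | 23 |
| | Movie | 60.95 | 22.98 | 60.29 | 22.79 | 43 |
| | Music | 63.77 | 16.05 | 62.96 | 16.44 | 34 |
| | Book | 56.17 | 18.18 | 56.81 | 16.81 | 37 |
| Novelty (nv) [1] | Travel | 0.48 | 0.41 | 0.61 | 0.49 | 21 |
| | Movie | 0.44 | 0.36 | 0.55 | 0.80 | 42 |
| | Music | 0.55 | 0.39 | 0.52 | 0.37 | 38 |
| | Book | 0.77 | 0.34 | 0.76 | 0.33 | 33 |
| Diversity (divU) [5] | Travel | 3 | - | 3.5 | - | 24 |
| | Movie | 4 | - | 4 | - | 43 |
| | Music | 4 | - | 4 | - | 38 |
| | Book | 3 | - | 3 | - | 37 |
| Diversity (divC) [1] | Travel | 0.68 | 0.18 | 0.72 | 0.21 | 19 |
| | Movie | 0.70 | 0.24 | 0.73 | 0.23 | 41 |
| | Music | 0.86 | 0.13 | 0.87 | 0.12 | 34 |
| | Book | 0.84 | 0.24 | 0.95 | 0.63 | 29 |
| Usefulness [5] | Travel | 3.33 | 0.93 | 3.50 | 0.77 | 24 |
| | Movie | 3.32 | 0.85 | 3.31 | 0.86 | 43 |
| | Music | 3.55 | 0.81 | 3.64 | 0.78 | 38 |
| | Book | 3.39 | 0.83 | 3.62 | 1.21 | 37 |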

Table 15. Jaccard Indices (TC2)

| Domain | Mean Jaccard Index (JI) | SD | N |
|---|---|---|---|
| Travel | 0.57 | 0.47 | 25 |
| Movie | 0.71 | 0.38 | 43 |
| Music | 0.82 | 0.35 | 37 |
| Book | 0.92 | 0.26 | 35 |

travel and the book experiment, the response rate was higher for aggregation-based queries, whereas in the movie domain, the engine more often provided recommendations for regular requests. However, these differences are only marginal and do not indicate a clear superiority of one method.

Table 16. Response Rates (TC3)

| Domain | Regular | Aggr.-based | N |
|---|---|---|---|
| Travel | 99 | 100 | 103 |
| Movie | 96 | 92 | 50 |
| Music | 100 | 100 | 53 |
| Book | 96 | 100 | 51 |

Fig. 9 depicts participants' general agreement to Likert statements concerning the Perceived Usefulness of the approaches in TC3. The diagram shows similar results for regular as well as advanced requests. On average, satisfaction scores reached levels slightly above neutral agreement. A subsequent t-test confirmed no significant differences between the two methods. In fact, the approach with the highest average score of Perceived Usefulness was different in each experiment. While mean values were higher for regular requests in the studies on travel and music RS, advanced queries produced higher scores in the movie and book experiments.

Fig. 9. Participants' overall satisfaction with regular and roll-up recommendation retrieval (TC3). [Figure omitted. Title: Perceived Usefulness (Regular vs. Roll-up Queries); x-axis: Domain (Travel, Movie, Music, Book); y-axis: Agreement (0 to 3); legend: Method (Regular, Roll-up).]

Content-based Diversity (divC) was the only metric with systematic differences. Here, scores went up for advanced retrieval patterns in the travel (t(98) = −5.50, p < 0.001), the music (t(52) = −4.51, p < 0.001) and the book domain (t(47) = −4.90, p < 0.001) (see Table 17). This finding is not surprising, given that the aggregation-based approach explores the wider semantic network of each item in the profile.

Hence, it can be assumed that both approaches produce recommendation lists of similar quality and can provide benefits to potential consumers. It cannot be clearly stated which approach is superior because neither of them outperformed the other in each domain. While the regular retrieval strategy was better in terms of usefulness in the travel and the music domains, the aggregation-based approach achieved higher scores in the domains of movie and book recommendations.

This conclusion is especially notable given the low mean Jaccard scores for recommendation lists resulting from regular and aggregation-based retrieval (see Table 18). The mean score never exceeded 0.1 throughout the domains, which indicates a low concordance between result sets. It is remarkable that participants perceived the recommendations as equally good given the high dissimilarity of the lists. Thus, advanced queries have an added value, as they provide additional interesting results. They can help to explore LOD repositories in different ways than a regular retrieval approach. Therefore, it is worthwhile to offer aggregation-based query patterns as an alternative retrieval strategy when the regular approach has not produced any helpful recommendations.

In addition to TC3, user assessments for TC4 (cross-domain recommendations) were also evaluated. However, since the authors conducted experiments for TC4 only in the multimedia domains, the samples were considerably smaller. Nevertheless, it is assumed that the number of completed user sessions is just sufficient to conduct further statistical analyses. Table 19 shows the response rates for the two retrieval methods. Cross-domain patterns had a higher success rate of generating non-zero result sets throughout the domains. Participants almost always received a recommendation for this query type, whereas they often did not receive a single suggestion when a regular SPARQL query was executed.

We applied non-parametric tests for the performance metrics mrs, nv, divU, divC and Perceived Usefulness due to the low response rates of the SPARQL-based approach, which led to a correspondingly low number of comparable data points. Overall, the FDR-adjusted pairwise Wilcoxon tests detected no significant differences, except for user-based Diversity scores (divU) in the movie (Z = −2.69, p < 0.05) and the book domain (Z = −2.56, p < 0.05). The results for the content-based Diversity (divC) metric seem to point in the same direction because of the high mean values for cross-domain queries in each domain. However, the significance of these results was not verified. The mrs scores indicate a better performance of regular

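Table 17. Performance results in TC3 (M = Mean, SD = Standard Deviation, N = No. of participants)

| Dimension (Metric) [Max Score] | Domain | Regular M | Regular SD | Aggr.-based M | Aggr.-based SD | N |
|---|---|---|---|---|---|---|
| Accuracy (size) [10] | Travel | 9.90 | 0.99 | 9.42 | 2.07 | 103 |
| | Movie | 2.88 | 0.59 | 2.76 | 0.82 | 50 |
| | Music | 10.00 | 0.00 | 10.00 | 0.00 | 53 |
| | Book | 2.88 | 0.59 | 3.00 | 0.00 | 51 |
| Accuracy (mrs) [1] | Travel | 48.21 | 24.98 | 44.93 | 25.59 | 102 |
| | Movie | 54.58 | 22.88 | 56.69 | 25.42 | 44 |
| | Music | 53.95 | 18.31 | 58.24 | 16.19 | 52 |
| | Book | 51.53 | 23.90 | 49.25 | 22.30 | 49 |
| Novelty (nv) [1] | Travel | 0.43 | 0.39 | 0.38 | 0.40 | 98 |
| | Movie | 0.35 | 0.38 | 0.28 | 0.41 | 41 |
| | Music | 0.49 | 0.36 | 0.46 | 0.39 | 50 |
| | Book | 0.61 | 0.42 | 0.62 | 0.44 | 47 |
| Diversity (divU) [5] | Travel | 3.5 | - | 3 | - | 100 |
| | Movie | 4 | - | 3 | - | 43 |
| | Music | 4 | - | 3 | - | 50 |
| | Book | 3 | - | 3 | - | 48 |
| Diversity (divC) [1] | Travel | 0.79 | 0.17 | 0.89 | 0.13 | 99 |
| | Movie | 0.75 | 0.20 | 0.66 | 0.27 | 44 |
| | Music | 0.83 | 0.17 | 0.93 | 0.07 | 53 |
| | Book | 0.74 | 0.27 | 0.93 | 0.06 | 48 |
| Usefulness [5] | Travel | 3.36 | 0.77 | 3.22 | 0.74 | 101 |
| | Movie | 3.20 | 0.92 | 3.26 | 0.94 | 44 |
| | Music | 3.45 | 0.84 | 3.44 | 0.85 | 51 |
| | Book | 2.95 | 0.87 | 3.03 | 0.96 | 48 |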

Table 18. Jaccard Indices (TC3)

| Domain | Mean Jaccard Index (JI) | SD | N |
|---|---|---|---|
| Travel | 0.07 | 0.11 | 101 |
| Movie | 0.03 | 0.07 | 44 |
| Music | 0.02 | 0.05 | 53 |
| Book | 0.02 | 0.06 | 50 |

Table 19. Response rates (TC4)

| Domain | Regular (SPARQL) | SKOSRec | N |
|---|---|---|---|
| Movie | 32 | 96 | 50 |
| Music | 62 | 98 | 53 |
| Book | 53 | 100 | 51 |

SPARQL queries, but the within-subjects test did not confirm this assumption statistically (see Table 20).

However, the most important finding of the analysis was that SKOSRec cross-domain queries generated significantly more suggestions throughout the domains (movie: t(49) = −17.92, p < 0.001; music: t(52) = −25.73, p < 0.001; book: t(50) = −12.75, p < 0.001) (see Table 20). Fig. 10 illustrates this graphically.
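The reported degrees of freedom (e.g., t(49) for N = 50) are consistent with a within-subjects (paired) t-test, in which each participant contributes one value per method. The following sketch shows how such a statistic can be computed from per-participant differences. The function name and the sample values are hypothetical and are not data from the study.

```python
import math
import statistics


def paired_t_statistic(regular, advanced):
    """Return (t, degrees_of_freedom) for paired observations."""
    if len(regular) != len(advanced):
        raise ValueError("both samples need one value per participant")
    if len(regular) < 2:
        raise ValueError("at least two participants are required")
    # Per-participant differences: regular minus advanced.
    diffs = [r - a for r, a in zip(regular, advanced)]
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical list sizes for four participants.
regular_sizes = [1, 2, 3, 4]
advanced_sizes = [2, 4, 5, 8]
t, df = paired_t_statistic(regular_sizes, advanced_sizes)
print(f"t({df}) = {t:.2f}")
```

With differences computed as regular minus advanced, a negative t value means that the advanced request produced the larger lists, which matches the sign of the values reported above.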

Whenever the SPARQL request provided recommendations, which was the case in less than half of the user sessions (see Table 19), the SKOSRec suggestions were assessed to be of almost equal quality as the SPARQL recommendations (see Table 20, Usefulness). Hence, in the remaining sessions, in which SPARQL querying did not produce any results, the SKOSRec approach may still have provided useful suggestions.

Therefore, it is reasonable to assume that the capability of the SKOSRec engine to flexibly switch between processing steps of similarity calculation and pattern matching can facilitate an improved exploration of LOD repositories. Table 21 also shows that the items in the two sets often did not match, as is indicated by low Jaccard indices. This finding further demonstrates that the execution of an additional step

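Table 20. Performance results in TC4 (M = Mean, SD = Standard Deviation, N = No. of participants)

| Dimension (Metric) [Max Score] | Domain | Regular (SPARQL) M | Regular (SPARQL) SD | Cross-Domain M | Cross-Domain SD | N |
|---|---|---|---|---|---|---|
| Accuracy (size) [10] | Movie | 0.82 | 2.09 | 8.66 | 2.73 | 50 |
| | Music | 0.89 | 1.59 | 9.30 | 2.03 | 53 |
| | Book | 2.90 | 3.98 | 10.00 | 0.00 | 51 |
| Accuracy (mrs) [1] | Movie | 67.41 | 27.62 | 53.65 | 14.71 | 16 |
| | Music | 62.36 | 31.76 | 48.32 | 22.49 | 20 |
| | Book | 66.42 | 21.28 | 55.59 | 16.84 | 27 |
| Novelty (nv) [1] | Movie | 0.59 | 0.49 | 0.41 | 0.34 | 16 |
| | Music | 0.72 | 0.44 | 0.59 | 0.40 | 19 |
| | Book | 0.54 | 0.41 | 0.63 | 0.30 | 27 |
| Diversity (divU) [5] | Movie | 3 | - | 4 | - | 16 |
| | Music | 3 | - | 3 | - | 20 |
| | Book | 3 | - | 4 | - | 27 |
| Diversity (divC) [1] | Movie | (0.64) | (0.35) | (0.99) | (0.02) | (6) |
| | Music | (0.90) | (0.13) | (0.98) | (0.03) | (10) |
| | Book | (0.84) | (0.12) | (0.87) | (0.1) | (8) |
| Usefulness [5] | Movie | 3.57 | 0.83 | 3.35 | 1.57 | 15 |
| | Music | 3.53 | 1.17 | 3.11 | 0.98 | 20 |
| | Book | 3.41 | 1.04 | 3.41 | 0.94 | 27 |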

Fig. 10. Mean recommendation list sizes for SPARQL and SKOSRec cross-domain requests (TC4). [Figure omitted. Title: Recommendation List Size; x-axis: Domain (Movie, Music, Book); y-axis: No. of items (0.0 to 10.0); legend: Method (Regular, Cross-Domain).]

Table 21. Jaccard Indices (TC4)

| Domain | Mean Jaccard Index (JI) | SD | N |
|---|---|---|---|
| Movie | 0.03 | 0.07 | 44 |
| Music | 0.01 | 0.02 | 21 |
| Book | 0.16 | 0.27 | 33 |

of similarity calculation can help to retrieve other relevant items.

5. Future Work & Conclusion

This paper has presented the query-based SKOSRec engine that facilitates new types of recommendation requests for LOD repositories. The evaluations have shown that the SKOSRec system can often generate relevant and useful suggestions when certain query templates are utilized. In many cases, the application of expressive or advanced query patterns (i.e., in the context of constraint-based retrieval or cross-domain requests) helped to significantly improve recall values while still providing the same level of quality in the remaining performance dimensions. Additionally, advanced recommendation requests can be used to generate diversified recommendation lists in case users are not satisfied with the results of regular recommendation requests (i.e., roll-up or cross-domain retrieval vs. regular requests). While not being superior in each domain and test case, it is at least safe to say that the novel recommendation approaches of the SKOSRec engine considerably extend common retrieval methods and provide an alternative search strategy when conventional methods fail to provide useful results. Additionally, the increase in diversity and recall is in line with findings from previous research on LOD-enabled RS [45].

It is a strength of the developed approaches that they facilitate combinations of graph-based and similarity-based retrieval at different stages of the recommendation workflow. This feature extends general search capabilities for semantic networks. Future research is required to investigate whether, in addition to SKOS, further RDF vocabularies can be used for ad-hoc item-to-item similarity computation to increase the range of potential usage scenarios, since not every LOD dataset contains SKOS annotations [53]. The application of additional properties could also boost the recommendation quality of some query patterns (e.g., the query types used in TC3). In this context, it also needs to be determined whether the representation of the user profile in the SKOSRec query language can be substituted with free-text expressions to enable search-like functionalities. LOD researchers have already developed engines that can process natural language queries over RDF data [59], [34], [67]. It will have to be investigated how the SKOSRec engine profitably fits into this landscape of existing retrieval tools.

Another open research question concerns the aspect of interfaces to other applications. In its standard configuration, the system is not an end-user retrieval engine but a backend application with which an administrator who has domain knowledge (e.g., of RDF vocabularies and data models of a particular LOD repository and usage scenario) creates the respective query patterns. Although the websites of the online experiments represent possible implementations of suitable end-user applications for the SKOSRec engine, there are still many other possibilities and extensions to design such a user interface (UI). For instance, the selection of LOD items for the profile was made possible by previously storing the data in an index that could then be accessed and searched before issuing a recommendation query. However, this runs against the idea of ad-hoc retrieval. In this context, future research will have to determine whether it is feasible to generate an on-the-fly mapping from items to the respective LOD resources. In the case of a commercial application, the engine needs to match items with the corresponding entries in the product catalog. Even if it is not yet clear which UI components can make the best possible use of the SKOSRec engine's features, the results from the user experiments can give at least a few suggestions.

The following retrieval options should be the default setting in an interface that contains a LOD-enabled RS component to assist users in finding interesting items. Since the regular approach of on-the-fly retrieval achieved good results throughout the domains, the display of simple recommendations generated from previously stated user preferences represents a suitable starting point for retrieval. On-the-fly queries could be refined by providing filter options. Since the evaluations have shown that expressive constraints often lead to increased recall values, the application of such a filter might be the best option for assisted retrieval. As in the preparations for the experiments, appropriate graph-based query patterns for the usage scenario in question would also have to be determined by a domain expert before setting up the application. The same applies to advanced retrieval patterns, which should be available to the end user to refine the query when both the simple as well as the constraint-based

approach have failed to provide relevant items. This suggestion is made because roll-up queries were competitive with regular recommendations in the web-based experiments. Therefore, they represent a viable alternative retrieval strategy. For some domains (e.g., for multimedia retrieval), it is also possible to use an optional selection field to enable the formulation of cross-domain requests. Evaluations for this query type suggest that users might receive interesting results (i.e., diversified and comprehensive recommendation lists) from such a request.

In summary, the contributions of the paper are as follows:

– Development and implementation of novel retrieval strategies in the SKOSRec system, which facilitate personalized requests and leverage the potential of knowledge graphs in order to address the research gap in state-of-the-art recommendation methods with regard to customized queries.

– Realization of web-based user experiments in four usage scenarios with the following outcomes:

  • Successful demonstration that knowledge graph data is applicable in the context of live recommendation scenarios.

  • Creation of suitable query patterns that can be applied to represent individual information needs in different application contexts. The set of query patterns can be extended with the help of the SKOSRec query language to meet the requirements of additional usage scenarios.

  • Proof that the novel retrieval strategies can be successfully applied across various application contexts and recommendation tasks and that they outperform conventional content-based retrieval approaches, especially in the performance dimensions of diversity and recall.

The prototypical implementation and the evaluation of the SKOSRec engine in different usage scenarios have demonstrated that the freely available knowledge sources on the LOD cloud can be successfully applied to advance state-of-the-art methods in IR and recommender systems. The proposed personalized and customized retrieval strategies can leverage the potential of knowledge graphs by better matching information needs of users with the help of semantic search options.

Appendix A. The SKOSRecommender Query Syntax

SKOSRecQuery: A SKOSRec query can comprise up to six elements, of which the parts SimProjection and ItemPart are obligatory. Hence, users need to state at least their preferences and how many recommendations they would like to receive. Advanced retrieval patterns can be formulated as well. For instance, similar resource retrieval can be combined with prefiltering (RecWhereClause) and/or postfiltering (SelectPart) of LOD resources. As in SPARQL, the query can start with a prologue that abbreviates IRIs with namespace declarations [21].

SelectPart: The SelectPart enables subquerying with recommendation results. The surrounding query is a SELECT query with a slightly simplified WHERE clause (i.e., RecWhereClause), with which postfilter conditions can be formulated. In this section, users have the option to formulate aggregation-based retrieval requests (Aggregation).

Aggregation: This query part handles aggregation-based postfiltering. Users specify the resource (IRIref) for which similarity scores of sublevel entities are summarized. Additionally, it is stated how the data should be aggregated (i.e., SUM, MAX or AVG) as well as which variable (Var) represents the aggregation-based recommendations in the postfilter section. The compiler makes sure that this variable is actually contained in the RecWhereClause of the SelectPart.

SimProjection: This part of a SKOSRec query refers to the list of generated suggestions. It defines the recommendation variable to which all similar LOD resources are mapped. In case pre- and postfilter conditions are set in the other sections, the compiler checks whether the variable (Var) appears in these sections as well. Otherwise, the required join operations cannot be executed. It is specified how many recommendations should be displayed (Integer) and whether the request will be issued against a default repository or as a cross-repository request (ServiceIntegration).

ServiceIntegration: This section indicates that a user intends to receive recommendations from a different SPARQL endpoint than the default endpoint that is stated in the standard configuration of the SKOSRec engine. It specifies the target SPARQL endpoint (IRIref) from which a user wants to receive recommendations. Upon extraction of SKOS annotations from the default source endpoint, a subsequent request is sent to the specified target endpoint.

ItemPart: This section of a SKOSRec query represents a single preference in the user profile. However, recommendation queries can contain several preference statements. A preference is either expressed as a LOD resource (IRIref) or as a variable which is bound to a preference query (VarPart). In the latter case, the resources are obtained by matching the graph pattern in the VarPart with the triple statements in the repository. The ItemPart statement can additionally be supplemented with a concept-to-concept similarity threshold (Sim) that triggers concept expansion with the approach of flexible similarity detection, which is extensively described in one of our previous works [64].

VarPart: The construct initiates preference querying. Individual likings are expressed as a graph pattern in a RecWhereClause. The variable (Var) is a placeholder for the LOD resources that will be used to set up the user profile. The compiler has to make sure that the specified variable occurs in the RecWhereClause as well. Otherwise, the extraction of LOD resources cannot be carried out correctly.

Sim: The Sim part of a SKOSRec query denotes the threshold of concept-to-concept scores for flexible similarity detection. Even though knowledge-based similarity values usually range between 0 and 1, the specification allows a broader range of digits in the Decimal datatype to enable the application of other similarity metrics in case they are needed.

Relation: This section specifies how concept-to-concept similarity scores should be determined. Users can state whether scores should be “larger”, “larger than” or “equal to” the threshold value.

RecWhereClause: This clause can occur in three different sections of a SKOSRec query, i.e., either in the prefilter (ItemPart), the postfilter (SelectPart) or the preference filter section (VarPart). The RecWhereClause closely resembles the “WhereClause” of the SPARQL 1.1 specification with all its subsequent parts [21]. Hence, it enables different combinations of graph pattern matching. However, the RecWhereClause of the SKOSRec language allows fewer pattern matching expressions than the “WhereClause” of the regular SPARQL syntax specification [21]. For instance, it is neither permitted to formulate subqueries nor to direct filter requests to more than one repository (see RecGraphPatternNotTriples for further explanations). This modification has been made to simplify similarity calculation. After parsing this section, the compiler makes sure that recommendation, preference or postfilter variables occur at least once in the RecWhereClause to guarantee that subsequent steps of the recommendation workflow can be applied on existing LOD resources. In case the RecWhereClause comprises a RecMinusGraphPattern, it also has to be checked that the variable is not only contained in the MINUS part of the group.

RecGroupGraphPattern: A RecGroupGraphPattern can contain one to many basic graph patterns, which can be combined in different ways according to RecGroupGraphPatternSub.

RecGroupGraphPatternSub: This section defines the graph patterns that are applied to identify suitable LOD resources. User filters can be expressed either as basic graph patterns (TriplesBlock) or as combinations of them (RecGraphPatternNotTriples). The sequence of different kinds of patterns can be flexibly specified.

RecGraphPatternNotTriples: This query part closely resembles the similarly named “GraphPatternNotTriples” of the SPARQL 1.1 specification [21]. The only difference to SPARQL is that the RecGraphPatternNotTriples section is a little more restricted in terms of admissible graph patterns than the “GraphPatternNotTriples” section of the regular SPARQL specification. For instance, it does not allow retrieving LOD resources from graphs other than the default graph that is specified in the configuration of the SKOSRec engine, since filter patterns have to be applied to datasets that contain SKOS annotations. Hence, they need to be specified before runtime execution to ensure that similarity calculation can be carried out correctly. If a user were able to formulate filter conditions that referred to different RDF graphs and endpoints, this would not be possible.

RecGroupOrUnionGraphPattern: This query section marks an alternative graph pattern in the style of the “GroupOrUnionGraphPattern” of the SPARQL syntax specification [21]. However, it allows neither SPARQL-like subqueries nor federated queries for filtered resources because this would complicate similarity calculation.

RecOptionalGraphPattern: In this part of a SKOSRecQuery, users can specify additional graph patterns which might extend the query solution but do not necessarily have to match the data.

RecMinusGraphPattern: This section handles the exclusion of LOD resources for cases when users want to omit certain triple statements from the solution.
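Two of the behaviors described in this appendix can be illustrated with a short sketch: the aggregation of sublevel similarity scores (SUM, MAX or AVG) and the compiler check that a variable does not occur only in the MINUS part of a RecWhereClause. The code below is not the SKOSRec implementation. The function names, the dictionary-based input and the (kind, variables) representation of graph patterns are assumptions made for illustration.

```python
from statistics import mean

# Aggregation functions named in the appendix (SUM, MAX, AVG).
AGGREGATORS = {"SUM": sum, "MAX": max, "AVG": mean}


def roll_up(scores_by_parent, how="SUM"):
    """Summarize similarity scores of sublevel entities per parent resource."""
    if how not in AGGREGATORS:
        raise ValueError(f"unsupported aggregation: {how}")
    aggregate = AGGREGATORS[how]
    rolled = {
        parent: aggregate(scores)
        for parent, scores in scores_by_parent.items()
        if scores  # parents without scored sublevel entities are skipped
    }
    # Highest aggregated score first.
    return sorted(rolled.items(), key=lambda pair: pair[1], reverse=True)


def variable_is_usable(variable, patterns):
    """True if the variable occurs in at least one pattern outside a MINUS part.

    `patterns` is a hypothetical list of (kind, variables) pairs,
    e.g. ("TRIPLES", {"?rec"}) or ("MINUS", {"?rec"}).
    """
    return any(
        variable in variables for kind, variables in patterns if kind != "MINUS"
    )


# Hypothetical similarity scores of movies, grouped by their director.
scores = {"ex:DirectorA": [0.5, 0.25], "ex:DirectorB": [0.625]}
print(roll_up(scores, "SUM"))
print(roll_up(scores, "MAX"))
print(variable_is_usable("?rec", [("MINUS", {"?rec"})]))
```

The example shows that the choice of aggregation function can change the ranking: the first director leads under SUM, the second under MAX.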

References

[1] G. Adomavicius, A. Tuzhilin, Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions, In: IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.

[2] G. Adomavicius, A. Tuzhilin, R. Zheng, REQUEST: A query language for customizing recommendations, In: Information Systems Research, vol. 22, no. 1, pp. 99–117, 2011.

[3] C. Aggarwal, Recommender systems, Springer, 2016.

[4] M. Arenas, B. Cuenca Grau, E. Kharlamov, S. Marciuska, D. Zheleznyakov, Faceted search over ontology-enhanced RDF data, In: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 939–948, 2014.

[5] V. Ayala, M. Przyjaciel-Zablocki, T. Hornung, A. Schätzle, G. Lausen, Extending SPARQL for recommendations, In: Semantic Web Information Management, 2004.

[6] J. Beel, C. Breitinger, S. Langer, A. Lommatzsch, B. Gipp, Towards reproducibility in recommender systems research, In: User Modeling and User-adapted Interaction, vol. 26, no. 1, pp. 69–101, 2016.

[7] C. Boutilier, R. Brafman, C. Domshlak, H. Hoos, D. Poole, CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements, In: Journal of Artificial Intelligence Research (JAIR), vol. 21, pp. 135–191, 2004.

[8] P. Castells, S. Vargas, J. Wang, Novelty and diversity metrics for recommender systems: choice, discovery and relevance, 2011.

[9] G. Cheng, W. Ge, Y. Qu, Falcons: searching and browsing entities on the Semantic Web, In: Proceedings of the 17th ACM International Conference on World Wide Web, pp. 1101–1102, 2008.

[10] J. Davies, R. Weeks, QuizRDF: Search technology for the Semantic Web, In: Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004.

[11] DBpedia, URL: https://wiki.dbpedia.org, 2017.

[12] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, Exploiting the web of data in model-based recommender systems, In: Proceedings of the 6th ACM Conference on Recommender Systems (RecSys), pp. 253–256, 2012.

[13] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, M. Zanker, Linked Open Data to support content-based recommender systems, In: Proceedings of the 8th International Conference on Semantic Systems, 2012.

[14] D. Dillman, D. Robert, D. Bowker, Principles for constructing web surveys, In: Joint Meetings of the American Statistical Association, 1998.

[15] DCMI metadata terms, URL: http://dublincore.org/documents/dcmi-terms, 2004.

[16] I. Fernandez-Tobias, I. Cantador, M. Kaminskas, F. Ricci, A generic semantic-based framework for cross-domain recommendation, In: Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pp. 25–32, 2011.

[17] A. Freitas, J. Oliveira, S. O'Riain, E. Curry, J. Da Silva, C. Pereira, Querying Linked Data using semantic relatedness: a vocabulary independent approach, In: Natural Language Processing and Information Systems, pp. 40–51, 2011.

[18] T. Grainger, T. Potter, Y. Seeley, Solr in action, Cherry Hill: Manning, 2014.

[19] A. Gunawardana, G. Shani, Evaluating recommender systems, In: Recommender Systems Handbook, Springer, pp. 265–308, 2015.

[20] R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Bürgle, H. Düwiger, U. Scheel, Faceted Wikipedia search, In: International Conference on Business Information Systems, pp. 1–11, 2010.

[21] S. Harris, A. Seaborne, E. Prud'hommeaux, SPARQL 1.1 query language, In: W3C recommendation, 2013.

[22] B. Heitmann, C. Hayes, Using Linked Data to build open, collaborative recommender systems, In: AAAI Spring Symposium: Linked Data Meets Artificial Intelligence, pp. 76–81, 2010.

[23] J. Herlocker, J. Konstan, L. Terveen, J. Riedl, Evaluating collaborative filtering recommender systems, In: ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.

[24] Y. Hu, Z. Wang, W. Wu, J. Guo, M. Zhang, Recommendation for movies and stars using YAGO and IMDB, In: 12th International Asia-Pacific Web Conference (APWEB), pp. 123–129, 2010.

[25] D. Huynh, D. Karger, Parallax and companion: Set-based browsing for the data web, In: Proceedings of the 18th ACM International Conference on World Wide Web, 2009.

[26] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich, Recommender systems: an introduction, Cambridge University Press, 2010.

[27] Y. Kabutoya, R. Sumi, T. Iwata, T. Uchiyama, T. Uchiyama, In: IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 625–630, 2012.

[28] M. Kaminskas, I. Fernandez-Tobias, F. Ricci, I. Cantador, Knowledge-based music retrieval for places of interest, In: Proceedings of the 2nd International ACM Workshop on Music Information Retrieval with User-centered and Multimodal Strategies, pp. 19–24, 2012.

[29] H. Khrouf, R. Troncy, Hybrid event recommendation using Linked Data and user diversity, In: Proceedings of the 7th ACM Conference on Recommender Systems, pp. 185–192, 2010.

[30] P. Kirk, Experimental design, John Wiley & Sons, 1982.

[31] B. Knijnenburg, M. Willemsen, Z. Gantner, H. Soncu, C. Newell, Explaining the user experience of recommender systems, In: User Modeling and User-Adapted Interaction, vol. 22, no. 4-5, pp. 441–504, 2012.

[32] G. Koutrika, B. Bercovitz, H. Garcia-Molina, FlexRecs: expressing and combining flexible recommendations, In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 2009.

[33] M. Lee, W. Kim, Semantic association search and rank method based on spreading activation for the Semantic Web, In: IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1523–1527, 2009.

[34] J. Lehmann, L. Bühmann, Autosparql: Let users query your knowledge base, In: 8th Extended Semantic Web Conference (ESWC), 2011.

[35] N. Marie, F. Gandon, M. Ribiere, F. Rodio, Discovery hub: on-the-fly Linked Data exploratory search, In: Proceedings of the 9th International Conference on Semantic Systems, pp. 17–24, 2013.

[36] S. McNee, J. Riedl, J. Konstan, Being accurate is not enough: how accuracy metrics have hurt recommender systems, In: CHI'06 Extended Abstracts on Human Factors in Computing Systems, 2006.

[37] R. Meymandpour, J. Davis, Recommendations using Linked Data, In: Proceedings of the 5th PhD Workshop on Information and Knowledge, 2012.

[38] A. Miles, S. Bechhofer, SKOS Simple Knowledge Organization System Reference, In: W3C Recommendation, 2009.

[39] B. Mobasher, X. Jin, Y. Zhou, Semantically enhanced collaborative filtering on the web, In: Web Mining: From Web to Semantic Web, pp. 57–76, 2004.

[40] A. Musetti, A. Nuzzolese, F. Draicchio, V. Presutti, E. Blomqvist, A. Gangemi, P. Ciancarini, Aemoo: Exploratory search based on knowledge patterns over the Semantic Web, In: Semantic Web Challenge, 2012.

[41] T. Neumann, G. Weikum, The RDF-3X engine for scalable management of RDF data, In: The VLDB Journal, vol. 19, no. 1, pp. 91–113, 2010.

[42] J. Pérez, M. Arenas, C. Gutierrez, Semantics and Complexity of SPARQL, In: 5th International Semantic Web Conference, pp. 30–43, 2006.

[43] S. Policarpio, S. Brunk, G. Tummarello, Implementation of a SPARQL integrated recommendation engine for Linked Data with hybrid capabilities, In: AImWD, 2012.

[44] P. Pu, L. Chen, R. Hu, A user-centric evaluation framework for recommender systems, In: Proceedings of the 5th ACM Conference on Recommender Systems, 2011.

[45] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, E. Di Sciascio, Sound and music recommendation with knowledge graphs, In: ACM Transactions on Intelligent Systems and Technology (TIST), vol. 8, no. 2, art. 21, 2017.

[46] E Oren R Delbru S Decker Extending faceted navigationfor RDF data In 5th International Semantic Web Conferencepp 559ndash572 2006

[47] G Pirro Explaining and suggesting relatedness in knowledgegraphs In International Semantic Web Conference pp 622ndash639 2015

[48] G Rizzo R Troncy NERD A framework for evaluatingnamed entity recognition tools in the Web of data 10th Inter-national Semantic Web Conference (ISWCrsquo11) Demo SessionBonn Germany 2011

[49] J Rosati T Di Noia T Lukasiewicz R De Leone A Mau-rino Preference queries with Ceteris Paribus semantics forLinked Data In OTM Confederated International Conferencespp 423ndash442 2015

[50] T Ruotsalo K Haav A Stoyanov S Roche E Fani R DeliaiE Maumlkelauml T Kauppinen E Hyvoumlnen SMARTMUSEUM A

mobile recommender system for the Web of Data In Web Se-mantics Science Services and Agents on the World Wide Webvol 20 pp 50ndash67 2013

[51] I. Ruthven, M. Lalmas, K. van Rijsbergen. Incorporating user search behavior into relevance feedback. In Journal of the Association for Information Science and Technology, vol. 54, no. 6, pp. 529–549, 2003.

[52] B. Sarwar, G. Karypis, J. Konstan, J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, 2001.

[53] M. Schmachtenberg, C. Bizer, H. Paulheim. Adoption of the Linked Data best practices in different topical domains. In International Semantic Web Conference (ISWC), pp. 245–260, 2014.

[54] H. Schütze, C. Manning, P. Raghavan. Introduction to information retrieval. Cambridge University Press, 2008.

[55] SparqlImplementations. URL: https://www.w3.org/wiki/SparqlImplementations, 2016.

[56] M. Stankovic, W. Breitfuss, P. Laublet. Discovering relevant topics using DBPedia. In Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 219–222, 2011.

[57] K. Swearingen, R. Sinha. Beyond algorithms: An HCI perspective on recommender systems. In ACM SIGIR 2001 Workshop on Recommender Systems, vol. 13, no. 5-6, 2001.

[58] D. Tunkelang. Faceted search. In Synthesis Lectures on Information Concepts, Retrieval, and Services, pp. 1–80, 2009.

[59] C. Unger, L. Bühmann, J. Lehmann, A. Ngonga Ngomo, D. Gerber, P. Cimiano. Template-based question answering over RDF data. In Proceedings of the 21st ACM International Conference on World Wide Web, 2012.

[60] C. Varnhagen, M. Gushta, J. Daniels, T. Peters, N. Parmar, D. Law, T. Johnson. How informed is online informed consent? In Ethics & Behavior, vol. 15, no. 1, pp. 37–48, 2005.

[61] J. Waitelonis, H. Sack. Towards exploratory video search using Linked Data. In 11th IEEE International Symposium on Multimedia, pp. 540–545, 2009.

[62] L. Wenige, J. Ruhland. Flexible on-the-fly recommendations from Linked Open Data repositories. In International Conference on Business Information Systems, pp. 43–54, 2016.

[63] L. Wenige. Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations.

[64] L. Wenige, J. Ruhland. Retrieval by recommendation: using LOD technologies to improve digital library search. In International Journal on Digital Libraries, 2017.

[65] F. Zarrinkalam, M. Kahani. A multi-criteria hybrid citation recommendation system based on Linked Data. In 2nd International eConference on Computer and Knowledge Engineering (ICCKE), pp. 283–288, 2012.

[66] C. Ziegler, S. McNee, J. Konstan, G. Lausen. Improving recommendation lists through topic diversification. In Proceedings of the 14th International Conference on World Wide Web, pp. 22–32, 2005.

[67] L. Zou, R. Huang, H. Wang, J. Yu, W. He, D. Zhao. Natural language question answering over RDF: a graph data driven approach. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 313–324, 2014.


$$\mathrm{score}_{aggAVG}(y) = \frac{\sum_{x \in \mu(x,y)} \mathrm{score}(P_r, x)}{|x \in \mu(x,y)|} \tag{18}$$
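As a rough illustration of Eq. 18, the following sketch averages the similarity scores of all sublevel entities x that are mapped to an aggregate resource y. The input layout (triples of aggregate resource, sublevel entity and score) and the name `agg_avg` are illustrative assumptions and are not taken from the SKOSRec implementation.

```python
from collections import defaultdict


def agg_avg(sub_scores):
    """Average sublevel scores per aggregate resource (cf. Eq. 18).

    sub_scores: iterable of (y, x, score) triples, where x is a sublevel
    entity mapped to the aggregate resource y (hypothetical input layout).
    Returns a dict that maps each y to the mean score of its sublevel entities.
    """
    grouped = defaultdict(list)
    for y, _x, score in sub_scores:
        grouped[y].append(score)
    # Numerator: sum of the scores; denominator: number of mapped entities.
    return {y: sum(scores) / len(scores) for y, scores in grouped.items()}
```

For example, a destination with two points of interest scored 0.5 and 1.0 would receive an aggregated score of 0.75 under this sketch.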

4. Evaluation of the SKOS Recommender

4.1. Experimental Setup

The novel recommendation strategies of the SKOSRec engine were evaluated in web-based experiments for the usage scenarios of travel destination search and multimedia RS. The online approach helped to reach more participants with diverse demographic backgrounds than would have been possible in a laboratory experiment. Another positive side effect of the web-based setting was that participants could test and rate recommendations without being directly observed by an experimenter, which might otherwise have biased the results. Hence, the online approach softened common limitations of laboratory studies [19].

The experiments were carried out in the defined usage scenarios with metadata descriptions from DBpedia. The database containing the local mirror of DBpedia ran on a virtual server. Besides the triple store, a web application was hosted on the same machine. The website ran on an Apache Tomcat 7 application server and was implemented with the help of Java servlet technology.⁶,⁷

We conducted the online experiments between April 2016 and January 2017. Upon setting up the web application, a pretest was carried out to control for potential pitfalls such as misleading navigation. Two test users went step by step through the web interface and provided their feedback. Based on their insights, we slightly modified the interface (e.g., by rephrasing user instructions) to increase the chances of successful completion. Should the SKOSRec engine be applied in a live setting, the design of the user interface will have to be tested with more users. Our experimental series focused on algorithmic performance, which was tested with numerous participants.

Subjects were recruited through the clickworker.com platform.⁸ Participants received a compensation fee of 1.00 € or 1.50 € for a completed survey. 103 participants took part in the study on travel destination search. In the multimedia domains, a total of 154 subjects evaluated the SKOSRec engine. For each part of the evaluation, it was ensured that participants had not merely clicked through the evaluation screen. Whenever there appeared to be irregularities in the log files (e.g., the default settings of the screen were not changed), these assessments were excluded from further analysis.

The survey template started with a consent form. It informed participants about the context and the procedure of the study. After having filled in the consent form, participants were navigated to the first part of the experiment. It collected data on subjects' demographics and their consumption and search habits. After the preparation phase, participants worked on the first test case (TC1) in each usage scenario. In this test case, the performance of the SKOSRec engine was assessed in the simple execution mode of on-the-fly recommendations (see Subsect. 3.2). This was done to gather data on the usefulness of suggestions resulting from a simple content-based approach (i.e., the baseline method), with which more advanced recommendation strategies were compared at a later stage. In cases where users did not state any items, the application showed an error page. While setting up a profile required some effort on the side of the user, it was a necessary step for the experiment. Since we did not have access to the log files of a live application, profile generation had to be simulated. The process was facilitated by a specifically tailored interface (see Fig. 2), where subjects could search for items through an autocomplete field that applied AJAX technology.⁹

⁶ http://tomcat.apache.org
⁷ http://www.oracle.com/technetwork/java/javaee/servlet/index.html
⁸ https://www.clickworker.de

It enabled users to type in keywords (e.g., the name of a music act or the author of a book), for which matches were instantaneously displayed without reloading the site. Hence, participants could quickly find and select their items of interest. Whenever a user entered a query through the AJAX interface, the system matched the query keywords against an Apache Solr/Lucene search index.¹⁰ The index was built with item metadata, which had been extracted from DBpedia prior to setting up the website. The index comprised the names and abstracts of items. Whenever available, information on thumbnail pictures was saved in the index to be ready for display in the AJAX interface. In case the Apache Solr/Lucene search engine found index keywords that matched the user query, metadata information for these items was immediately shown on the screen. This information was displayed to help participants make an informed decision on whether the AJAX result list contained suitable items for profile generation.

Fig. 2. User profile generation, TC1 (music domain)

Aside from the AJAX feature, the appeal of the interface was increased by a progress bar at the top of the webpage (see Fig. 2). Progress bars are commonly utilized tools for web surveys. In conventional paper-based studies, subjects can usually see how far along they are, while in a web survey they cannot check their progress immediately. This is especially true for studies that apply screen-by-screen navigation, which was the chosen approach for the conducted web experiments. By showing how close participants are to completing the experiment, progress bars prevent users from tiring of the questionnaire and withdrawing from the study altogether [14].

Upon profile generation, the SKOSRec engine calculated on-the-fly recommendations for a simple query (similar to Q1, Sect. 3), which were shown to users in a separate screen. Suggestions were displayed with a sufficient amount of information, e.g., thumbnails, labels or a short description, to help participants assess the quality of the recommendation. In addition to this information, users could also access the respective Wikipedia article by following the hyperlink on the label. Relevance sliders were used to assess the utility of each recommendation. They are often applied in evaluations of IR systems [51, 54]. Scores were handled as points on a 0 to 100 scale of the relevance slider (far left: lowest relevance; far right: highest relevance; see Fig. 3).

⁹ http://api.jquery.com/jquery.ajax
¹⁰ http://lucene.apache.org/solr

This approach facilitated the calculation of an accurate precision score and the application of tests with higher statistical power than an ordinal 5-star scale would allow. The score, called mean relevance score (mrs), was measured with Equation 19.

$$mrs = \frac{\sum_{i=1}^{k} score_i}{k} \tag{19}$$
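Eq. 19 can be sketched in a few lines. The function name `mean_relevance_score` and the choice to reject empty lists are assumptions made for this illustration; the paper does not describe how the web application handled that case.

```python
def mean_relevance_score(scores):
    """Mean of the slider scores (0 to 100) given to the k items of one list (cf. Eq. 19)."""
    if not scores:
        # Assumption: an empty list has no defined mrs value.
        raise ValueError("at least one rated recommendation is required")
    return sum(scores) / len(scores)
```

A list whose three items were rated 80, 40 and 60 yields an mrs of 60 under this sketch.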

In addition to the precision score, the web application also determined the size of the recommendation list produced by the engine as an indicator of recall. However, recommendation list size can only be utilized for this purpose when mrs scores reach a reasonable level.

While accurate recommendations are useful because they build trust in a system [57], they only capture a narrow aspect of performance [36]. For instance, an online retailer can profit tremendously from an RS that provides both relevant and unobvious recommendations, since it enables users to explore the product catalog more thoroughly. On the other hand, if an RS only suggests popular products, it is likely that the user already knows them [3, 23]. A purely accuracy-based evaluation disregards these aspects, which is why other performance indicators such as novelty were measured as well. Thus, the evaluation interface also contained a section where subjects had to tick a radio button that indicated whether an item was new. In the backend of the web application, novelty was measured as the ratio of relevant items not familiar to the user (|nov|) among the total number of relevant items (|tp|) (Eq. 20).

Fig. 3. Recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain)

$$nv = \frac{|nov|}{|tp|} \tag{20}$$
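The novelty ratio of Eq. 20 could be computed as in the sketch below. It uses the relevance threshold of 50 slider points stated in the text; the input layout (pairs of score and novelty flag) and the decision to return `None` when no item is relevant are assumptions of this illustration.

```python
RELEVANCE_THRESHOLD = 50  # an item counts as relevant at a slider score of 50 or higher


def novelty(ratings):
    """Share of relevant items that were new to the user (cf. Eq. 20).

    ratings: list of (score, is_new) pairs for one recommendation list
    (hypothetical layout: slider score and the state of the radio button).
    """
    relevant = [is_new for score, is_new in ratings if score >= RELEVANCE_THRESHOLD]
    if not relevant:
        # Assumption: nv is undefined when |tp| = 0.
        return None
    return sum(relevant) / len(relevant)
```

Leaving nv undefined for lists without relevant items is only one possible convention; such lists would then simply not contribute to a mean novelty score.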

An item was considered relevant when it achieved a relevance score of 50 or higher. In addition to accuracy and novelty, the web application also tracked the diversity of recommendation lists, since a topically diversified result set has been found to increase overall user satisfaction [31, 66]. In the RS literature, the most widely explored diversification metrics are the ones that measure the similarity between each item pair in the recommendation list. The more alike the items are, the less diverse the result set is [19]. The study series of this paper applied the metric by Castells et al. [8]. It determines item-to-item similarity values with the cosine similarity measure. Scores are then converted to distance values (Eq. 21) and averaged for the recommendation list (Eq. 22).

$$d(i_l, i_m) = 1 - \cos(i_l, i_m) \tag{21}$$

$$div_C = \frac{2}{k(k-1)} \sum_{l<m} d(i_l, i_m) \tag{22}$$
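A minimal sketch of Eqs. 21 and 22 follows. It assumes that each item is represented by a numeric feature vector (for instance, a binary vector over SKOS annotations); this representation and the names `cosine` and `div_c` are illustrative assumptions rather than details confirmed by the paper.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equally long numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Assumption: an all-zero vector is treated as having similarity 0.
    return dot / norm if norm else 0.0


def div_c(item_vectors):
    """Average pairwise cosine distance of a recommendation list (cf. Eqs. 21 and 22)."""
    k = len(item_vectors)
    if k < 2:
        raise ValueError("diversity needs at least two items")
    # Sum d(i_l, i_m) = 1 - cos(i_l, i_m) over all pairs with l < m.
    total = sum(1.0 - cosine(u, v) for u, v in combinations(item_vectors, 2))
    return 2.0 * total / (k * (k - 1))
```

Two items without any shared annotation give a score of 1, whereas two items with proportional vectors give a score of 0.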

The content-based diversity scores (divC) were calculated in the backend of the web application and saved in a log file for each recommendation list that the SKOSRec engine generated at runtime. In addition to measuring divC scores, we asked participants directly about their opinions on recommendation list diversity. Following the RS evaluation framework by Pu et al. [44], two Likert items were posed for this purpose:

– The recommendations of list [] are diverse.
– The recommendations of list [] are similar to each other.

Besides the algorithmic performance metrics listed above, the authors gathered assessments from users regarding the overall usefulness of the suggestions. This approach is also in line with the empirically verified evaluation framework by Pu et al. [44], which proposes measuring the psychometric construct of Perceived Usefulness to get a comprehensive understanding of user opinions on recommendation results. Thus, participants were instructed to state their agreement with positive statements regarding the relevance of the suggestions. The statements were adapted to the specificities of each usage scenario. Fig. 4 shows an example collection of Likert questions for the domain of music RS.

After participants had provided the required information, their answers were saved in a log file and they were navigated to test case 2 (TC2), which evaluated the SKOSRec engine's performance for constraint-based queries (similar to Q3, Sect. 3). We tried to find out whether the engine was able to improve user satisfaction through filter conditions. Even though IR systems already successfully apply facet filters on result sets, the experiments still had to show whether LOD-enabled constraints have the same effect on the Perceived Usefulness of recommendations. For the comparisons of non-filtered and filtered suggestions, we applied a within-subjects design in which participants had to rate two recommendation lists resulting from regular (non-filtered) and filtered requests in TC2. We favored this setup over a between-subjects design, where subjects would have been assigned to only one result list. While the between-subjects design more closely resembles a real-world setting, since it does not require participants to interact with more than one approach at a time, differences among subjects (e.g., regarding demographics or topic expertise) can potentially have a considerable impact on quality judgments [19]. Hence, when applying a between-subjects design, one does not know whether differences in performance are caused by the approaches or by the variability among participants. Therefore, researchers need to conduct between-subjects experiments with a sufficient number of study subjects to level out these differences. Within-subjects studies, on the other hand, require fewer participants while still achieving the same level of statistical power, since between-subject variability is not an issue [30, 31]. Beel et al. have shown that even small variations in an RS test setting could cause substantial differences in performance outcomes [6]. To keep the extent of variations even lower, comparisons between non-filtered and constraint-based recommendations were based on the same profile for each user in TC2. Therefore, the user profile from TC1 served as a starting point for filtered retrieval. The web application showed the recently generated profiles to users and asked them to provide an additional filter condition. Fig. 5 depicts an example web form for a constraint-based recommendation request in the music domain. It comprises the user profile and an additional text field where users stated their constraint.

Afterwards, the engine applied this filter condition to the set of potential recommendation results before similarity calculation started. For each usage scenario, different types of constraints were available. Regarding suitable filter options, the authors sought to achieve a balance between assisting users in finding filter conditions that would adequately represent their information needs and keeping the variability among users as low as possible. Therefore, the specificities of each scenario and the availability of the respective data sources were determined. For instance, for the multimedia domains (movies, music and books), it was assumed that users would like to filter recommendations according to the specific genre of the item. Fortunately, the DBpedia dataset contains this kind of information for the music domain. In this domain, the genre can be retrieved through the property dbo:genre. However, in the movie and book domains, DBpedia does not provide sufficient information. Many LOD resources that represent books were not assigned a genre. In the movie domain, a genre property is missing entirely, and genre-specific information can often only be found in the subject categories describing the movie. Hence, in contrast to the experiment on music recommendations, participants in the domains of books and films did not filter their recommendation lists by genre. On the other hand, the property dc:subject was applied as a recommendation filter in each multimedia domain. Since the subject property often links to many informative features of multimedia items (e.g., release period, geographic information or content information), the authors decided that this feature represents a powerful filter dimension.

In the experiment on travel destination search, the web interface offered two filters: a subject filter and a location-specific filter. The subject filter enabled users to state their preferences regarding the characteristics of a location (e.g., being a nature reserve). The location-specific filter, on the other hand, gave participants the option to specify where the desired destination should be located.¹¹

Fig. 4. Evaluation screen for recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain)

The second fundamental research issue of TC2 concerned the question of whether expressive constraints can boost recommendation quality even further. Hence, apart from executing a constraint-based workflow with a simple filter (i.e., a direct attribute of an item), the SKOSRec engine generated suggestions with the help of an expressive filter as well. This second procedure, while still applying the same user constraint as the simple filter, utilized an expressive graph pattern to include additional LOD resources in the result set. For instance, in case a subject filter was selected, result set expansion was facilitated by exploring the wider semantic space of the DBpedia category graph through a property path declaration in the graph pattern (e.g., skos:broader[2]). In the travel experiment, an additional expressive filter retrieved place-related information (dul:hasLocation) for location-specific subproperty relations that are not materialized in DBpedia. Hence, the expressive graph pattern expanded the filter such that additional properties were matched (e.g., dbo:country, dbo:city or dbo:state) while still applying the specified user constraint on the set of LOD resources. Upon execution of constraint-based recommendation requests, users received two recommendation lists (resulting from simple and expressive filtering, respectively) in a separate evaluation screen. Participants evaluated result sets in the same manner as in TC1. For each result set, the screen contained an additional Likert item asking subjects to give their agreement to the statement "The filter has improved the recommendation results of list". The sequence of suggestions resulting from the different methods was randomly assigned in each user session to avoid order effects.

After study subjects had completed TC2, they were navigated to test case 3 (TC3). In the TC3 section of the travel experiment, participants could choose between three options for rollup query patterns (similar to Q6, Sect. 3). Users were able to obtain recommendations for travel destinations based on POIs they had visited during a stay in another destination. They could select either a city (option 1), a region (option 2) or a country (option 3) as destination type. Upon entity type selection, the SKOSRec engine generated appropriate suggestions. Fig. 6 shows the web form from the travel experiment that facilitated the formulation of advanced rollup requests in TC3.

¹¹ http://dbpedia.org

Fig. 5. Web form for a constraint-based recommendation request (music domain)

When users had selected an entity type and stated their favorite travel destination and three associated POIs, the application generated a SKOSRec query from these parameters. A simple on-the-fly request was sent to the engine as well to retrieve baseline recommendations, with which the suggestions resulting from the advanced request were compared at a later stage of the experiment. The simple query only contained the user's favorite travel destination and the selected entity type option, thus omitting information on the POIs. During the travel experiment, the engine processed both the simple and the advanced query and produced two recommendation lists. As in the previous parts of the experiment, participants assessed the quality of the result sets, which appeared in random order on the screen.

The multimedia experiments also contained a section on rollup retrieval (TC3). Like the advanced travel queries, the multimedia requests aggregated similarity scores of sublevel entities through summation, which the engine joined with a postfilter. In addition to these features, the pattern contained a preference query part. This section identified a set of preferred items through a single user statement. Participants specified their favorite director/actor (movie domain), music act (music domain) or author (book domain), and the SKOSRec engine generated recommendations based on the features of the works created by the favored artist (similar to Q5, Sect. 3). Participants also received suggestions from a simple on-the-fly query, which computed results according to the artist's characteristics. As in the travel experiment, recommendations from the advanced rollup request were compared with those from the on-the-fly query, without letting participants know which approach they were assessing and with random assignment of result set positions.

After participants of the multimedia experiments had evaluated the results of TC3, they were guided to test case 4 (TC4), which evaluated cross-domain queries. The experiment on travel RS ended with TC3 because no suitable graph patterns could be identified to facilitate these kinds of suggestions for the travel usage scenario. In the multimedia domain, however, the data from DBpedia was sufficient for this retrieval task. The entity type of the study (i.e., movie, music act or book) defined the source domain, based on which the engine generated suggestions for items from another multimedia target domain (similar to Q7, Sect. 3). Fig. 7 depicts the cross-domain web form of the music experiment.

Fig. 6. Travel - User profile generation, TC3

The web form assisted subjects in formulating a request, which then obtained movie suggestions based on the favorite music act of the user. Parameters from users were entered in the placeholders of the query pattern in the backend of the application. We formulated the query templates in such a way that the query engine matched any properties or graph patterns that could potentially connect two items from the specified target and source domains. For instance, such matches can occur when a user states a favorite movie that was written by a particular author. In case the same author has also written some books, they might be of interest to the user as well. Another example stems from the movie domain. Suppose a consumer has declared in his/her profile a preference for a movie that links to the corresponding soundtrack. It might well be the case that the user likes the soundtrack and the music acts that were involved in creating it just as much as the movie. The practice of distinguishing homonymous LOD resources through the property dbo:wikiPageDisambiguates is another interesting linking pattern from DBpedia that was exploited for cross-domain requests in TC4, since it connects similar entities (i.e., the book edition of a certain movie).

Cross-domain SPARQL queries (similar to Q8, Sect. 3) were posed as baseline requests in this test case. While the SPARQL query only matched cross-domain relations for a single LOD resource, the SKOSRec request expanded the search space by considering links from more than one item. As in the previous test cases, TC4 ended with an evaluation screen where users assessed the SPARQL-based as well as the SKOSRec suggestions in random order without knowing which recommendation list belonged to which method. Table 10 gives an overview of the test cases and the methods that were applied in the experiments in each usage scenario.

Fig. 7. Music - User profile generation, TC4 (page 9)

Table 10. Test cases in the web experiments (X = conducted, - = not conducted)

| Test Case | Evaluation Purpose | Query Type | Travel | Movie | Music | Book |
|---|---|---|---|---|---|---|
| TC1 | On-the-fly Recommendations (Baseline) | Q1 | X | X | X | X |
| TC2 | Constraint-based Recommendations | Q3 | X | X | X | X |
| TC3 | Advanced Queries (Rollup) | Q5, Q6 | X | X | X | X |
| TC4 | Advanced Queries (Cross-Domain) | Q7 | - | X | X | X |

4.2. Results

In total, 257 people took part in the study. The gathered samples from the experiments contained answers from both genders, with a slight overrepresentation of male participants (53.7%). The majority of study subjects were between 20 and 40 years of age (65.8%). The evaluation of the demographics section also revealed that the majority of participants already perceived the search in their domain of interest as either "manageable" (38.5%) or even as "easy" or "very easy" (44.0%).

Prior to analyzing the performance results of the novel retrieval approaches of the SKOSRec engine, it was checked whether the respective Likert items for Perceived Usefulness and user-based diversity (divU) reached acceptable reliability levels. For this purpose, Cronbach's α values were determined. Table 11 shows the outcome of this analysis for the Perceived Usefulness construct. Given the acceptable values throughout the domains (Cronbach's α > 0.7), it could be treated as an interval-scaled variable in the statistical tests.

Table 11. Cronbach's α values for the concept of Perceived Usefulness

| Domain | Cronbach's α |
|---|---|
| Travel | 0.7002 |
| Movie | 0.8579 |
| Music | 0.8515 |
| Book | 0.8949 |
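For readers who want to reproduce this kind of reliability check, the sketch below computes Cronbach's α with the standard formula α = k/(k-1) · (1 - Σ item variances / variance of the summed scale). The input layout and the function name `cronbach_alpha` are assumptions of this illustration; the paper does not state which software was used for the analysis.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a multi-item Likert scale.

    responses: list of per-participant lists, each holding one score per
    Likert item (hypothetical layout). Uses sample variances throughout.
    """
    if len(responses) < 2:
        raise ValueError("at least two participants are required")
    k = len(responses[0])
    if k < 2:
        raise ValueError("at least two items are required")
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    total_var = variance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("summed scores do not vary; alpha is undefined")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

Values above 0.7, as reported in Table 11, are commonly read as acceptable internal consistency.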

Besides usefulness, user-based diversity scores were also measured by multiple Likert items. However, Cronbach's α values for this concept did not reach acceptable levels. In order to avoid leaving valuable user assessments unused, agreement with the statement "The recommendations of list [] are diverse" was taken into account as an ordinal variable (divU).

Based on these results, subsequent tests could be carried out. The first part of the analysis concerned the performance of the on-the-fly retrieval method (baseline) of the SKOSRec engine in TC1. The outcome of this analysis confirms that the on-the-fly approach represents a viable recommendation strategy. Table 12 shows the mean (M),¹² standard deviation (SD) and number of participants (N) for each performance dimension and metric. It can be seen that users assessed the engine's performance in almost every quality dimension as better than mediocre. Aside from the positive user evaluations with regard to Perceived Usefulness, related quality aspects such as Accuracy, Novelty and Diversity received good assessments as well. For instance, in each domain the mean relevance score (mrs) was higher than 50 score points. This means that participants were often convinced of the relevance of the results. Given these positive user assessments with regard to item relevance, it seems justified to consider the mean of result list sizes as an indicator of recall. The engine was able to generate the required number of recommendations in almost every test case (see Table 12). The on-the-fly retrieval approach was also tested with regard to the Novelty dimension. Table 12 shows the mean novelty scores (nv) for the baseline recommendation approach. Novelty scores indicate that the majority of relevant items in a result list were deemed to be new in the travel and book RS, whereas in the experiments on movie and music RS only about one third of them were unfamiliar to users. In contrast, content-based Diversity scores (divC) were rather high in each domain. Hence, it can be concluded that items in a result set usually have a low rate of matching SKOS annotations. In addition to content-based scores, user assessments of recommendation list Diversity (divU) were also taken into account. The median level of agreement with the Diversity-related Likert item stating that recommendations are diverse was 3 ("neutral") in the multimedia domains and 4 ("agree") in the travel domain.

In summary, the baseline approach achieved fairly good performance results in TC1. The findings demonstrate that LOD-enabled recommendations can potentially enhance a user's search experience. They also allow for subsequent tests of advanced retrieval techniques, which can be compared to this standard method.

¹² In case of the user-based diversity score (divU), M denotes the median because of the ordinal scale of the variable.

In the analysis of performance results from TC2 (constraint-based recommendations), particular attention was paid to participants' agreement with the statement "The filter has improved the recommendation results of the list". Fig. 8 shows the distribution of domain-wise agreement ratios. On average, they were fairly high. Throughout the domains, the vast majority of participants selected either 5 ("strongly agree"), 4 ("agree") or 3 ("neutral"). Hence, a filter often had a positive impact on results.

The authors gathered responses to the question on improvement for recommendation lists resulting from both the simple and the expressive filtering approaches. Hence, they give clues about the general performance of constraint-based retrieval. However, since this section of the study revealed the experimental condition to participants (i.e., application of a filter), answers to the Likert statement have to be interpreted cautiously. Results may be slightly biased, as participants were able to guess the underlying agenda. However, these limitations do not exist for comparisons between simple and expressive filters. Since the web application randomly assigned the list order, participants did not know which approach they were assessing. This enables us to take a closer look at the differences between the two filtering approaches. It was hypothesized that advanced filters improve recommendation quality, as they may identify more relevant resources, thereby overcoming problems of data quality or data sparsity in LOD repositories. Table 13 lists response rates of the two methods. The rate measures the ratio of non-empty result sets among all result sets. Throughout the domains, recommendation lists resulting from expressive filtering achieved higher response rates.

Table 12. Performance results in TC1 (M = Mean, SD = Standard Deviation, N = No. of participants)

| Dimension (Metric) [Max Score] | Domain | M | SD | N |
|---|---|---|---|---|
| Accuracy (size) [10] | Travel | 10.00 | 0.00 | 103 |
| | Movie | 10.00 | 0.00 | 50 |
| | Music | 10.00 | 0.00 | 53 |
| | Book | 9.80 | 1.40 | 51 |
| Accuracy (mrs) [1] | Travel | 56.49 | 0.00 | 103 |
| | Movie | 62.25 | 19.15 | 50 |
| | Music | 55.70 | 0.10 | 53 |
| | Book | 57.47 | 15.55 | 50 |
| Novelty (nv) [1] | Travel | 0.60 | 0.34 | 101 |
| | Movie | 0.36 | 0.31 | 50 |
| | Music | 0.36 | 0.39 | 52 |
| | Book | 0.61 | 0.29 | 47 |
| Diversity (divU) [5] | Travel | 4 | - | 102 |
| | Movie | 3 | - | 50 |
| | Music | 3 | - | 53 |
| | Book | 3 | - | 50 |
| Diversity (divC) [1] | Travel | 0.80 | 0.16 | 103 |
| | Movie | 0.87 | 0.12 | 50 |
| | Music | 0.86 | 0.10 | 50 |
| | Book | 0.92 | 0.08 | 50 |
| Usefulness [5] | Travel | 3.34 | 0.85 | 103 |
| | Movie | 3.55 | 0.73 | 50 |
| | Music | 3.18 | 0.85 | 53 |
| | Book | 3.43 | 0.81 | 50 |

Table 13. Response rates (TC2)

Domain   Constr.-based (Regular)   Constr.-based (Expanded)   N
Travel   23                        57                         103
Movie    86                        88                         50
Music    72                        75                         53
Book     73                        76                         51
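The response rate defined above (share of non-zero result sets among all result sets) can be sketched in a few lines of Python. This is only an illustration: the function name and the sample sessions are hypothetical and not taken from the study data.

```python
def response_rate(result_sets):
    """Return the share of result sets that contain at least one item."""
    if not result_sets:
        raise ValueError("at least one result set is required")
    non_empty = sum(1 for items in result_sets if len(items) > 0)
    return non_empty / len(result_sets)


# Hypothetical sessions: each inner list is one participant's recommendation list.
sessions = [["item_a", "item_b"], [], ["item_c"], []]
rate = response_rate(sessions)
print(f"response rate: {rate:.0%}")
```

Multiplying the returned ratio by 100 gives a percentage figure of the kind reported per domain and method.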


[Figure 8. Panel title: “The filter has improved the recommendations of the list”; x-axis: Domain (Digital Library, Travel, Movie, Music, Book); y-axis: Percentage of Agreement; legend: Degree of Agreement (1 to 5)]

Fig. 8. Participants' agreement with the statement that the filter has improved the result list (domain-wise) (TC2)

In addition to response rates, user assessments and scores in the defined quality dimensions were compared. Table 14 depicts the outcome of this analysis. Significantly higher scores are marked in bold figures. In case statistical tests identified no meaningful variations, the results of the better performing approach are underlined. α-levels were adjusted with the false discovery rate (FDR) in order to decrease the probability of making a type I error due to multiple testing.
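The text does not state which FDR procedure was used. As a minimal sketch, assuming the common Benjamini-Hochberg step-up procedure, an FDR adjustment for a family of tests can be written as follows; the function name, the default level of 0.05 and the sample p-values are illustrative assumptions.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True if the hypothesis is rejected.

    Assumption: Benjamini-Hochberg step-up procedure; the paper only says
    that alpha levels were adjusted with the false discovery rate.
    """
    m = len(p_values)
    # Indices of the p-values in ascending order of their value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k whose p-value satisfies p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values from four pairwise comparisons.
decisions = benjamini_hochberg([0.001, 0.04, 0.03, 0.5])
print(decisions)
```

Compared with testing every metric at the unadjusted level, the step-up rule only rejects a hypothesis if its p-value is small relative to its rank among all tests.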

According to mean scores, expressive requests generated more comprehensive recommendation lists (size). This finding is in line with the increased response rates shown in Table 13. However, subsequently conducted t-tests confirmed significant differences only for the travel experiment (t(102) = −7.89, p < 0.001). In contrast, precision scores were higher in most of the domains (travel, movie, music) when the SKOSRec engine processed a simple constraint-based query. However, statistical tests did not confirm any systematic differences. The authors applied Wilcoxon tests for mrs scores and the remaining performance metrics because of the small sample sizes resulting from the low response rates of the simple constraint-based approach.

Regarding Novelty, none of the approaches was superior. In the Diversity dimension, expressive filters generated suggestions that were at least as topically diversified as the results from simple constraint-based retrieval. Even though no significant differences were identified between the two approaches, increased mean scores (divU in the travel domain and divC in each domain) indicate a slight superiority of expanded filter requests in this quality dimension. The same applies to usefulness scores (mean values were higher in 3 out of 4 domains when expressive filtering was applied). However, these statements are speculative at best because statistical tests were not significant.

In summary, it can be concluded that expressive filters improve recall, potentially leading to a slight loss in precision scores (see Table 14). It may also be the case that expanded user constraints diversify recommendation lists and increase their usefulness. However, these claims are unverified. In contrast, it is safe to say that expressive query patterns generate results of at least the same quality as simple patterns while increasing the number of relevant suggestions. The few differences may be explained by the high Jaccard scores (JI) (see Table 15) of recommendation lists. Throughout the domains, result sets were much alike. Given these similarities and the increases in list sizes for expressive filters, it is supposed that an expanded constraint produces an enhanced version of the recommendation list resulting from simple filtering. Therefore, expanded filters should be the default setting in a constraint-based retrieval context.
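The Jaccard index used to compare two recommendation lists is the size of their intersection divided by the size of their union. The following sketch illustrates the computation; the function name and the item identifiers are hypothetical, and returning `None` when both lists are empty is an assumption made here for illustration.

```python
def jaccard_index(list_a, list_b):
    """Jaccard index of two recommendation lists, treated as sets of items."""
    set_a, set_b = set(list_a), set(list_b)
    union = set_a | set_b
    if not union:
        # Assumption: the index is undefined when both lists are empty.
        return None
    return len(set_a & set_b) / len(union)


# Hypothetical result lists of a simple and an expanded filter request.
simple_list = ["item_a", "item_b", "item_c"]
expanded_list = ["item_b", "item_c", "item_d"]
print(jaccard_index(simple_list, expanded_list))
```

A value close to 1 means the two lists are nearly identical, whereas a value close to 0 means they share almost no items.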

Table 14. Performance results in TC2 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max. Score]   Domain   Simple           Expressive       N
                                           M       SD       M       SD
Accuracy (size) [10]              Travel   1.17    2.70     4.52    4.62     103
                                  Movie    6.72    4.08     7.20    3.92     50
                                  Music    5.96    4.64     6.36    4.51     53
                                  Book     4.59    4.50     5.00    4.51     51
Accuracy (mrs) [1]                Travel   66.73   20.51    65.34   22.36    23
                                  Movie    60.95   22.98    60.29   22.79    43
                                  Music    63.77   16.05    62.96   16.44    34
                                  Book     56.17   18.18    56.81   16.81    37
Novelty (nv) [1]                  Travel   0.48    0.41     0.61    0.49     21
                                  Movie    0.44    0.36     0.55    0.80     42
                                  Music    0.55    0.39     0.52    0.37     38
                                  Book     0.77    0.34     0.76    0.33     33
Diversity (divU) [5]              Travel   3       -        3.5     -        24
                                  Movie    4       -        4       -        43
                                  Music    4       -        4       -        38
                                  Book     3       -        3       -        37
Diversity (divC) [1]              Travel   0.68    0.18     0.72    0.21     19
                                  Movie    0.70    0.24     0.73    0.23     41
                                  Music    0.86    0.13     0.87    0.12     34
                                  Book     0.84    0.24     0.95    0.63     29
Usefulness [5]                    Travel   3.33    0.93     3.50    0.77     24
                                  Movie    3.32    0.85     3.31    0.86     43
                                  Music    3.55    0.81     3.64    0.78     38
                                  Book     3.39    0.83     3.62    1.21     37

Table 15. Jaccard indices (TC2)

Domain   Mean Jaccard Index (JI)   SD     N
Travel   0.57                      0.47   25
Movie    0.71                      0.38   43
Music    0.82                      0.35   37
Book     0.92                      0.26   35

In TC3 (i.e., evaluation of roll-up query patterns), the SKOSRec engine almost always generated non-empty recommendation lists (see Table 16). In the travel and the book experiment, the response rate was higher for aggregation-based queries, whereas in the movie domain the engine more often provided recommendations for regular requests. However, these differences are only marginal and do not indicate a clear superiority of one method.

Table 16. Response rates (TC3)

Domain   Regular   Aggr.-based   N
Travel   99        100           103
Movie    96        92            50
Music    100       100           53
Book     96        100           51

Fig. 9 depicts participants' general agreement with Likert statements concerning the Perceived Usefulness of the approaches in TC3. The diagram shows similar results for regular as well as advanced requests. On average, satisfaction scores reached levels slightly above neutral agreement. A subsequent t-test confirmed no significant differences between the two methods. In fact, the approach with the highest average score of Perceived Usefulness was different in each experiment. While mean values were higher for regular requests in the studies on travel and music RS, advanced queries produced higher scores in the movie and book experiments.

[Figure 9. Panel title: “Perceived Usefulness (Regular vs. Roll-up Queries)”; x-axis: Domain (Travel, Movie, Music, Book); y-axis: Agreement; legend: Method (Regular, Roll-up)]

Fig. 9. Participants' overall satisfaction with regular and roll-up recommendation retrieval (TC3)

Content-based Diversity (divC) was the only metric with systematic differences. Here, scores went up for advanced retrieval patterns in the travel (t(98) = −5.50, p < 0.001), the music (t(52) = −4.51, p < 0.001) and the book domain (t(47) = −4.90, p < 0.001) (see Table 17). This finding is not surprising given that the aggregation-based approach explores the wider semantic network of each item in the profile.

Hence, it can be assumed that both approaches produce recommendation lists of similar quality and can provide benefits to potential consumers. It cannot be clearly stated which approach is superior, because neither of them outperformed the other in each domain. While the regular retrieval strategy was better in terms of usefulness in the travel and the music domains, the aggregation-based approach achieved higher scores in the domains of movie and book recommendations.

This conclusion is especially notable given the low mean Jaccard scores for recommendation lists resulting from regular and aggregation-based retrieval (see Table 18). The mean score never exceeded 0.1 throughout the domains, which indicates a low concordance between result sets. It is remarkable that participants perceived the recommendations as equally good given the high dissimilarity of the lists. Thus, advanced queries have an added value, as they provide additional interesting results. They can help to explore LOD repositories in different ways than a regular retrieval approach. Therefore, it is worthwhile to offer aggregation-based query patterns as an alternative retrieval strategy when the regular approach has not produced any helpful recommendations.

In addition to TC3, user assessments for TC4 (cross-domain recommendations) were also evaluated. However, since the authors conducted experiments for TC4 only in the multimedia domains, the samples were considerably smaller. Nevertheless, it is assumed that the number of completed user sessions is just sufficient to conduct further statistical analyses. Table 19 shows the response rates for the two retrieval methods. Cross-domain patterns had a higher success rate of generating non-zero result sets throughout the domains. Participants almost always received a recommendation for this query type, whereas they often did not receive a single suggestion when a regular SPARQL query was executed.

Table 17. Performance results in TC3 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max. Score]   Domain   Regular          Aggr.-based      N
                                           M       SD       M       SD
Accuracy (size) [10]              Travel   9.90    0.99     9.42    2.07     103
                                  Movie    2.88    0.59     2.76    0.82     50
                                  Music    10.00   0.00     10.00   0.00     53
                                  Book     2.88    0.59     3.00    0.00     51
Accuracy (mrs) [1]                Travel   48.21   24.98    44.93   25.59    102
                                  Movie    54.58   22.88    56.69   25.42    44
                                  Music    53.95   18.31    58.24   16.19    52
                                  Book     51.53   23.90    49.25   22.30    49
Novelty (nv) [1]                  Travel   0.43    0.39     0.38    0.40     98
                                  Movie    0.35    0.38     0.28    0.41     41
                                  Music    0.49    0.36     0.46    0.39     50
                                  Book     0.61    0.42     0.62    0.44     47
Diversity (divU) [5]              Travel   3.5     -        3       -        100
                                  Movie    4       -        3       -        43
                                  Music    4       -        3       -        50
                                  Book     3       -        3       -        48
Diversity (divC) [1]              Travel   0.79    0.17     0.89    0.13     99
                                  Movie    0.75    0.20     0.66    0.27     44
                                  Music    0.83    0.17     0.93    0.07     53
                                  Book     0.74    0.27     0.93    0.06     48
Usefulness [5]                    Travel   3.36    0.77     3.22    0.74     101
                                  Movie    3.20    0.92     3.26    0.94     44
                                  Music    3.45    0.84     3.44    0.85     51
                                  Book     2.95    0.87     3.03    0.96     48

Table 18. Jaccard indices (TC3)

Domain   Mean Jaccard Index (JI)   SD     N
Travel   0.07                      0.11   101
Movie    0.03                      0.07   44
Music    0.02                      0.05   53
Book     0.02                      0.06   50

Table 19. Response rates (TC4)

Domain   Regular (SPARQL)   SKOSRec   N
Movie    32                 96        50
Music    62                 98        53
Book     53                 100       51

We applied non-parametric tests for the performance metrics mrs, nv, divU, divC and Perceived Usefulness due to the low response rates of the SPARQL-based approach, which led to a correspondingly low number of comparable data points. Overall, the FDR-adjusted pairwise Wilcoxon tests detected no significant differences except for user-based Diversity scores (divU) in the movie (Z = −2.69, p < 0.05) and the book domain (Z = −2.56, p < 0.05). The results for the content-based Diversity (divC) metric seem to point in the same direction because of the high mean values for cross-domain queries in each domain, but the significance of these results was not verified. The mrs scores indicate a better performance of regular SPARQL queries, but the within-subjects test did not confirm this assumption statistically (see Table 20).

However, the most important finding of the analysis was that SKOSRec cross-domain queries generated significantly more suggestions throughout the domains (movie: t(49) = −17.92, p < 0.001; music: t(52) = −25.73, p < 0.001; book: t(50) = −12.75, p < 0.001) (see Table 20). Fig. 10 illustrates this graphically.
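Statistics of the form t(df) reported for within-subjects comparisons are typically obtained from a paired t-test on per-participant differences. The sketch below shows that computation with the Python standard library only. The list sizes are made up, and the sign convention (first method minus second method) is an assumption, since the text does not state it.

```python
import math
import statistics


def paired_t_statistic(first, second):
    """Return (t, degrees of freedom) for two paired samples.

    Differences are computed as first - second (assumed sign convention).
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two samples of equal length with >= 2 pairs")
    diffs = [a - b for a, b in zip(first, second)]
    sd_diff = statistics.stdev(diffs)  # sample standard deviation
    if sd_diff == 0:
        raise ValueError("t is undefined when all differences are equal")
    t = statistics.mean(diffs) / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical list sizes per participant for two retrieval methods.
regular_sizes = [1, 2, 3, 4]
cross_domain_sizes = [3, 5, 4, 8]
t_value, df = paired_t_statistic(regular_sizes, cross_domain_sizes)
print(f"t({df}) = {t_value:.2f}")
```

Under this convention, a negative t value means that the second method produced larger values on average.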

Whenever the SPARQL request provided recommendations, which was the case in less than half of the user sessions (see Table 19), the SKOSRec suggestions were assessed to be of almost equal quality as the SPARQL recommendations (see Table 20, Usefulness). Hence, in the remaining sessions, in which SPARQL querying did not produce any results, the SKOSRec approach may still have provided useful suggestions.

Table 20. Performance results in TC4 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max. Score]   Domain   Regular (SPARQL)   Cross-Domain       N
                                           M        SD        M        SD
Accuracy (size) [10]              Movie    0.82     2.09      8.66     2.73      50
                                  Music    0.89     1.59      9.30     2.03      53
                                  Book     2.90     3.98      10.00    0.00      51
Accuracy (mrs) [1]                Movie    67.41    27.62     53.65    14.71     16
                                  Music    62.36    31.76     48.32    22.49     20
                                  Book     66.42    21.28     55.59    16.84     27
Novelty (nv) [1]                  Movie    0.59     0.49      0.41     0.34      16
                                  Music    0.72     0.44      0.59     0.40      19
                                  Book     0.54     0.41      0.63     0.30      27
Diversity (divU) [5]              Movie    3        -         4        -         16
                                  Music    3        -         3        -         20
                                  Book     3        -         4        -         27
Diversity (divC) [1]              Movie    (0.64)   (0.35)    (0.99)   (0.02)    (6)
                                  Music    (0.90)   (0.13)    (0.98)   (0.03)    (10)
                                  Book     (0.84)   (0.12)    (0.87)   (0.1)     (8)
Usefulness [5]                    Movie    3.57     0.83      3.35     1.57      15
                                  Music    3.53     1.17      3.11     0.98      20
                                  Book     3.41     1.04      3.41     0.94      27

[Figure 10. Panel title: “Recommendation List Size”; x-axis: Domain (Movie, Music, Book); y-axis: No. of items; legend: Method (Regular, Cross-Domain)]

Fig. 10. Mean recommendation list sizes for SPARQL and SKOSRec cross-domain requests (TC4)

Therefore, it is reasonable to assume that the capability of the SKOSRec engine to flexibly switch between processing steps of similarity calculation and pattern matching can facilitate an improved exploration of LOD repositories. Table 21 also shows that the items in the two sets often did not match, as is indicated by low Jaccard indices. This finding further demonstrates that the execution of an additional step of similarity calculation can help to retrieve other relevant items.

Table 21. Jaccard indices (TC4)

Domain   Mean Jaccard Index (JI)   SD     N
Movie    0.03                      0.07   44
Music    0.01                      0.02   21
Book     0.16                      0.27   33

5. Future Work & Conclusion

This paper has presented the query-based SKOSRec engine that facilitates new types of recommendation requests for LOD repositories. The evaluations have shown that the SKOSRec system can often generate relevant and useful suggestions when certain query templates are utilized. In many cases, the application of expressive or advanced query patterns (i.e., in the context of constraint-based retrieval or cross-domain requests) helped to significantly improve recall values while still providing the same level of quality in the remaining performance dimensions. Additionally, advanced recommendation requests can be used to generate diversified recommendation lists in case users are not satisfied with the results of regular recommendation requests (i.e., roll-up or cross-domain retrieval vs. regular requests). While the novel recommendation approaches of the SKOSRec engine are not superior in each domain and test case, it is at least safe to say that they considerably extend common retrieval methods and provide an alternative search strategy when conventional methods fail to provide useful results. Additionally, the increase in diversity and recall is in line with findings from previous research on LOD-enabled RS [45].

It is a strength of the developed approaches that they facilitate combinations of graph-based and similarity-based retrieval at different stages of the recommendation workflow. This feature extends general search capabilities for semantic networks. Future research needs to investigate whether, in addition to SKOS, further RDF vocabularies can be used for ad-hoc item-to-item similarity computation to increase the range of potential usage scenarios, since not every LOD dataset contains SKOS annotations [53]. The application of additional properties could also boost the recommendation quality of some query patterns (e.g., the query types used in TC3). In this context, it also needs to be determined whether the representation of the user profile in the SKOSRec query language can be substituted with free-text expressions to enable search-like functionalities. LOD researchers have already developed engines that can process natural language queries over RDF data [59], [34], [67]. It will have to be investigated how the SKOSRec engine profitably fits into this landscape of existing retrieval tools.

Another open research question concerns the aspect of interfaces to other applications. In its standard configuration, the system is not an end-user retrieval engine but a backend application with which an administrator who has domain knowledge (e.g., of RDF vocabularies and data models of a particular LOD repository and usage scenario) creates the respective query patterns. Although the websites of the online experiments represent possible implementations of suitable end-user applications for the SKOSRec engine, there are still many other possibilities and extensions to design such a user interface (UI). For instance, the selection of LOD items for the profile was made possible by previously storing the data in an index that could then be accessed and searched before issuing a recommendation query. However, this runs against the idea of ad-hoc retrieval. In this context, future research will have to determine whether it is feasible to generate an on-the-fly mapping from items to the respective LOD resources. In the case of a commercial application, the engine needs to match items with the corresponding entries in the product catalog. Even if it is not yet clear which UI components can make the best possible use of the SKOSRec engine's features, the results from the user experiments can give at least a few suggestions.

The following retrieval options should be the default setting in an interface that contains a LOD-enabled RS component to assist users in finding interesting items. Since the regular approach of on-the-fly retrieval achieved good results throughout the domains, the display of simple recommendations generated from previously stated user preferences represents a suitable starting point for retrieval. On-the-fly queries could be refined by providing filter options. Since the evaluations have shown that expressive constraints often lead to increased recall values, the application of such a filter might be the best option for assisted retrieval. As in the preparations for the experiments, appropriate graph-based query patterns for the usage scenario in question would also have to be determined by a domain expert before setting up the application. The same applies to advanced retrieval patterns, which should be available to the end user to refine the query when both the simple and the constraint-based approach have failed to provide relevant items. This suggestion is made because roll-up queries were competitive with regular recommendations in the web-based experiments. Therefore, they represent a viable alternative retrieval strategy. For some domains (e.g., for multimedia retrieval), it is also possible to use an optional selection field to enable the formulation of cross-domain requests. Evaluations for this query type suggest that users might receive interesting results (i.e., diversified and comprehensive recommendation lists) from such a request.

In summary, the contributions of the paper are as follows:

– Development and implementation of novel retrieval strategies in the SKOSRec system, which facilitate personalized requests and leverage the potential of knowledge graphs in order to address the research gap in state-of-the-art recommendation methods with regard to customized queries.

– Realization of web-based user experiments in four usage scenarios with the following outcomes:

  • Successful demonstration that knowledge graph data is applicable in the context of live recommendation scenarios.

  • Creation of suitable query patterns that can be applied to represent individual information needs in different application contexts. The set of query patterns can be extended with the help of the SKOSRec query language to meet the requirements of additional usage scenarios.

  • Proof that the novel retrieval strategies can be successfully applied across various application contexts and recommendation tasks and that they outperform conventional content-based retrieval approaches, especially in the performance dimensions of diversity and recall.

The prototypical implementation and the evaluation of the SKOSRec engine in different usage scenarios have demonstrated that the freely available knowledge sources on the LOD cloud can be successfully applied to advance state-of-the-art methods in IR and recommender systems. The proposed personalized and customized retrieval strategies can leverage the potential of knowledge graphs by better matching information needs of users with the help of semantic search options.

Appendix A. The SKOSRecommender Query Syntax

SKOSRecQuery: A SKOSRec query can comprise up to six elements, of which the parts SimProjection and ItemPart are obligatory. Hence, users need to state at least their preferences and how many recommendations they would like to receive. Advanced retrieval patterns can be formulated as well. For instance, similar resource retrieval can be combined with prefiltering (RecWhereClause) and/or postfiltering (SelectPart) of LOD resources. As in SPARQL, the query can start with a prologue that abbreviates IRIs with namespace declarations [21].

SelectPart: The SelectPart enables subquerying with recommendation results. The surrounding query is a SELECT query with a slightly simplified WHERE clause (i.e., RecWhereClause) with which postfilter conditions can be formulated. In this section, users have the option to formulate aggregation-based retrieval requests (Aggregation).

Aggregation: This query part handles aggregation-based postfiltering. Users specify the IRI (IRIref) resource for which similarity scores of sublevel entities are summarized. Additionally, it is stated how the data should be aggregated (i.e., SUM, MAX or AVG) as well as which variable (Var) represents the aggregation-based recommendations in the postfilter section. The compiler makes sure that this variable is actually contained in the RecWhereClause of the SelectPart.
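The aggregation step can be pictured as grouping the similarity scores of sublevel entities by their superordinate resource and then applying SUM, MAX or AVG. The following Python sketch only illustrates this idea and is not the engine's implementation; the function name, the `parent_of` mapping and the sample data are hypothetical.

```python
from collections import defaultdict

# The three aggregation modes named in the query syntax.
AGGREGATORS = {
    "SUM": sum,
    "MAX": max,
    "AVG": lambda scores: sum(scores) / len(scores),
}


def roll_up(scored_items, parent_of, mode="SUM"):
    """Aggregate item similarity scores per parent resource, best first."""
    if mode not in AGGREGATORS:
        raise ValueError(f"unsupported aggregation mode: {mode}")
    grouped = defaultdict(list)
    for item, score in scored_items.items():
        parent = parent_of.get(item)
        if parent is not None:  # items without a parent are skipped
            grouped[parent].append(score)
    aggregate = AGGREGATORS[mode]
    ranked = [(parent, aggregate(scores)) for parent, scores in grouped.items()]
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


# Hypothetical example: movies (sublevel entities) rolled up to directors.
scores = {"movie1": 0.5, "movie2": 0.25, "movie3": 0.5}
directors = {"movie1": "directorA", "movie2": "directorA", "movie3": "directorB"}
print(roll_up(scores, directors, mode="SUM"))
print(roll_up(scores, directors, mode="AVG"))
```

The choice of mode changes the ranking: SUM favours parents with many similar sublevel entities, whereas AVG favours parents whose sublevel entities are similar on average.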

SimProjection: This part of a SKOSRec query refers to the list of generated suggestions. It defines the recommendation variable to which all similar LOD resources are mapped. In case pre- and postfilter conditions are set in the other sections, the compiler checks whether the variable (Var) appears in these sections as well. Otherwise, the required join operations cannot be executed. It is specified how many recommendations should be displayed (Integer) and whether the request will be issued against a default repository or as a cross-repository request (ServiceIntegration).

ServiceIntegration: This section indicates that a user intends to receive recommendations from a different SPARQL endpoint than the default endpoint that is stated in the standard configuration of the SKOSRec engine. It specifies the target SPARQL endpoint (IRIref) from which a user wants to receive recommendations. Upon extraction of SKOS annotations from the default source endpoint, a subsequent request is sent to the specified target endpoint.

ItemPart: This section of a SKOSRec query represents a single preference in the user profile. However, recommendation queries can contain several preference statements. A preference is either expressed as a LOD resource (IRIref) or as a variable which is bound to a preference query (VarPart). In the latter case, the resources are obtained by matching the graph pattern in the VarPart with the triple statements in the repository. The ItemPart statement can additionally be supplemented with a concept-to-concept similarity threshold (Sim) that triggers concept expansion with the approach of flexible similarity detection, which is extensively described in one of our previous works [64].

VarPart: The construct initiates preference querying. Individual likings are expressed as a graph pattern in a RecWhereClause. The variable (Var) is a placeholder for the LOD resources that will be used to set up the user profile. The compiler has to make sure that the specified variable occurs in the RecWhereClause as well. Otherwise, the extraction of LOD resources cannot be carried out correctly.

Sim: The Sim part of a SKOSRec query denotes the threshold of concept-to-concept scores for flexible similarity detection. Even though knowledge-based similarity values usually range between 0 and 1, the specification allows a broader range of digits in the Decimal datatype to enable application of other similarity metrics in case they are needed.

Relation: This section specifies how concept-to-concept similarity scores should be determined. Users can state whether scores should be “larger”, “larger than” or “equal to” the threshold value.

RecWhereClause: This clause can occur in three different sections of a SKOSRec query, i.e., either in the prefilter (ItemPart), the postfilter (SelectPart) or the preference filter section (VarPart). The RecWhereClause closely resembles the “WhereClause” of the SPARQL 1.1 specification with all its subsequent parts [21]. Hence, it enables different combinations of graph pattern matching. However, the RecWhereClause of the SKOSRec language allows fewer pattern matching expressions than the “WhereClause” of the regular SPARQL syntax specification [21]. For instance, it is neither permitted to formulate subqueries nor to direct filter requests to more than one repository (see RecGraphPatternNotTriples for further explanations). This modification has been made to simplify similarity calculation. After parsing this section, the compiler makes sure that recommendation, preference or postfilter variables occur at least once in the RecWhereClause to guarantee that subsequent steps of the recommendation workflow can be applied to existing LOD resources. In case the RecWhereClause comprises a RecMinusGraphPattern, it also has to be checked that the variable is not only contained in the MINUS part of the group.
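The variable check described above can be illustrated with a strongly simplified model in which a parsed RecWhereClause is a list of pattern groups, each tagged with its kind and the variables it mentions. This is a hypothetical sketch of the rule only: the data representation and the function name are assumptions and do not reflect the actual SKOSRec compiler.

```python
def variable_is_usable(variable, groups):
    """Check that `variable` occurs in at least one non-MINUS pattern group.

    `groups` is an assumed, simplified parse result: a list of
    (kind, variables) tuples, e.g. ("TRIPLES", {"?rec", "?genre"}).
    """
    return any(
        variable in variables
        for kind, variables in groups
        if kind != "MINUS"
    )


# Hypothetical clause: one basic graph pattern and one MINUS pattern.
clause = [
    ("TRIPLES", {"?rec", "?genre"}),
    ("MINUS", {"?rec", "?excluded"}),
]
for name in ("?rec", "?excluded", "?missing"):
    print(name, variable_is_usable(name, clause))
```

A variable that appears only inside a MINUS group is rejected by this check, mirroring the stated requirement that it must not be contained in the MINUS part alone.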

RecGroupGraphPattern: A RecGroupGraphPattern can contain one to many basic graph patterns, which can be combined in different ways according to RecGroupGraphPatternSub.

RecGroupGraphPatternSub: This section defines the graph patterns that are applied to identify suitable LOD resources. User filters can be expressed either as basic graph patterns (TriplesBlock) or as combinations of them (RecGraphPatternNotTriples). The sequence of different kinds of patterns can be flexibly specified.

RecGraphPatternNotTriples: This query part closely resembles the similarly named “GraphPatternNotTriples” of the SPARQL 1.1 specification [21]. The only difference to SPARQL is that the RecGraphPatternNotTriples section is a little more restricted in terms of admissible graph patterns than the “GraphPatternNotTriples” section of the regular SPARQL specification. For instance, it does not allow retrieving LOD resources from graphs other than the default graph that is specified in the configuration of the SKOSRec engine, since filter patterns have to be applied to datasets that contain SKOS annotations. Hence, these datasets need to be specified before runtime execution to ensure that similarity calculation can be carried out correctly. If a user were able to formulate filter conditions that referred to different RDF graphs and endpoints, this would not be possible.

RecGroupOrUnionGraphPattern: This query section marks an alternative graph pattern in the style of the “GroupOrUnionGraphPattern” of the SPARQL syntax specification [21]. However, it allows neither SPARQL-like subqueries nor federated queries for filtered resources because this would complicate similarity calculation.

RecOptionalGraphPattern: In this part of a SKOSRecQuery, users can specify additional graph patterns which might extend the query solution but do not necessarily have to match the data.

RecMinusGraphPattern: This section handles exclusion of LOD resources for cases when users want to omit certain triple statements from the solution.

References

[1] G. Adomavicius, A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. In IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.

[2] G. Adomavicius, A. Tuzhilin, R. Zheng. REQUEST: A query language for customizing recommendations. In Information Systems Research, vol. 22, no. 1, pp. 99–117, 2011.

[3] C. Aggarwal. Recommender systems. Springer, 2016.

[4] M. Arenas, B. Cuenca Grau, E. Kharlamov, S. Marciuska, D. Zheleznyakov. Faceted search over ontology-enhanced RDF data. In Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 939–948, 2014.

[5] V. Ayala, M. Przyjaciel-Zablocki, T. Hornung, A. Schätzle, G. Lausen. Extending SPARQL for recommendations. In Semantic Web Information Management, 2004.

[6] J. Beel, C. Breitinger, S. Langer, A. Lommatzsch, B. Gipp. Towards reproducibility in recommender systems research. In User Modeling and User-adapted Interaction, vol. 26, no. 1, pp. 69–101, 2016.

[7] C. Boutilier, R. Brafman, C. Domshlak, H. Hoos, D. Poole. CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements. In Journal of Artificial Intelligence Research (JAIR), vol. 21, pp. 135–191, 2004.

[8] P. Castells, S. Vargas, J. Wang. Novelty and diversity metrics for recommender systems: choice, discovery and relevance. 2011.

[9] G. Cheng, W. Ge, Y. Qu. Falcons: searching and browsing entities on the Semantic Web. In Proceedings of the 17th ACM International Conference on World Wide Web, pp. 1101–1102, 2008.

[10] J. Davies, R. Weeks. QuizRDF: Search technology for the Semantic Web. In Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004.

[11] DBpedia. URL: https://wiki.dbpedia.org, 2017.

[12] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito. Exploiting the web of data in model-based recommender systems. In Proceedings of the 6th ACM Conference on Recommender Systems (RecSys), pp. 253–256, 2012.

[13] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, M. Zanker. Linked Open Data to support content-based recommender systems. In Proceedings of the 8th International Conference on Semantic Systems, 2012.

[14] D. Dillman, D. Robert, D. Bowker. Principles for constructing web surveys. In Joint Meetings of the American Statistical Association, 1998.

[15] DCMI metadata terms. URL: http://dublincore.org/documents/dcmi-terms, 2004.

[16] I. Fernandez-Tobias, I. Cantador, M. Kaminskas, F. Ricci. A generic semantic-based framework for cross-domain recommendation. In Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pp. 25–32, 2011.

[17] A. Freitas, J. Oliveira, S. O'Riain, E. Curry, J. Da Silva, C. Pereira. Querying Linked Data using semantic relatedness: a vocabulary independent approach. In Natural Language Processing and Information Systems, pp. 40–51, 2011.

[18] T. Grainger, T. Potter, Y. Seeley. Solr in action. Cherry Hill: Manning, 2014.

[19] A. Gunawardana, G. Shani. Evaluating recommender systems. In Recommender Systems Handbook, Springer, pp. 265–308, 2015.

[20] R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Bürgle, H. Düwiger, U. Scheel. Faceted Wikipedia search. In International Conference on Business Information Systems, pp. 1–11, 2010.

[21] S. Harris, A. Seaborne, E. Prud'hommeaux. SPARQL 1.1 query language. In W3C recommendation, 2013.

[22] B. Heitmann, C. Hayes. Using Linked Data to build open, collaborative recommender systems. In AAAI Spring Symposium: Linked Data Meets Artificial Intelligence, pp. 76–81, 2010.

[23] J. Herlocker, J. Konstan, L. Terveen, J. Riedl. Evaluating collaborative filtering recommender systems. In ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.

[24] Y. Hu, Z. Wang, W. Wu, J. Guo, M. Zhang. Recommendation for movies and stars using YAGO and IMDB. In 12th International Asia-Pacific Web Conference (APWEB), pp. 123–129, 2010.

[25] D. Huynh, D. Karger. Parallax and companion: Set-based browsing for the data web. In Proceedings of the 18th ACM International Conference on World Wide Web, 2009.

[26] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich. Recommender systems: an introduction. Cambridge University Press, 2010.

[27] Y. Kabutoya, R. Sumi, T. Iwata, T. Uchiyama, T. Uchiyama. In IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 625–630, 2012.

[28] M. Kaminskas, I. Fernandez-Tobias, F. Ricci, I. Cantador. Knowledge-based music retrieval for places of interest. In Proceedings of the 2nd International ACM Workshop on Music Information Retrieval with User-centered and Multimodal Strategies, pp. 19–24, 2012.

[29] H. Khrouf, R. Troncy. Hybrid event recommendation using Linked Data and user diversity. In Proceedings of the 7th ACM Conference on Recommender Systems, pp. 185–192, 2010.

[30] P. Kirk. Experimental design. John Wiley & Sons, 1982.

[31] B. Knijnenburg, M. Willemsen, Z. Gantner, H. Soncu, C. Newell. Explaining the user experience of recommender systems. In User Modeling and User-Adapted Interaction, vol. 22, no. 4-5, pp. 441–504, 2012.

[32] G. Koutrika, B. Bercovitz, H. Garcia-Molina. FlexRecs: expressing and combining flexible recommendations. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 2009.


[33] M. Lee, W. Kim. Semantic association search and rank method based on spreading activation for the Semantic Web. In IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1523–1527, 2009.

[34] J. Lehmann, L. Bühmann. Autosparql: Let users query your knowledge base. In 8th Extended Semantic Web Conference (ESWC), 2011.

[35] N. Marie, F. Gandon, M. Ribiere, F. Rodio. Discovery hub: on-the-fly Linked Data exploratory search. In Proceedings of the 9th International Conference on Semantic Systems, pp. 17–24, 2013.

[36] S. McNee, J. Riedl, J. Konstan. Being accurate is not enough: how accuracy metrics have hurt recommender systems. In CHI'06 Extended Abstracts on Human Factors in Computing Systems, 2006.

[37] R. Meymandpour, J. Davis. Recommendations using Linked Data. In Proceedings of the 5th PhD Workshop on Information and Knowledge, 2012.

[38] A. Miles, S. Bechhofer. SKOS Simple Knowledge Organization System Reference. In W3C Recommendation, 2009.

[39] B. Mobasher, X. Jin, Y. Zhou. Semantically enhanced collaborative filtering on the web. In Web Mining: From Web to Semantic Web, pp. 57–76, 2004.

[40] A. Musetti, A. Nuzzolese, F. Draicchio, V. Presutti, E. Blomqvist, A. Gangemi, P. Ciancarini. Aemoo: Exploratory search based on knowledge patterns over the Semantic Web. In Semantic Web Challenge, 2012.

[41] T. Neumann, G. Weikum. The RDF-3X engine for scalable management of RDF data. In The VLDB Journal, vol. 19, no. 1, pp. 91–113, 2010.

[42] J. Pérez, M. Arenas, C. Gutierrez. Semantics and Complexity of SPARQL. In 5th International Semantic Web Conference, pp. 30–43, 2006.

[43] S. Policarpio, S. Brunk, G. Tummarello. Implementation of a SPARQL integrated recommendation engine for Linked Data with hybrid capabilities. In AImWD, 2012.

[44] P. Pu, L. Chen, R. Hu. A user-centric evaluation framework for recommender systems. In Proceedings of the 5th ACM Conference on Recommender Systems, 2011.

[45] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, E. Di Sciascio. Sound and music recommendation with knowledge graphs. ACM Transactions on Intelligent Systems and Technology (TIST), 8(2), 21 (2017).

[46] E. Oren, R. Delbru, S. Decker. Extending faceted navigation for RDF data. In 5th International Semantic Web Conference, pp. 559–572, 2006.

[47] G. Pirro. Explaining and suggesting relatedness in knowledge graphs. In International Semantic Web Conference, pp. 622–639, 2015.

[48] G. Rizzo, R. Troncy. NERD: A framework for evaluating named entity recognition tools in the Web of data. 10th International Semantic Web Conference (ISWC'11), Demo Session, Bonn, Germany, 2011.

[49] J. Rosati, T. Di Noia, T. Lukasiewicz, R. De Leone, A. Maurino. Preference queries with Ceteris Paribus semantics for Linked Data. In OTM Confederated International Conferences, pp. 423–442, 2015.

[50] T. Ruotsalo, K. Haav, A. Stoyanov, S. Roche, E. Fani, R. Deliai, E. Mäkelä, T. Kauppinen, E. Hyvönen. SMARTMUSEUM: A mobile recommender system for the Web of Data. In Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 50–67, 2013.

[51] I. Ruthven, M. Lalmas, K. van Rijsbergen. Incorporating user search behavior into relevance feedback. In Journal of the Association for Information Science and Technology, vol. 54, no. 6, pp. 529–549, 2003.

[52] B. Sarwar, G. Karypis, J. Konstan, J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, 2001.

[53] M. Schmachtenberg, C. Bizer, H. Paulheim. Adoption of the Linked Data best practices in different topical domains. In International Semantic Web Conference (ISWC), pp. 245–260, 2014.

[54] H. Schütze, C. Manning, P. Raghavan. Introduction to information retrieval. Cambridge University Press, 2008.

[55] SparqlImplementations. URL: https://www.w3.org/wiki/SparqlImplementations, 2016.

[56] M. Stankovic, W. Breitfuss, P. Laublet. Discovering relevant topics using DBPedia. In Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 219–222, 2011.

[57] K. Swearingen, R. Sinha. Beyond algorithms: An HCI perspective on recommender systems. In ACM SIGIR 2001 Workshop on Recommender Systems, vol. 13, no. 5-6, 2001.

[58] D. Tunkelang. Faceted search. In Synthesis Lectures on Information Concepts, Retrieval, and Services, pp. 1–80, 2009.

[59] C. Unger, L. Bühmann, J. Lehmann, A. Ngonga Ngomo, D. Gerber, P. Cimiano. Template-based question answering over RDF data. In Proceedings of the 21st ACM International Conference on World Wide Web, 2012.

[60] C. Varnhagen, M. Gushta, J. Daniels, T. Peters, N. Parmar, D. Law, T. Johnson. How informed is online informed consent? In Ethics & Behavior, vol. 15, no. 1, pp. 37–48, 2005.

[61] J. Waitelonis, H. Sack. Towards exploratory video search using Linked Data. In 11th IEEE International Symposium on Multimedia, pp. 540–545, 2009.

[62] L. Wenige, J. Ruhland. Flexible on-the-fly recommendations from Linked Open Data repositories. In International Conference on Business Information Systems, pp. 43–54, 2016.

[63] L. Wenige. Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations.

[64] L. Wenige, J. Ruhland. Retrieval by recommendation: using LOD technologies to improve digital library search. In International Journal on Digital Libraries, 2017.

[65] F. Zarrinkalam, M. Kahani. A multi-criteria hybrid citation recommendation system based on Linked Data. In 2nd International EConference on Computer and Knowledge Engineering (ICCKE), pp. 283–288, 2012.

[66] C. Ziegler, S. McNee, J. Konstan, G. Lausen. Improving recommendation lists through topic diversification. In Proceedings of the 14th International Conference on World Wide Web, pp. 22–32, 2005.

[67] L. Zou, R. Huang, H. Wang, J. Yu, W. He, D. Zhao. Natural language question answering over RDF: a graph data driven approach. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 313–324, 2014.

  • Introduction
  • Related Work
  • The SKOS Recommender
    • System Overview
    • On-the-Fly Recommendations
    • Preference Querying
    • Prefiltering
    • Postfiltering & Combinations
  • Evaluation of the SKOS Recommender
    • Experimental Setup
    • Results
  • Future Work & Conclusion
  • Appendix A: The SKOSRecommender Query Syntax
  • References

Fig. 2. User profile generation, TC1 (music domain)

search engine found index keywords that matched the user query, metadata information for these items was immediately shown on the screen. This information was displayed to help participants make an informed decision on whether the AJAX result list contained suitable items for profile generation.

Aside from the AJAX feature, the appeal of the interface was increased by a progress bar at the top of the webpage (see Fig. 2). Progress bars are commonly utilized tools for web surveys. In conventional paper-based studies, subjects usually see how far along they are, while in a web survey they cannot check their progress immediately. This is especially true for studies that apply screen-by-screen navigation, which was the chosen approach for the conducted web experiments. Progress bars prevent users from tiring of the questionnaire and withdrawing from the study altogether by showing how close participants are to completing the experiment [14].

Upon profile generation, the SKOSRec engine calculated on-the-fly recommendations for a simple query (similar to Q1, Sect. 3), which were shown to users in a separate screen. Suggestions were displayed with a sufficient amount of information, e.g., thumbnails, labels or a short description, to help participants assess the quality of the recommendation. In addition to this information, users could also access the respective Wikipedia article by following the hyperlink on the label. Relevance sliders were used to assess the utility of each recommendation. They are often applied in evaluations of IR systems [51, 54]. Scores were handled as points on a 0 to 100 scale of the relevance slider (outer left side: lowest relevance, outer right side: highest relevance, see Fig. 3).

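This approach facilitated the calculation of an accurate precision score and the application of tests with higher statistical power than an ordinal 5-star scale. The score, called mean relevance score (mrs), was measured with Equation 19, where $\mathit{score}_i$ is the slider score of the $i$-th recommendation and $k$ is the number of recommendations in the list:

\[ \mathit{mrs} = \frac{\sum_{i=1}^{k} \mathit{score}_i}{k} \tag{19} \]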
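As a minimal sketch of Eq. 19, the mean relevance score of one recommendation list can be computed as follows. The function name and the example slider values are hypothetical and only serve as an illustration; the range check reflects the 0 to 100 slider scale described above.

```python
def mean_relevance(scores):
    """Mean relevance score (mrs) of one recommendation list, following Eq. 19.

    scores: slider ratings on the 0-100 scale, one per recommended item.
    """
    if not scores:
        raise ValueError("the recommendation list is empty")
    for s in scores:
        if not 0 <= s <= 100:
            raise ValueError(f"slider score out of range: {s}")
    return sum(scores) / len(scores)


# Hypothetical ratings for a list of four recommendations
slider_scores = [80, 55, 20, 65]
mrs = mean_relevance(slider_scores)
```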

Fig. 3. Recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain)

In addition to the precision score, the web application also determined the size of the recommendation list produced by the engine as an indicator for recall. However, recommendation list size can only be utilized for this purpose when mrs scores reach a reasonable level.

While accurate recommendations are useful because they build trust in a system [57], they only capture a narrow aspect of performance [36]. For instance, an online retailer can profit tremendously from an RS that provides both relevant and unobvious recommendations, since it enables users to explore the product catalog more thoroughly. On the other hand, if an RS only suggests popular products, it is likely that the user already knows them [3, 23]. A pure accuracy-based evaluation disregards these aspects, which is why other performance indicators, such as novelty, were measured as well. Thus, the evaluation interface also contained a section where subjects had to tick a radio button that indicated whether an item was new. In the backend of the web application, novelty was measured as the ratio of relevant items not familiar to the user (|nov|) among the total number of relevant items (|tp|) (Eq. 20):

\[ \mathit{nv} = \frac{|nov|}{|tp|} \tag{20} \]
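The novelty ratio of Eq. 20 can be sketched in a few lines. The relevance threshold of 50 is taken from the text; the input format (pairs of slider score and a "new to me" flag) and the choice to return None when a list contains no relevant item are assumptions made for this illustration, since the source does not say how that case was handled.

```python
RELEVANCE_THRESHOLD = 50  # an item counts as relevant at 50 points or more


def novelty(ratings):
    """Novelty score nv = |nov| / |tp| for one recommendation list (Eq. 20).

    ratings: list of (score, is_new) pairs, where score is the slider rating
    and is_new is True if the participant marked the item as unfamiliar.
    Returns None if no item is relevant (assumed handling, not from the paper).
    """
    relevant_flags = [is_new for score, is_new in ratings
                      if score >= RELEVANCE_THRESHOLD]
    if not relevant_flags:
        return None
    return sum(relevant_flags) / len(relevant_flags)


# Hypothetical ratings: three relevant items, two of them new to the user
example = [(80, True), (55, False), (20, True), (65, True)]
nv = novelty(example)
```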

An item was considered relevant when it achieved a relevance score of 50 or higher. In addition to accuracy and novelty, the web application also tracked the diversity of recommendation lists, since a topically diversified result set has been found to increase overall user satisfaction [31, 66]. In the RS literature, the most widely explored diversification metrics are the ones that measure the similarity between each item pair in the recommendation list. The more the items are alike, the less diverse is the result set [19]. The study series of this paper applied the metric by Castells et al. [8]. It determines item-to-item similarity values with the cosine similarity measure. Scores are then converted to distance values (Eq. 21) and averaged for the recommendation list (Eq. 22):

\[ d(i_l, i_m) = 1 - \cos(i_l, i_m) \tag{21} \]

\[ \mathit{div}_C = \frac{2}{k(k-1)} \sum_{l<m} d(i_l, i_m) \tag{22} \]
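The following sketch illustrates Eqs. 21 and 22 on a toy list. It assumes that each item is represented as a sparse vector mapping annotation labels to weights; this representation and the example labels are hypothetical, as this section does not specify how the engine builds the item vectors.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two sparse vectors given as {feature: weight} dicts."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def content_diversity(items):
    """Average pairwise distance d = 1 - cos over all item pairs (Eqs. 21, 22)."""
    k = len(items)
    if k < 2:
        raise ValueError("diversity needs at least two items")
    total = sum(1.0 - cosine(a, b) for a, b in combinations(items, 2))
    return 2.0 * total / (k * (k - 1))


# Hypothetical list: two items with identical annotations, one unrelated item
toy_list = [{"Jazz": 1.0}, {"Jazz": 1.0}, {"Folk": 1.0}]
div_c = content_diversity(toy_list)
```

Since `combinations` enumerates each unordered pair once, the factor 2 / (k(k − 1)) turns the sum into the mean over all k(k − 1) / 2 pairs.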

The content-based diversity scores (divC) were calculated in the backend of the web application and saved in a log file for each generated recommendation list of the SKOSRec engine during runtime execution. In addition to the measurement of divC scores, we asked participants directly about their opinions on recommendation list diversity. Following the RS evaluation framework by Pu et al. [44], two Likert items were posed for this purpose:

– The recommendations of list [] are diverse.
– The recommendations of list [] are similar to each other.

Besides the algorithmic performance metrics listed above, the authors gathered assessments from users regarding the overall usefulness of the suggestions. This approach is also in line with the empirically verified evaluation framework by Pu et al. [44], which proposes to measure the psychometric construct of Perceived Usefulness to get a comprehensive understanding of user opinions on recommendation results. Thus, participants were instructed to state their agreement with positive statements regarding the relevance of the suggestions. The statements were adapted to the specificities of each usage scenario. Fig. 4 shows an example collection of Likert questions for the domain of music RS.

After participants had provided the required information, their answers were saved in a log file and they were navigated to test case 2 (TC2), which evaluated the SKOSRec engine's performance for constraint-based queries (similar to Q3, Sect. 3). We tried to find out whether the engine was able to improve user satisfaction through filter conditions. Even though IR systems already successfully apply facet filters on result sets, the experiments still had to prove whether LOD-enabled constraints have the same effect on the Perceived Usefulness of recommendations. For the comparisons of non-filtered and filtered suggestions, we applied a within-subjects design in which participants had to rate two recommendation lists resulting from regular (non-filtered) and filtered requests in TC2. We favored this setup over a between-subjects design, where subjects would have been assigned to only one result list. While the between-subjects design more closely resembles a real-world setting, since it does not require participants to interact with more than one approach at a time, differences among subjects (e.g., regarding demographics or topic expertise) can potentially have a considerable impact on quality judgments [19]. Hence, when applying a between-subjects design, one does not know whether differences in performance are caused by the approaches or by the variability among participants. Therefore, researchers need to conduct between-subjects experiments with a sufficient number of study subjects to level out these differences. Within-subjects studies, on the other hand, require fewer participants while still achieving the same level of statistical power, since between-subject variability is not an issue [30, 31].

Beel et al. have shown that even small variations in an RS test setting can cause substantial differences in performance outcomes [6]. To keep the extent of variations even lower, comparisons between non-filtered and constraint-based recommendations were based on the same profile for each user in TC2. Therefore, the user profile from TC1 served as a starting point for filtered retrieval. The web application showed the recently generated profiles to users and asked them to provide an additional filter condition. Fig. 5 depicts an example web form for a constraint-based recommendation request in the music domain. It comprises the user profile and an additional text field where users stated their constraint.

Afterwards, the engine applied this filter condition to the set of potential recommendation results before similarity calculation started. For each usage scenario, different types of constraints were available. Regarding suitable filter options, the authors sought to achieve a balance between assisting users in finding filter conditions that would adequately represent their information needs and keeping the variability among users as low as possible. Therefore, the specificities of each scenario and the availability of the respective data sources were determined. For instance, for the multimedia domains (movie, music and books) it was assumed that users would like to filter recommendations according to the specific genre of the item. Fortunately, the DBpedia dataset contains this kind of information for the music domain, where the genre can be retrieved through the property dbo:genre. However, in the movie and book domains, DBpedia does not provide sufficient information. Many LOD resources which represent books were not assigned a genre. In the movie domain, a genre property is missing entirely, and genre-specific information can often only be found in the subject categories describing the movie. Hence, in contrast to the experiment on music recommendations, participants in the domains of books and films did not filter their recommendation lists by genre. On the other hand, the property dc:subject was applied as a recommendation filter in each multimedia domain. Since the subject property often links to many informative features of multimedia items (e.g., release period, geographic information or content information), the authors decided that this feature represents a powerful filter dimension.

In the experiment on travel destination search, the web interface offered two filters: a subject filter and a location-specific filter. The subject filter enabled users to state their preferences regarding the characteristics of a location (e.g., being a nature reserve). The location-specific filter, on the other hand, gave participants the option to specify where the desired destination should be located.¹¹

Fig. 4. Evaluation screen for recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain)

The second fundamental research issue of TC2 concerned the question of whether expressive constraints can boost recommendation quality even further. Hence, apart from executing a constraint-based workflow with a simple filter (i.e., a direct attribute of an item), the SKOSRec engine generated suggestions with the help of an expressive filter as well. This second procedure, while still applying the same user constraint as the simple filter, utilized an expressive graph pattern to include additional LOD resources in the result set. For instance, in case a subject filter was selected, result set expansion was facilitated by exploring the wider semantic space of the DBpedia category graph through a property path declaration in the graph pattern (e.g., skos:broader[2]). In the travel experiment, an additional expressive filter retrieved place-related information (dul:hasLocation) for location-specific subproperty relations that are not materialized in DBpedia. Hence, the expressive graph pattern expanded the filter such that additional properties were matched (e.g., dbo:country, dbo:city or dbo:state) while still applying the specified user constraint to the set of LOD resources. Upon execution of constraint-based recommendation requests, users received two recommendation lists (resulting from simple and expressive filtering, respectively) in a separate evaluation screen. Participants evaluated result sets in the same manner as in TC1. For each result set, the screen contained an additional Likert item asking subjects to state their agreement with the statement “The filter has improved the recommendation results of list”. The sequence of suggestions resulting from the different methods was randomly assigned in each user session to avoid order effects.

¹¹ http://dbpedia.org

Fig. 5. Web form for a constraint-based recommendation request (music domain)

After study subjects had completed TC2, they were navigated to test case 3 (TC3). In the TC3 section of the travel experiment, participants could choose between three options for rollup query patterns (similar to Q6, Sect. 3). Users were able to obtain recommendations for travel destinations based on POIs they had visited during a stay in another destination. They could select either a city (option 1), a region (option 2) or a country (option 3) as destination type. Upon entity type selection, the SKOSRec engine generated appropriate suggestions. Fig. 6 shows the web form from the travel experiment that facilitated the formulation of advanced rollup requests in TC3.
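The difference between the simple and the expressive subject filter in TC2 can be illustrated with a small in-memory sketch. It is not the engine's implementation, which evaluates a SPARQL graph pattern against DBpedia; here a plain dictionary stands in for the category graph, all category and item names are hypothetical, and the expansion depth of two skos:broader hops is an assumed reading of the property path mentioned above.

```python
def expand_category(filter_category, broader, max_hops):
    """Return the filter category plus all categories that reach it
    via at most max_hops skos:broader links.

    broader: dict mapping a category to the set of its broader categories.
    """
    accepted = {filter_category}
    frontier = {filter_category}
    for _ in range(max_hops):
        # categories whose broader concepts include something found so far
        frontier = {c for c, parents in broader.items()
                    if parents & frontier} - accepted
        if not frontier:
            break
        accepted |= frontier
    return accepted


def filter_items(items, filter_category, broader, max_hops=0):
    """Keep items with at least one subject in the (expanded) category set.
    max_hops=0 corresponds to the simple filter on direct subjects."""
    accepted = expand_category(filter_category, broader, max_hops)
    return [name for name, subjects in items.items() if subjects & accepted]


# Toy category graph and item annotations (all hypothetical)
broader = {
    "Jazz_musicians": {"Jazz"},
    "Jazz_pianists": {"Jazz_musicians"},
}
items = {
    "Item_A": {"Jazz"},
    "Item_B": {"Jazz_pianists"},
    "Item_C": {"Rock"},
}

simple_hits = filter_items(items, "Jazz", broader)                   # direct match only
expressive_hits = filter_items(items, "Jazz", broader, max_hops=2)   # expanded match
```

In this toy setting the expressive variant returns every hit of the simple variant plus items annotated with narrower categories, which mirrors the idea of expanding the result set while keeping the same user constraint.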

When users had selected an entity type and stated their favorite travel destination and three associated POIs, the application generated a SKOSRec query from these parameters. A simple on-the-fly request was sent to the engine as well to retrieve baseline recommendations, with which the suggestions resulting from the advanced request were compared at a later stage of the experiment. The simple query only contained the user's favorite travel destination and the selected entity type option, thus omitting information on the POIs. During the travel experiment, the engine processed both the simple and the advanced query and produced two recommendation lists. As in the previous parts of the experiment, participants assessed the quality of the result sets, which appeared in random order on the screen.

Fig. 6. Travel - User profile generation, TC3

The multimedia experiments also contained a section on rollup retrieval (TC3). Like the advanced travel queries, the multimedia requests aggregated similarity scores of sublevel entities through summation, which the engine joined with a postfilter. In addition to these features, the pattern contained a preference query part. This section identified a set of preferred items through a single user statement. Participants specified either their favorite director/actor (movie domain), music act (music domain) or author (book domain), and the SKOSRec engine generated recommendations based on the features of the creative works of the favored artist (similar to Q5, Sect. 3). Participants also received suggestions from a simple on-the-fly query, which computed results according to the artist's characteristics. As in the travel experiment, recommendations from the advanced rollup request were compared with those from the on-the-fly query, with result set positions assigned randomly and without letting participants know which approach they were assessing.

After participants of the multimedia experiments had evaluated the results of TC3, they were guided to test case 4 (TC4), which evaluated cross-domain queries. The experiment on travel RS ended with TC3 because no suitable graph patterns could be identified to facilitate these kinds of suggestions for the travel usage scenario. In the multimedia domains, however, the data from DBpedia was sufficient for this retrieval task. The entity type of the study (i.e., movie, music act or book) defined the source domain, based on which the engine generated suggestions for items from another multimedia target domain (similar to Q7, Sect. 3). Fig. 7 depicts the cross-domain web form of the music experiment.

The web form assisted subjects in formulating a request, which then obtained movie suggestions based on the favorite music act of the user. Parameters from users were entered in the placeholders of the query pattern in the backend of the application. We formulated the query templates in such a way that the query engine matched any properties or graph patterns that could potentially connect two items from the specified target and source domains. For instance, such matchings can occur when a user states a favorite movie that was written by a particular author. In case the same author has also written some books, they might be of interest to the user as well. Another example stems from the movie domain. Suppose a consumer has declared a preference in his/her profile for a movie that links to the corresponding soundtrack. It might well be the case that the user likes the soundtrack and the music acts that were involved in creating it just as much as s/he likes the movie. The practice of distinguishing homonymous LOD resources through the property dbo:wikiPageDisambiguates is another interesting linking pattern from DBpedia that was exploited for cross-domain requests in TC4, since it connects similar entities (i.e., the book edition of a certain movie).

Cross-domain SPARQL queries (similar to Q8, Sect. 3) were posed as baseline requests in this test case. While the SPARQL query only matched cross-domain relations for a single LOD resource, the SKOSRec request expanded the search space by considering links from more than one item. As in the previous test cases, TC4 ended with an evaluation screen where users assessed the SPARQL-based as well as the SKOSRec suggestions in random order without knowing which recommendation list belonged to which method. Table 10 gives an overview of the test cases and the methods that were applied in the experiments in each usage scenario.

Fig. 7. Music - User profile generation, TC4 (page 9)

Table 10. Test cases in the web experiments (X = conducted, – = not conducted)

| Test Case | Evaluation Purpose | Query Type | Travel | Movie | Music | Book |
|---|---|---|---|---|---|---|
| TC1 | On-the-fly Recommendations (Baseline) | Q1 | X | X | X | X |
| TC2 | Constraint-based Recommendations | Q3 | X | X | X | X |
| TC3 | Advanced Queries (Rollup) | Q5, Q6 | X | X | X | X |
| TC4 | Advanced Queries (Cross-Domain) | Q7 | – | X | X | X |

4.2. Results

In total, 257 people took part in the study. The gathered samples from the experiments contained answers from both genders, with a slight overrepresentation of male participants (53.7%). The majority of study subjects were between 20 and 40 years of age (65.8%). The evaluation of the demographics section also revealed that the majority of participants already perceived the search in their domain of interest as either “manageable” (38.5%) or even as “easy” or “very easy” (44.0%).

Prior to analyzing the performance results of the novel retrieval approaches of the SKOSRec engine, it was checked whether the respective Likert items for Perceived Usefulness and user-based diversity (divU) reached acceptable reliability levels. For this purpose, Cronbach's α values were determined. Table 11 shows the outcome of this analysis for the Perceived Usefulness construct. Given the acceptable values throughout the domains (Cronbach's α > 0.7), it could be treated as an interval-scaled variable in the statistical tests.

Table 11. Cronbach's α values for the concept of Perceived Usefulness

| Domain | Cronbach's α |
|---|---|
| Travel | 0.7002 |
| Movie | 0.8579 |
| Music | 0.8515 |
| Book | 0.8949 |
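For readers unfamiliar with the reliability measure reported in Table 11, the following sketch computes Cronbach's α with the standard formula α = k / (k − 1) · (1 − Σ var(item) / var(total)). The Likert answers are invented for illustration and are not data from the study; the function name is likewise hypothetical.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a multi-item scale.

    item_scores: list of k lists, one per Likert item, each holding the
    answers of the same n respondents in the same order.
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("at least two items are required")
    # total score of each respondent across all items
    totals = [sum(answers) for answers in zip(*item_scores)]
    total_var = variance(totals)
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    item_var_sum = sum(variance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Invented answers of five respondents to three usefulness statements
likert_items = [
    [4, 5, 3, 2, 4],
    [4, 4, 3, 2, 5],
    [5, 5, 2, 3, 4],
]
alpha = cronbach_alpha(likert_items)
```

A value above 0.7, the threshold used in the text, would then be read as acceptable internal consistency of the items.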

Besides usefulness, user-based diversity scores were also measured by multiple Likert items. However, Cronbach's α values for this concept did not reach acceptable levels. In order to avoid leaving valuable user assessments unused, agreements with the statement “The recommendations of list [] are diverse” were taken into account as an ordinal variable (divU).

Based on these results, subsequent tests could be carried out. The first part of the analysis concerned the performance of the on-the-fly retrieval method (baseline) of the SKOSRec engine in TC1. The outcome of this analysis confirms that the on-the-fly approach represents a viable recommendation strategy. Table 12 shows the mean (M)¹², standard deviation (SD) and number of participants (N) for each performance dimension and metric. It can be seen that users assessed the engine's performance as better than mediocre in almost every quality dimension. Aside from the positive user evaluations with regard to Perceived Usefulness, related quality aspects such as Accuracy, Novelty and Diversity received good assessments as well. For instance, in each domain the mean relevance score (mrs) was higher than 50 score points. This means that participants were often convinced of the relevance of the results. Given these positive user assessments with regard to item relevance, it seems justified to consider the mean of result list sizes as an indicator of recall. The engine was able to generate the required number of recommendations in almost every test case (see Table 12).

The on-the-fly retrieval approach was also tested with regard to the Novelty dimension. Table 12 shows the mean novelty scores (nv) for the baseline recommendation approach. Novelty scores indicate that the majority of items in a result list were deemed to be new in the travel and book RS, whereas in the experiments on movie and music RS only about one third of the recommendation list was unfamiliar to users. In contrast, content-based Diversity scores (divC) were rather high in each domain. Hence, it can be concluded that items in a result set usually have a low rate of matching SKOS annotations. In addition to content-based scores, user assessments of recommendation list Diversity (divU) were also taken into account. The median level of agreement with the Diversity-related Likert item stating that recommendations are diverse was 3 (“neutral”) in the multimedia domains and 4 (“agree”) in the travel domain.

In summary, the baseline approach achieved fairly good performance results in TC1. The findings demonstrate that LOD-enabled recommendations can potentially enhance a user's search experience. They also allow for subsequent tests of advanced retrieval techniques, which can be compared to this standard method.

¹² In case of the user-based diversity score (divU), M denotes the median because of the ordinal scale of the variable.

In the analysis of performance results from TC2 (constraint-based recommendations), particular attention was paid to participants' agreement with the statement “The filter has improved the recommendation results of the list”. Fig. 8 shows the distribution of domain-wise agreement ratios. On average, they were fairly high. Throughout the domains, the vast majority of participants selected either 5 (“strongly agree”), 4 (“agree”) or 3 (“neutral”). Hence, a filter often had a positive impact on results.

Table 12. Performance results in TC1 (M = Mean, SD = Standard Deviation, N = No. of participants)

| Dimension (Metric) [Max Score] | Domain | M | SD | N |
|---|---|---|---|---|
| Accuracy (size) [10] | Travel | 10.00 | 0.00 | 103 |
| | Movie | 10.00 | 0.00 | 50 |
| | Music | 10.00 | 0.00 | 53 |
| | Book | 9.80 | 1.40 | 51 |
| Accuracy (mrs) [1] | Travel | 56.49 | 0.00 | 103 |
| | Movie | 62.25 | 19.15 | 50 |
| | Music | 55.70 | 0.10 | 53 |
| | Book | 57.47 | 15.55 | 50 |
| Novelty (nv) [1] | Travel | 0.60 | 0.34 | 101 |
| | Movie | 0.36 | 0.31 | 50 |
| | Music | 0.36 | 0.39 | 52 |
| | Book | 0.61 | 0.29 | 47 |
| Diversity (divU) [5] | Travel | 4 | - | 102 |
| | Movie | 3 | - | 50 |
| | Music | 3 | - | 53 |
| | Book | 3 | - | 50 |
| Diversity (divC) [1] | Travel | 0.80 | 0.16 | 103 |
| | Movie | 0.87 | 0.12 | 50 |
| | Music | 0.86 | 0.10 | 50 |
| | Book | 0.92 | 0.08 | 50 |
| Usefulness [5] | Travel | 3.34 | 0.85 | 103 |
| | Movie | 3.55 | 0.73 | 50 |
| | Music | 3.18 | 0.85 | 53 |
| | Book | 3.43 | 0.81 | 50 |

The authors gathered responses to the question on improvement for recommendation lists resulting from both the simple and the expressive filtering approaches. Hence, they give clues about the general performance of constraint-based retrieval. However, since this section of the study revealed the experimental condition to participants (i.e., application of a filter), answers to the Likert statement have to be interpreted cautiously. Results may be slightly biased, as participants were able to guess the underlying agenda. However, these limitations do not exist for comparisons between simple and expressive filters. Since the web application randomly assigned the list order, participants did not know which approach they were assessing. This enables us to take a closer look at the differences between the two filtering approaches. It was hypothesized that advanced filters improve recommendation quality, as they may identify more relevant resources, thereby overcoming problems of data quality or data sparsity in LOD repositories. Table 13 lists response rates of the two methods. The rate measures the ratio of non-empty result sets among all result sets. Throughout the domains, recommendation lists resulting from expressive filtering achieved higher response rates.

Table 13. Response rates (TC2)

| Domain | Constr.-based (Regular) | Constr.-based (Expanded) | N |
|---|---|---|---|
| Travel | 23% | 57% | 103 |
| Movie | 86% | 88% | 50 |
| Music | 72% | 75% | 53 |
| Book | 73% | 76% | 51 |
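The response rate reported in Table 13 is the share of requests that returned at least one recommendation. A minimal sketch, with invented result lists and a hypothetical function name, could look like this:

```python
def response_rate(result_lists):
    """Ratio of non-empty result sets among all result sets."""
    if not result_lists:
        raise ValueError("no result sets given")
    non_empty = sum(1 for results in result_lists if len(results) > 0)
    return non_empty / len(result_lists)


# Invented outcome of four filtered requests: two returned nothing
regular_results = [["Item_A"], [], ["Item_B", "Item_C"], []]
rate = response_rate(regular_results)
```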


[Figure 8: chart titled “The filter has improved the recommendations of the list”. x-axis: Domain (Digital Library, Travel, Movie, Music, Book); y-axis: Percentage of Agreement (0.00 to 1.00); legend: Degree of Agreement (1 to 5).]

Fig. 8. Participants' agreement with the statement that the filter has improved the result list (domain-wise) (TC2)

In addition to response rates, user assessments and scores in the defined quality dimensions were compared. Table 14 depicts the outcome of this analysis. Significantly higher scores are marked in bold. In case statistical tests identified no meaningful variations, the results of the better performing approach are underlined. α-levels were adjusted with the false discovery rate (FDR) in order to decrease the probability of making a type I error due to multiple testing.
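The text does not name the concrete FDR procedure. As one common option, the sketch below implements the Benjamini-Hochberg step-up adjustment on p-values, which leads to the same accept/reject decisions as adjusting the α-levels; whether the authors used exactly this variant is an assumption, and the p-values are invented.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Assumed here as the FDR procedure; the source only states that
    alpha-levels were adjusted with the false discovery rate.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest to the smallest p-value to enforce monotonicity
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Invented p-values of four tests from one family of comparisons
raw_p = [0.01, 0.04, 0.03, 0.005]
adj_p = benjamini_hochberg(raw_p)
significant = [p <= 0.05 for p in adj_p]
```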

According to mean scores, expressive requests generated more comprehensive recommendation lists (size). This finding is in line with the increased response rates shown in Table 13. However, subsequently conducted t-tests confirmed significant differences only for the travel experiment (t(102) = −7.89, p < 0.001). In contrast, precision scores were higher in most of the domains (travel, movie, music) when the SKOSRec engine processed a simple constraint-based query. However, statistical tests did not confirm any systematic differences. The authors applied Wilcoxon tests for mrs scores and the remaining performance metrics because of the small sample sizes resulting from the low response rates of the simple constraint-based approach.

Regarding Novelty, none of the approaches was superior. In the Diversity dimension, expressive filters generated suggestions that were at least as topically diversified as the results from simple constraint-based retrieval. Even though no significant differences were identified between the two approaches, increased mean scores (divU in the travel domain and divC in each domain) indicate a slight superiority of expanded filter requests in this quality dimension. The same applies to usefulness scores (mean values were higher in 3 out of 4 domains when expressive filtering was applied). However, these statements are speculative at best because statistical tests were not significant.

In summary, it can be concluded that expressive filters improve recall, potentially leading to a slight loss in precision scores (see Table 14). It may also be the case that expanded user constraints diversify as well as increase the usefulness of recommendation lists. However, these claims are unverified. In contrast, it is safe to say that expressive query patterns generate results of at least the same quality as simple patterns while increasing the number of relevant suggestions. The few differences may be explained by the high Jaccard scores (JI) (see Table 15) of recommendation lists. Throughout the domains, result sets were much alike. Given these similarities and the increases in list sizes for expressive filters, it is supposed that an expanded constraint produces an enhanced version of the recommendation list resulting from simple filtering. Therefore, expanded filters should be the default setting in a constraint-based retrieval context.
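The Jaccard index used to compare two recommendation lists is the size of their intersection divided by the size of their union. The following sketch shows the computation for a single pair of lists; the function name, the example items and the convention for two empty lists are assumptions.

```python
from typing import Iterable


def jaccard_index(list_a: Iterable[str], list_b: Iterable[str]) -> float:
    """Return |A ∩ B| / |A ∪ B| for two recommendation lists."""
    set_a, set_b = set(list_a), set(list_b)
    union = set_a | set_b
    if not union:
        # Assumption: two empty lists are treated as identical.
        return 1.0
    return len(set_a & set_b) / len(union)


# Hypothetical result lists of a simple and an expanded filter request.
simple = ["dbr:Film_A", "dbr:Film_B", "dbr:Film_C"]
expanded = ["dbr:Film_B", "dbr:Film_C", "dbr:Film_D"]
print(jaccard_index(simple, expanded))
```

A value close to 1 means that the two lists contain largely the same items, while a value close to 0 means that they hardly overlap.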

In TC3 (i.e., evaluation of roll-up query patterns), the SKOSRec engine almost always generated non-empty recommendation lists (see Table 16). In the


Table 14. Performance results in TC2 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]   Domain   Simple           Expressive       N
                                          M       SD       M       SD
Accuracy (size) [10]             Travel   1.17    2.70     4.52    4.62     103
                                 Movie    6.72    4.08     7.20    3.92     50
                                 Music    5.96    4.64     6.36    4.51     53
                                 Book     4.59    4.50     5.00    4.51     51
Accuracy (mrs) [1]               Travel   66.73   20.51    65.34   22.36    23
                                 Movie    60.95   22.98    60.29   22.79    43
                                 Music    63.77   16.05    62.96   16.44    34
                                 Book     56.17   18.18    56.81   16.81    37
Novelty (nv) [1]                 Travel   0.48    0.41     0.61    0.49     21
                                 Movie    0.44    0.36     0.55    0.80     42
                                 Music    0.55    0.39     0.52    0.37     38
                                 Book     0.77    0.34     0.76    0.33     33
Diversity (divU) [5]             Travel   3       -        3.5     -        24
                                 Movie    4       -        4       -        43
                                 Music    4       -        4       -        38
                                 Book     3       -        3       -        37
Diversity (divC) [1]             Travel   0.68    0.18     0.72    0.21     19
                                 Movie    0.70    0.24     0.73    0.23     41
                                 Music    0.86    0.13     0.87    0.12     34
                                 Book     0.84    0.24     0.95    0.63     29
Usefulness [5]                   Travel   3.33    0.93     3.50    0.77     24
                                 Movie    3.32    0.85     3.31    0.86     43
                                 Music    3.55    0.81     3.64    0.78     38
                                 Book     3.39    0.83     3.62    1.21     37

Table 15. Jaccard Indices (TC2)

Domain   Mean Jaccard Index (JI)   SD     N
Travel   0.57                      0.47   25
Movie    0.71                      0.38   43
Music    0.82                      0.35   37
Book     0.92                      0.26   35

travel and the book experiment, the response rate was higher for aggregation-based queries, whereas in the movie domain the engine more often provided recommendations for regular requests. However, these differences are only marginal and do not indicate a clear superiority of one method.

Fig. 9 depicts participants’ general agreement to Likert statements concerning the Perceived Usefulness of the approaches in TC3. The diagram shows similar results for regular as well as advanced requests. On average, satisfaction scores reached levels which lay slightly above neutral agreement. A subsequent t-test

Table 16. Response rates (TC3)

Domain   Regular   Aggr.-based   N
Travel   99        100           103
Movie    96        92            50
Music    100       100           53
Book     96        100           51

confirmed no significant differences between the two methods. In fact, the approach with the highest average score of Perceived Usefulness was different in each experiment. While mean values were higher for regular requests in the studies on travel and music RS, advanced queries produced higher scores in the movie and book experiments.

Content-based Diversity (divC) was the only metric with systematic differences. Here, scores went up for advanced retrieval patterns in the travel (t(98) = −5.50, p < 0.001), the music (t(52) = −4.51, p < 0.001) and the book domain (t(47) = −4.90, p <


[Fig. 9 plot: "Perceived Usefulness (Regular vs. Roll-up Queries)"; x-axis: Domain (Travel, Movie, Music, Book); y-axis: Agreement (0 to 3); legend: Method (Regular, Roll-up)]

Fig. 9. Participants’ overall satisfaction with regular and roll-up recommendation retrieval (TC3)

0.001) (see Table 17). This finding is not surprising given that the aggregation-based approach explores the wider semantic network of each item in the profile.

Hence, it can be assumed that both approaches produce recommendation lists of similar quality and can provide benefits to potential consumers. Thus, it cannot be clearly stated which approach is superior because neither of them outperformed the other in each domain. While the regular retrieval strategy was better in terms of usefulness in the travel and the music domains, the aggregation-based approach achieved higher scores in the domains of movie and book recommendations.

This conclusion is especially noteworthy given the low mean Jaccard scores for recommendation lists resulting from regular and aggregation-based retrieval (see Table 18). The mean score never exceeded 0.1 throughout the domains, which indicates a low concordance between result sets. It is remarkable that participants perceived the recommendations as equally good given the high dissimilarity of the lists. Thus, advanced queries have an added value, as they provide additional interesting results. They can help to explore LOD repositories in different ways than a regular retrieval approach. Therefore, it is worthwhile to offer aggregation-based query patterns as an alternative retrieval strategy when the regular approach has not produced any helpful recommendations.

In addition to TC3, user assessments for TC4 (cross-domain recommendations) were also evaluated. However, since the authors conducted experiments for TC4 only in the multimedia domains, the samples were considerably smaller. Nevertheless, it is assumed that the number of completed user sessions is just sufficient to conduct further statistical analyses. Table 19 shows the response rates for the two retrieval methods. Cross-domain patterns had a higher success rate of generating non-zero result sets throughout the domains. Participants almost always received a recommendation for this query type, whereas they often did not receive a single suggestion when a regular SPARQL query was executed.

We applied non-parametric tests for the performance metrics mrs, nv, divU, divC and Perceived Usefulness due to the low response rates of the SPARQL-based approach, which led to a correspondingly low number of comparable data points. Overall, the FDR-adjusted pairwise Wilcoxon tests detected no significant differences except for user-based Diversity scores (divU) in the movie (Z = −2.69, p < 0.05) and the book domain (Z = −2.56, p < 0.05). The results for the content-based Diversity (divC) metric seem to point in the same direction because of the high mean values for cross-domain queries in each domain. But the significance of these results was not verified. The mrs scores indicate a better performance of regular


Table 17. Performance results in TC3 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]   Domain   Regular          Aggr.-based      N
                                          M       SD       M       SD
Accuracy (size) [10]             Travel   9.90    0.99     9.42    2.07     103
                                 Movie    2.88    0.59     2.76    0.82     50
                                 Music    10.00   0.00     10.00   0.00     53
                                 Book     2.88    0.59     3.00    0.00     51
Accuracy (mrs) [1]               Travel   48.21   24.98    44.93   25.59    102
                                 Movie    54.58   22.88    56.69   25.42    44
                                 Music    53.95   18.31    58.24   16.19    52
                                 Book     51.53   23.90    49.25   22.30    49
Novelty (nv) [1]                 Travel   0.43    0.39     0.38    0.40     98
                                 Movie    0.35    0.38     0.28    0.41     41
                                 Music    0.49    0.36     0.46    0.39     50
                                 Book     0.61    0.42     0.62    0.44     47
Diversity (divU) [5]             Travel   3.5     -        3       -        100
                                 Movie    4       -        3       -        43
                                 Music    4       -        3       -        50
                                 Book     3       -        3       -        48
Diversity (divC) [1]             Travel   0.79    0.17     0.89    0.13     99
                                 Movie    0.75    0.20     0.66    0.27     44
                                 Music    0.83    0.17     0.93    0.07     53
                                 Book     0.74    0.27     0.93    0.06     48
Usefulness [5]                   Travel   3.36    0.77     3.22    0.74     101
                                 Movie    3.20    0.92     3.26    0.94     44
                                 Music    3.45    0.84     3.44    0.85     51
                                 Book     2.95    0.87     3.03    0.96     48

Table 18. Jaccard Indices (TC3)

Domain   Mean Jaccard Index (JI)   SD     N
Travel   0.07                      0.11   101
Movie    0.03                      0.07   44
Music    0.02                      0.05   53
Book     0.02                      0.06   50

Table 19. Response rates (TC4)

Domain   Regular (SPARQL)   SKOSRec   N
Movie    32                 96        50
Music    62                 98        53
Book     53                 100       51

SPARQL queries, but the within-subjects test did not confirm this assumption statistically (see Table 20).

However, the most important finding of the analysis was that SKOSRec cross-domain queries generated significantly more suggestions throughout the domains (movie: t(49) = −17.92, p < 0.001; music: t(52) = −25.73, p < 0.001; book: t(50) = −12.75, p < 0.001) (see Table 20). Fig. 10 illustrates this graphically.
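The reported statistics have the form t(N − 1), which is consistent with a within-subjects (paired) t-test on the list sizes each participant received from both methods. The sketch below computes such a statistic with the standard library only. The data are hypothetical, and the subtraction order (regular minus cross-domain), which makes the statistic negative when cross-domain lists are longer, is an assumption.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t_statistic(first: Sequence[float], second: Sequence[float]) -> Tuple[float, int]:
    """Return the paired t statistic and its degrees of freedom (n - 1)."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two samples of equal length with at least two pairs")
    # Per-participant differences between the two conditions.
    diffs = [a - b for a, b in zip(first, second)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical list sizes per participant (regular vs. cross-domain requests).
regular_sizes = [1, 0, 2, 1]
cross_domain_sizes = [9, 8, 10, 10]
t_value, df = paired_t_statistic(regular_sizes, cross_domain_sizes)
print(f"t({df}) = {t_value:.2f}")
```

The p-value would then be looked up in a t-distribution with the returned degrees of freedom, for example with a statistics package.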

Whenever the SPARQL request provided recommendations, which was the case in less than half of the user sessions (see Table 19), the SKOSRec suggestions were assessed to be of almost equal quality as the SPARQL recommendations (see Table 20, Usefulness). Hence, in the sessions in which SPARQL querying did not produce any results, the SKOSRec approach may still have provided useful suggestions.

Therefore, it is reasonable to assume that the capability of the SKOSRec engine to flexibly switch between processing steps of similarity calculation and pattern matching can facilitate an improved exploration of LOD repositories. Table 21 also shows that the items in the two sets often did not match, as is indicated by low Jaccard indices. This finding further demonstrates that the execution of an additional step


Table 20. Performance results in TC4 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]   Domain   Regular (SPARQL)    Cross-Domain        N
                                          M        SD         M        SD
Accuracy (size) [10]             Movie    0.82     2.09       8.66     2.73       50
                                 Music    0.89     1.59       9.30     2.03       53
                                 Book     2.90     3.98       10.00    0.00       51
Accuracy (mrs) [1]               Movie    67.41    27.62      53.65    14.71      16
                                 Music    62.36    31.76      48.32    22.49      20
                                 Book     66.42    21.28      55.59    16.84      27
Novelty (nv) [1]                 Movie    0.59     0.49       0.41     0.34       16
                                 Music    0.72     0.44       0.59     0.40       19
                                 Book     0.54     0.41       0.63     0.30       27
Diversity (divU) [5]             Movie    3        -          4        -          16
                                 Music    3        -          3        -          20
                                 Book     3        -          4        -          27
Diversity (divC) [1]             Movie    (0.64)   (0.35)     (0.99)   (0.02)     (6)
                                 Music    (0.90)   (0.13)     (0.98)   (0.03)     (10)
                                 Book     (0.84)   (0.12)     (0.87)   (0.1)      (8)
Usefulness [5]                   Movie    3.57     0.83       3.35     1.57       15
                                 Music    3.53     1.17       3.11     0.98       20
                                 Book     3.41     1.04       3.41     0.94       27

[Fig. 10 plot: "Recommendation List Size"; x-axis: Domain (Movie, Music, Book); y-axis: No. of items (0.0 to 10.0); legend: Method (Regular, Cross-Domain)]

Fig. 10. Mean recommendation list sizes for SPARQL and SKOSRec cross-domain requests (TC4)


Table 21. Jaccard Indices (TC4)

Domain   Mean Jaccard Index (JI)   SD     N
Movie    0.03                      0.07   44
Music    0.01                      0.02   21
Book     0.16                      0.27   33

of similarity calculation can help to retrieve other relevant items.

5. Future Work & Conclusion

This paper has presented the query-based SKOSRec engine that facilitates new types of recommendation requests for LOD repositories. The evaluations have shown that the SKOSRec system can often generate relevant and useful suggestions when certain query templates are utilized. In many cases, the application of expressive or advanced query patterns (i.e., in the context of constraint-based retrieval or cross-domain requests) helped to significantly improve recall values while still providing the same level of quality in the remaining performance dimensions. Additionally, advanced recommendation requests can be used to generate diversified recommendation lists in case users are not satisfied with the results of regular recommendation requests (i.e., roll-up or cross-domain retrieval vs. regular requests). While not being superior in each domain and test case, it is at least safe to say that the novel recommendation approaches of the SKOSRec engine considerably extend common retrieval methods and provide an alternative search strategy when conventional methods fail to provide useful results. Additionally, the increase in diversity and recall is in line with findings from previous research on LOD-enabled RS [45].

It is a strength of the developed approaches that they facilitate combinations of graph-based and similarity-based retrieval at different stages of the recommendation workflow. This feature extends general search capabilities for semantic networks. Future research should investigate whether, in addition to SKOS, further RDF vocabularies can be used for ad-hoc item-to-item similarity computation to increase the range of potential usage scenarios, since not every LOD dataset contains SKOS annotations [53]. The application of additional properties could also boost the recommendation quality of some query patterns (e.g., the query types used in TC3). In this context, it also needs to be determined whether the representation of the user profile in the SKOSRec query language can be substituted with free-text expressions to enable search-like functionalities. LOD researchers have already developed engines that can process natural language queries over RDF data [59], [34], [67]. It will have to be investigated how the SKOSRec engine profitably fits into this landscape of existing retrieval tools.

Another open research question concerns the aspect of interfaces to other applications. In its standard configuration, the system is not an end-user retrieval engine but a backend application with which an administrator who has domain knowledge (e.g., of RDF vocabularies and data models of a particular LOD repository and usage scenario) creates the respective query patterns. Although the websites of the online experiments represent possible implementations of suitable end-user applications for the SKOSRec engine, there are still many other possibilities and extensions to design such a user interface (UI). For instance, the selection of LOD items for the profile was made possible by previously storing the data in an index that could then be accessed and searched before issuing a recommendation query. However, this runs against the idea of ad-hoc retrieval. In this context, future research will have to determine whether it is feasible to generate an on-the-fly mapping from items to the respective LOD resources. In the case of a commercial application, the engine needs to match items with the corresponding entries in the product catalog. Even if it is not yet clear which UI components can make the best possible use of the SKOSRec engine’s features, the results from the user experiments can give at least a few suggestions.

The following retrieval options should be the default setting in an interface that contains a LOD-enabled RS component to assist users in finding interesting items. Since the regular approach of on-the-fly retrieval achieved good results throughout the domains, the display of simple recommendations generated from previously stated user preferences represents a suitable starting point for retrieval. On-the-fly queries could be refined by providing filter options. Since the evaluations have shown that expressive constraints often lead to increased recall values, the application of such a filter might be the best option for assisted retrieval. As in the preparations for the experiments, appropriate graph-based query patterns for the usage scenario in question would also have to be determined by a domain expert before setting up the application. The same applies to advanced retrieval patterns, which should be available to the end user to refine the query when both the simple as well as the constraint-based


approach have failed to provide relevant items. This suggestion is made because roll-up queries were competitive with regular recommendations in the web-based experiments. Therefore, they represent a viable alternative retrieval strategy. For some domains (e.g., for multimedia retrieval), it is also possible to use an optional selection field to enable the formulation of cross-domain requests. Evaluations for this query type suggest that users might receive interesting results (i.e., diversified and comprehensive recommendation lists) from such a request.

In summary, the contributions of the paper are as follows:

– Development and implementation of novel retrieval strategies in the SKOSRec system, which facilitate personalized requests and leverage the potential of knowledge graphs in order to address the research gap in state-of-the-art recommendation methods with regard to customized queries.

– Realization of web-based user experiments in four usage scenarios with the following outcomes:

  • Successful demonstration that knowledge graph data is applicable in the context of live recommendation scenarios.

  • Creation of suitable query patterns that can be applied to represent individual information needs in different application contexts. The set of query patterns can be extended with the help of the SKOSRec query language to meet the requirements of additional usage scenarios.

  • Proof that the novel retrieval strategies can be successfully applied across various application contexts and recommendation tasks and that they outperform conventional content-based retrieval approaches, especially in the performance dimensions of diversity and recall.

The prototypical implementation and the evaluation of the SKOSRec engine in different usage scenarios have demonstrated that the freely available knowledge sources on the LOD cloud can be successfully applied to advance state-of-the-art methods in IR and recommender systems. The proposed personalized and customized retrieval strategies can leverage the potential of knowledge graphs by better matching information needs of users with the help of semantic search options.

Appendix A. The SKOSRecommender Query Syntax

SKOSRecQuery: A SKOSRec query can be comprised of up to six elements, of which the parts SimProjection and ItemPart are obligatory. Hence, users need to state at least their preferences and how many recommendations they would like to receive. Advanced retrieval patterns can be formulated as well. For instance, similar resource retrieval can be combined with prefiltering (RecWhereClause) and/or postfiltering (SelectPart) of LOD resources. As in SPARQL, the query can start with a prologue that abbreviates IRIs with namespace declarations [21].

SelectPart: The SelectPart enables subquerying with recommendation results. The surrounding query is a SELECT query with a slightly simplified WHERE clause (i.e., RecWhereClause) with which postfilter conditions can be formulated. In this section, users have the option to formulate aggregation-based retrieval requests (Aggregation).

Aggregation: This query part handles aggregation-based postfiltering. Users specify the resource (IRIref) for which similarity scores of sublevel entities are summarized. Additionally, it is stated how the data should be aggregated (i.e., SUM, MAX or AVG) as well as which variable (Var) represents the aggregation-based recommendations in the postfilter section. The compiler makes sure that this variable is actually contained in the RecWhereClause of the SelectPart.

SimProjection: This part of a SKOSRec query refers to the list of generated suggestions. It defines the recommendation variable to which all similar LOD resources are mapped. In case pre- and postfilter conditions are set in the other sections, the compiler checks whether the variable (Var) appears in these sections as well. Otherwise, the required join operations cannot be executed. It also specifies how many recommendations should be displayed (Integer) and whether the request will be issued against a default repository or as a cross-repository request (ServiceIntegration).

ServiceIntegration: This section indicates that a user intends to receive recommendations from a different SPARQL endpoint than the default endpoint that is stated in the standard configuration of the SKOSRec engine. It specifies the target SPARQL endpoint (IRIref) from which a user


wants to receive recommendations. Upon extraction of SKOS annotations from the default source endpoint, a subsequent request is sent to the specified target endpoint.

ItemPart: This section of a SKOSRec query represents a single preference in the user profile. However, recommendation queries can contain several preference statements. A preference is either expressed as a LOD resource (IRIref) or as a variable which is bound to a preference query (VarPart). In the latter case, the resources are obtained by matching the graph pattern in the VarPart with the triple statements in the repository. The ItemPart statement can be additionally supplemented with a concept-to-concept similarity threshold (Sim) that triggers concept expansion with the approach of flexible similarity detection, which is extensively described in one of our previous works [64].

VarPart: The construct initiates preference querying. Individual likings are expressed as a graph pattern in a RecWhereClause. The variable (Var) is a placeholder for the LOD resources that will be used to set up the user profile. The compiler has to make sure that the specified variable occurs in the RecWhereClause as well. Otherwise, the extraction of LOD resources cannot be carried out correctly.

Sim: The Sim part of a SKOSRec query denotes the threshold of concept-to-concept scores for flexible similarity detection. Even though knowledge-based similarity values usually range between 0 and 1, the specification allows a broader range of digits in the Decimal datatype to enable application of other similarity metrics in case they are needed.

Relation: This section specifies how concept-to-concept similarity scores should be determined. Users can state whether scores should be “larger”, “larger than” or “equal to” the threshold value.

RecWhereClause: This clause can occur in three different sections of a SKOSRec query, i.e., either in the prefilter (ItemPart), the postfilter (SelectPart) or the preference filter section (VarPart). The RecWhereClause closely resembles the “WhereClause” of the SPARQL 1.1 specification with all its subsequent parts [21]. Hence, it enables different combinations of graph pattern matching. However, the RecWhereClause of the SKOSRec language allows fewer pattern matching expressions than the “WhereClause” of the regular SPARQL syntax specification [21]. For instance, it is neither permitted to formulate subqueries nor to direct filter requests to more than one repository (see RecGraphPatternNotTriples for further explanations). This modification has been made to simplify similarity calculation. After parsing this section, the compiler makes sure that recommendation, preference or postfilter variables occur at least once in the RecWhereClause to guarantee that subsequent steps of the recommendation workflow can be applied on existing LOD resources. In case the RecWhereClause comprises a RecMinusGraphPattern, it also has to be checked that the variable is not only contained in the MINUS part of the group.

RecGroupGraphPattern: A RecGroupGraphPattern can contain one to many basic graph patterns, which can be differently combined according to RecGroupGraphPatternSub.

RecGroupGraphPatternSub: This section defines the graph patterns that are applied to identify suitable LOD resources. User filters can be either expressed as basic graph patterns (TriplesBlock) or as combinations of them (RecGraphPatternNotTriples). The sequence of different kinds of patterns can be flexibly specified.

RecGraphPatternNotTriples: This query part closely resembles the similarly named “GraphPatternNotTriples” of the SPARQL 1.1 specification [21]. The only difference to SPARQL is that the RecGraphPatternNotTriples section is a little more restricted in terms of admissible graph patterns than the “GraphPatternNotTriples” section of the regular SPARQL specification. For instance, it does not allow retrieving LOD resources from graphs other than the default graph that is specified in the configuration of the SKOSRec engine, since filter patterns have to be applied to datasets that contain SKOS annotations. Hence, these datasets need to be specified before runtime execution to ensure that similarity calculation can be carried out correctly. If a user was able to formulate filter conditions that referred to different RDF graphs and endpoints, this would not be possible.

RecGroupOrUnionGraphPattern: This query section marks an alternative graph pattern in the style of the “GroupOrUnionGraphPattern” of the SPARQL syntax specification [21]. However, it neither allows SPARQL-like subqueries nor federated queries


for filtered resources because this would complicate similarity calculation.

RecOptionalGraphPattern: In this part of a SKOSRecQuery, users are enabled to specify additional graph patterns, which might extend the query solution but do not necessarily have to match the data.

RecMinusGraphPattern: This section handles exclusion of LOD resources for cases when users want to omit certain triple statements from the solution.
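To make the Aggregation element more concrete, the following sketch rolls up item-level similarity scores to a higher-level resource with SUM, MAX or AVG and returns the top-ranked parents. It is an illustration under stated assumptions, not the SKOSRec implementation: the function `roll_up`, the mapping `parent_of` and the example IRIs are hypothetical, and only the three aggregation modes are taken from the description above.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Aggregation modes named in the Aggregation element.
AGGREGATORS = {"SUM": sum, "MAX": max, "AVG": mean}


def roll_up(
    scores: Dict[str, float],
    parent_of: Dict[str, str],
    how: str = "SUM",
    top_n: int = 10,
) -> List[Tuple[str, float]]:
    """Aggregate sublevel similarity scores per parent resource and rank the parents."""
    if how not in AGGREGATORS:
        raise ValueError(f"unsupported aggregation: {how}")
    grouped = defaultdict(list)
    for item, score in scores.items():
        parent = parent_of.get(item)
        if parent is not None:  # items without a parent resource are skipped
            grouped[parent].append(score)
    aggregate = AGGREGATORS[how]
    ranked = sorted(
        ((parent, aggregate(values)) for parent, values in grouped.items()),
        key=lambda pair: (-pair[1], pair[0]),  # highest score first, ties by IRI
    )
    return ranked[:top_n]


# Hypothetical similarity scores of movies and their directors.
scores = {"film:A": 0.5, "film:B": 0.25, "film:C": 0.75}
parent_of = {"film:A": "dir:X", "film:B": "dir:X", "film:C": "dir:Y"}
print(roll_up(scores, parent_of, how="MAX"))
```

The choice of aggregation mode changes the ranking: SUM favors parents with many similar sublevel entities, whereas MAX and AVG favor parents with a few highly similar ones.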

References

[1] G. Adomavicius, A. Tuzhilin, Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. In: IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.

[2] G. Adomavicius, A. Tuzhilin, R. Zheng, REQUEST: A query language for customizing recommendations. In: Information Systems Research, vol. 22, no. 1, pp. 99–117, 2011.

[3] C. Aggarwal, Recommender systems. Springer, 2016.

[4] M. Arenas, B. Cuenca Grau, E. Kharlamov, S. Marciuska, D. Zheleznyakov, Faceted search over ontology-enhanced RDF data. In: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 939–948, 2014.

[5] V. Ayala, M. Przyjaciel-Zablocki, T. Hornung, A. Schätzle, G. Lausen, Extending SPARQL for recommendations. In: Semantic Web Information Management, 2004.

[6] J. Beel, C. Breitinger, S. Langer, A. Lommatzsch, B. Gipp, Towards reproducibility in recommender systems research. In: User Modeling and User-adapted Interaction, vol. 26, no. 1, pp. 69–101, 2016.

[7] C. Boutilier, R. Brafman, C. Domshlak, H. Hoos, D. Poole, CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements. In: Journal of Artificial Intelligence Research (JAIR), vol. 21, pp. 135–191, 2004.

[8] P. Castells, S. Vargas, J. Wang, Novelty and diversity metrics for recommender systems: choice, discovery and relevance, 2011.

[9] G. Cheng, W. Ge, Y. Qu, Falcons: searching and browsing entities on the Semantic Web. In: Proceedings of the 17th ACM International Conference on World Wide Web, pp. 1101–1102, 2008.

[10] J. Davies, R. Weeks, QuizRDF: Search technology for the Semantic Web. In: Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004.

[11] DBpedia, URL: https://wiki.dbpedia.org, 2017.

[12] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, Exploiting the web of data in model-based recommender systems. In: Proceedings of the 6th ACM Conference on Recommender Systems (RecSys), pp. 253–256, 2012.

[13] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, M. Zanker, Linked Open Data to support content-based recommender systems. In: Proceedings of the 8th International Conference on Semantic Systems, 2012.

[14] D. Dillman, D. Robert, D. Bowker, Principles for constructing web surveys. In: Joint Meetings of the American Statistical Association, 1998.

[15] DCMI metadata terms, URL: http://dublincore.org/documents/dcmi-terms, 2004.

[16] I. Fernandez-Tobias, I. Cantador, M. Kaminskas, F. Ricci, A generic semantic-based framework for cross-domain recommendation. In: Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pp. 25–32, 2011.

[17] A. Freitas, J. Oliveira, S. O’Riain, E. Curry, J. Da Silva, C. Pereira, Querying Linked Data using semantic relatedness: a vocabulary independent approach. In: Natural Language Processing and Information Systems, pp. 40–51, 2011.

[18] T. Grainger, T. Potter, Y. Seeley, Solr in Action. Cherry Hill: Manning, 2014.

[19] A. Gunawardana, G. Shani, Evaluating recommender systems. In: Recommender Systems Handbook, Springer, pp. 265–308, 2015.

[20] R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Bürgle, H. Düwiger, U. Scheel, Faceted Wikipedia search. In: International Conference on Business Information Systems, pp. 1–11, 2010.

[21] S. Harris, A. Seaborne, E. Prud’hommeaux, SPARQL 1.1 query language. In: W3C recommendation, 2013.

[22] B. Heitmann, C. Hayes, Using Linked Data to build open, collaborative recommender systems. In: AAAI Spring Symposium: Linked Data Meets Artificial Intelligence, pp. 76–81, 2010.

[23] J. Herlocker, J. Konstan, L. Terveen, J. Riedl, Evaluating collaborative filtering recommender systems. In: ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.

[24] Y. Hu, Z. Wang, W. Wu, J. Guo, M. Zhang, Recommendation for movies and stars using YAGO and IMDB. In: 12th International Asia-Pacific Web Conference (APWEB), pp. 123–129, 2010.

[25] D. Huynh, D. Karger, Parallax and companion: Set-based browsing for the data web. In: Proceedings of the 18th ACM International Conference on World Wide Web, 2009.

[26] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich, Recommender systems: an introduction. Cambridge University Press, 2010.

[27] Y. Kabutoya, R. Sumi, T. Iwata, T. Uchiyama, T. Uchiyama. In: IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 625–630, 2012.

[28] M. Kaminskas, I. Fernandez-Tobias, F. Ricci, I. Cantador, Knowledge-based music retrieval for places of interest. In: Proceedings of the 2nd International ACM Workshop on Music Information Retrieval with User-centered and Multimodal Strategies, pp. 19–24, 2012.

[29] H. Khrouf, R. Troncy, Hybrid event recommendation using Linked Data and user diversity. In: Proceedings of the 7th ACM Conference on Recommender Systems, pp. 185–192, 2010.

[30] P. Kirk, Experimental design. John Wiley & Sons, 1982.

[31] B. Knijnenburg, M. Willemsen, Z. Gantner, H. Soncu, C. Newell, Explaining the user experience of recommender systems. In: User Modeling and User-Adapted Interaction, vol. 22, no. 4-5, pp. 441–504, 2012.

[32] G. Koutrika, B. Bercovitz, H. Garcia-Molina, FlexRecs: expressing and combining flexible recommendations. In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 2009.


[33] M. Lee, W. Kim. Semantic association search and rank method based on spreading activation for the Semantic Web. In IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1523–1527, 2009.

[34] J. Lehmann, L. Bühmann. Autosparql: Let users query your knowledge base. In 8th Extended Semantic Web Conference (ESWC), 2011.

[35] N. Marie, F. Gandon, M. Ribiere, F. Rodio. Discovery hub: on-the-fly Linked Data exploratory search. In Proceedings of the 9th International Conference on Semantic Systems, pp. 17–24, 2013.

[36] S. McNee, J. Riedl, J. Konstan. Being accurate is not enough: how accuracy metrics have hurt recommender systems. In CHI'06 Extended Abstracts on Human Factors in Computing Systems, 2006.

[37] R. Meymandpour, J. Davis. Recommendations using Linked Data. In Proceedings of the 5th PhD Workshop on Information and Knowledge, 2012.

[38] A. Miles, S. Bechhofer. SKOS Simple Knowledge Organization System Reference. In W3C Recommendation, 2009.

[39] B. Mobasher, X. Jin, Y. Zhou. Semantically enhanced collaborative filtering on the web. In Web Mining: From Web to Semantic Web, pp. 57–76, 2004.

[40] A. Musetti, A. Nuzzolese, F. Draicchio, V. Presutti, E. Blomqvist, A. Gangemi, P. Ciancarini. Aemoo: Exploratory search based on knowledge patterns over the Semantic Web. In Semantic Web Challenge, 2012.

[41] T. Neumann, G. Weikum. The RDF-3X engine for scalable management of RDF data. In The VLDB Journal, vol. 19, no. 1, pp. 91–113, 2010.

[42] J. Pérez, M. Arenas, C. Gutierrez. Semantics and Complexity of SPARQL. In 5th International Semantic Web Conference, pp. 30–43, 2006.

[43] S. Policarpio, S. Brunk, G. Tummarello. Implementation of a SPARQL integrated recommendation engine for Linked Data with hybrid capabilities. In AImWD, 2012.

[44] P. Pu, L. Chen, R. Hu. A user-centric evaluation framework for recommender systems. In Proceedings of the 5th ACM Conference on Recommender Systems, 2011.

[45] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, E. Di Sciascio. Sound and music recommendation with knowledge graphs. ACM Transactions on Intelligent Systems and Technology (TIST), 8(2), 21 (2017).

[46] E. Oren, R. Delbru, S. Decker. Extending faceted navigation for RDF data. In 5th International Semantic Web Conference, pp. 559–572, 2006.

[47] G. Pirro. Explaining and suggesting relatedness in knowledge graphs. In International Semantic Web Conference, pp. 622–639, 2015.

[48] G. Rizzo, R. Troncy. NERD: A framework for evaluating named entity recognition tools in the Web of data. 10th International Semantic Web Conference (ISWC'11), Demo Session, Bonn, Germany, 2011.

[49] J. Rosati, T. Di Noia, T. Lukasiewicz, R. De Leone, A. Maurino. Preference queries with Ceteris Paribus semantics for Linked Data. In OTM Confederated International Conferences, pp. 423–442, 2015.

[50] T. Ruotsalo, K. Haav, A. Stoyanov, S. Roche, E. Fani, R. Deliai, E. Mäkelä, T. Kauppinen, E. Hyvönen. SMARTMUSEUM: A mobile recommender system for the Web of Data. In Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 50–67, 2013.

[51] I. Ruthven, M. Lalmas, K. van Rijsbergen. Incorporating user search behavior into relevance feedback. In Journal of the Association for Information Science and Technology, vol. 54, no. 6, pp. 529–549, 2003.

[52] B. Sarwar, G. Karypis, J. Konstan, J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, 2001.

[53] M. Schmachtenberg, C. Bizer, H. Paulheim. Adoption of the Linked Data best practices in different topical domains. In International Semantic Web Conference (ISWC), pp. 245–260, 2014.

[54] H. Schütze, C. Manning, P. Raghavan. Introduction to information retrieval. Cambridge University Press, 2008.

[55] SparqlImplementations. URL: https://www.w3.org/wiki/SparqlImplementations, 2016.

[56] M. Stankovic, W. Breitfuss, P. Laublet. Discovering relevant topics using DBPedia. In Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 219–222, 2011.

[57] K. Swearingen, R. Sinha. Beyond algorithms: An HCI perspective on recommender systems. In ACM SIGIR 2001 Workshop on Recommender Systems, vol. 13, no. 5-6, 2001.

[58] D. Tunkelang. Faceted search. In Synthesis Lectures on Information Concepts, Retrieval, and Services, pp. 1–80, 2009.

[59] C. Unger, L. Bühmann, J. Lehmann, A. Ngonga Ngomo, D. Gerber, P. Cimiano. Template-based question answering over RDF data. In Proceedings of the 21st ACM International Conference on World Wide Web, 2012.

[60] C. Varnhagen, M. Gushta, J. Daniels, T. Peters, N. Parmar, D. Law, T. Johnson. How informed is online informed consent? In Ethics & Behavior, vol. 15, no. 1, pp. 37–48, 2005.

[61] J. Waitelonis, H. Sack. Towards exploratory video search using Linked Data. In 11th IEEE International Symposium on Multimedia, pp. 540–545, 2009.

[62] L. Wenige, J. Ruhland. Flexible on-the-fly recommendations from Linked Open Data repositories. In International Conference on Business Information Systems, pp. 43–54, 2016.

[63] L. Wenige. Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations.

[64] L. Wenige, J. Ruhland. Retrieval by recommendation: using LOD technologies to improve digital library search. In International Journal on Digital Libraries, 2017.

[65] F. Zarrinkalam, M. Kahani. A multi-criteria hybrid citation recommendation system based on Linked Data. In 2nd International EConference on Computer and Knowledge Engineering (ICCKE), pp. 283–288, 2012.

[66] C. Ziegler, S. McNee, J. Konstan, G. Lausen. Improving recommendation lists through topic diversification. In Proceedings of the 14th International Conference on World Wide Web, pp. 22–32, 2005.

[67] L. Zou, R. Huang, H. Wang, J. Yu, W. He, D. Zhao. Natural language question answering over RDF: a graph data driven approach. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 313–324, 2014.

  • Introduction
  • Related Work
  • The SKOS Recommender
    • System Overview
    • On-the-Fly Recommendations
    • Preference Querying
    • Prefiltering
    • Postfiltering & Combinations
  • Evaluation of the SKOS Recommender
    • Experimental Setup
    • Results
  • Future Work & Conclusion
  • Appendix A: The SKOSRecommender Query Syntax
  • References


Fig. 3. Recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain).

familiar to the user (|nov|) among the total number of relevant items (|tp|) (Eq. 20).

\[ nv = \frac{|nov|}{|tp|} \tag{20} \]
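As a minimal sketch (not the authors' implementation), Eq. 20 can be computed from two item collections. The function name and the set-based inputs are assumptions for illustration, and the sketch further assumes that |nov| counts those relevant items that the user did not know before.

```python
def novelty_score(relevant_items, familiar_items):
    """Share of relevant items that were new to the user (cf. Eq. 20).

    relevant_items: items counted in |tp| (hypothetical input format).
    familiar_items: items the user reported as already known (assumption).
    Returns None when there are no relevant items, since the ratio is undefined.
    """
    relevant = set(relevant_items)
    if not relevant:
        return None
    novel = relevant - set(familiar_items)  # relevant AND previously unknown
    return len(novel) / len(relevant)


# Four relevant items, one of which the user already knew.
print(novelty_score(["a", "b", "c", "d"], ["a"]))  # 0.75
```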

An item was considered relevant when it achieved an mrs score of 50 or higher. In addition to accuracy and novelty, the web application also tracked the diversity of recommendation lists, since a topically diversified result set has been found to increase overall user satisfaction [31, 66]. In the RS literature, the most widely explored diversification metrics are the ones that measure the similarity between each item pair in the recommendation list. The more the items are alike, the less diverse the result set is [19]. The study series of this paper applied the metric by Castells et al. [8]. It determines item-to-item similarity values with the cosine similarity measure. Scores are then converted to distance values (Eq. 21) and averaged over the recommendation list (Eq. 22).

\[ d(i_l, i_m) = 1 - \cos(i_l, i_m) \tag{21} \]


\[ div_C = \frac{2}{k(k-1)} \sum_{l<m} d(i_l, i_m) \tag{22} \]
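The two equations can be illustrated with a short sketch. It assumes, purely for illustration, that each item is represented as a numeric feature vector (for example, binary indicators of concept annotations); the function names are hypothetical and the paper does not prescribe this representation.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for empty feature vectors
    return dot / (norm_u * norm_v)


def content_diversity(item_vectors):
    """Average pairwise cosine distance of a list with k items (Eqs. 21, 22)."""
    k = len(item_vectors)
    if k < 2:
        return 0.0  # no item pairs, hence no measurable diversity
    total = sum(1.0 - cosine_similarity(u, v)
                for u, v in combinations(item_vectors, 2))
    return 2.0 * total / (k * (k - 1))


# Two identical items and one orthogonal item: two of three pairs have distance 1.
print(content_diversity([[1, 0], [0, 1], [1, 0]]))
```

Since there are k(k-1)/2 unordered pairs, the factor 2/(k(k-1)) in Eq. 22 turns the sum into a mean distance.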

The content-based diversity scores (divC) were calculated in the backend of the web application and saved in a log file for each recommendation list that the SKOSRec engine generated at runtime. In addition to measuring divC scores, we asked participants directly for their opinions on recommendation list diversity. Following the RS evaluation framework by Pu et al. [44], two Likert items were posed for this purpose:

– The recommendations of list [] are diverse.
– The recommendations of list [] are similar to each other.

Besides the algorithmic performance metrics listed above, the authors gathered assessments from users regarding the overall usefulness of the suggestions. This approach is also in line with the empirically verified evaluation framework by Pu et al. [44], which proposes measuring the psychometric construct of Perceived Usefulness to gain a comprehensive understanding of user opinions on recommendation results. Thus, participants were instructed to state their agreement with positive statements regarding the relevance of the suggestions. The statements were adapted to the specificities of each usage scenario. Fig. 4 shows an example collection of Likert questions for the domain of music RS.

After participants had provided the required information, their answers were saved in a log file and they were navigated to test case 2 (TC2), which evaluated the SKOSRec engine's performance for constraint-based queries (similar to Q3, Sect. 3). We tried to find out whether the engine was able to improve user satisfaction through filter conditions. Even though IR systems already successfully apply facet filters on result sets, the experiments still had to show whether LOD-enabled constraints have the same effect on the Perceived Usefulness of recommendations. For the comparison of non-filtered and filtered suggestions, we applied a within-subjects design in which participants had to rate two recommendation lists resulting from regular (non-filtered) and filtered requests in TC2. We favored this setup over a between-subjects design, where subjects would have been assigned to only one result list. While the between-subjects design more closely resembles a real-world setting, since it does not require participants to interact with more than one approach at a time, differences among subjects (e.g., regarding demographics or topic expertise) can potentially have a considerable impact on quality judgments [19]. Hence, when applying a between-subjects design, one does not know whether differences in performance are caused by the approaches or by the variability among participants. Therefore, researchers need to conduct between-subjects experiments with a sufficient number of study subjects to level out these differences. Within-subjects studies, on the other hand, require fewer participants while still achieving the same level of statistical power, since between-subject variability is not an issue [30, 31]. Beel et al. have shown that even small variations in an RS test setting can cause substantial differences in performance outcomes [6]. To keep the extent of variations even lower, comparisons between non-filtered and constraint-based recommendations were based on the same profile for each user in TC2. Therefore, the user profile from TC1 served as a starting point for filtered retrieval. The web application showed the recently generated profiles to users and asked them to provide an additional filter condition. Fig. 5 depicts an example web form for a constraint-based recommendation request in the music domain. It comprises the user profile and an additional text field where users stated their constraint.
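To illustrate why a within-subjects design sidesteps between-subject variability, the following sketch computes a paired t statistic on per-participant score differences, so that each participant serves as his or her own control. The function name and the example scores are hypothetical; the paper does not state how its tests were implemented.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Paired t statistic and degrees of freedom for two ratings per participant.

    scores_a[i] and scores_b[i] must stem from the same participant i.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both conditions need one score per participant")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two participants are required")
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical ratings of four participants for two recommendation lists.
t, df = paired_t_statistic([1, 2, 3, 4], [2, 4, 5, 4])
print(round(t, 3), df)  # -2.611 3
```

Only the differences enter the statistic, so stable traits of a participant (e.g., a general tendency to rate generously) cancel out.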

Fig. 4. Evaluation screen for recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain).

Afterwards, the engine applied this filter condition to the set of potential recommendation results before similarity calculation started. For each usage scenario, different types of constraints were available. Regarding suitable filter options, the authors sought to achieve a balance between assisting users in finding filter conditions that would adequately represent their information needs and keeping the variability among users as low as possible. Therefore, the specificities of each scenario and the availability of the respective data sources were determined. For instance, for the multimedia domains (movie, music and books), it was assumed that users would like to filter recommendations according to the specific genre of the item. Fortunately, the DBpedia dataset contains this kind of information for the music domain, where the genre can be retrieved through the property dbo:genre. However, in the movie and book domains, DBpedia does not provide sufficient information. Many LOD resources that represent books were not assigned a genre. In the movie domain, a genre property is missing entirely, and genre-specific information can often only be found in the subject categories describing the movie. Hence, in contrast to the experiment on music recommendations, participants in the domains of books and films did not filter their recommendation lists by genre. On the other hand, the property dc:subject was applied as a recommendation filter in each multimedia domain. Since the subject property often links to many informative features of multimedia items (e.g., release period, geographic information or content information), the authors decided that this feature represents a powerful filter dimension.

In the experiment on travel destination search, the web interface offered two filters: a subject filter and a location-specific filter. The subject filter enabled users to state their preferences regarding the characteristics of a location (e.g., being a nature reserve). The location-specific filter, on the other hand, gave participants the option to specify where the desired destination should be located.¹¹

¹¹ http://dbpedia.org

The second fundamental research issue of TC2 concerned the question of whether expressive constraints can boost recommendation quality even further. Hence, apart from executing a constraint-based workflow with a simple filter (i.e., a direct attribute of an item), the SKOSRec engine generated suggestions with the help of an expressive filter as well. This second procedure, while still applying the same user constraint as the simple filter, utilized an expressive graph pattern to include additional LOD resources in the result set. For instance, in case a subject filter was selected, result set expansion was facilitated by exploring the wider semantic space of the DBpedia category graph through a property path declaration in the graph pattern (e.g., skos:broader[2]). In the travel experiment, an additional expressive filter retrieved place-related information (dul:hasLocation) for location-specific subproperty relations that are not materialized in DBpedia. Hence, the expressive graph pattern expanded the filter such that additional properties were matched (e.g., dbo:country, dbo:city or dbo:state) while still applying the specified user constraint to the set of LOD resources. Upon execution of constraint-based recommendation requests, users received two recommendation lists (resulting from simple and expressive filtering, respectively) on a separate evaluation screen. Participants evaluated result sets in the same manner as in TC1. For each result set, the screen contained an additional Likert item asking subjects to state their agreement with the statement “The filter has improved the recommendation results of list”. The sequence of suggestions resulting from the different methods was randomly assigned in each user session to avoid order effects.

Fig. 5. Web form for a constraint-based recommendation request (music domain).

After study subjects had completed TC2, they were navigated to test case 3 (TC3). In the TC3 section of the travel experiment, participants could choose between three options for rollup query patterns (similar to Q6, Sect. 3). Users were able to obtain recommendations for travel destinations based on POIs they had visited during a stay in another destination. They could select either a city (option 1), a region (option 2) or a country (option 3) as the destination type. Upon entity type selection, the SKOSRec engine generated appropriate suggestions. Fig. 6 shows the web form from the travel experiment that facilitated the formulation of advanced rollup requests in TC3.
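The difference between the simple and the expressive subject filter of TC2 can be approximated in plain SPARQL 1.1. The sketch below is an assumption-laden illustration and not the SKOSRec syntax used in the experiments: the helper name, the use of dct:subject as the subject property, the optional-step path `skos:broader?/skos:broader?` (up to two broader steps) and the example category IRI are all hypothetical choices.

```python
# Prefix declarations for the generated query (standard namespace IRIs).
PREFIXES = (
    "PREFIX dct: <http://purl.org/dc/terms/>\n"
    "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
)


def subject_filter_query(category_iri, expressive=False, limit=100):
    """Build a SPARQL query that selects items filtered by a subject category.

    expressive=False: the item must be annotated with the category directly.
    expressive=True:  the item may also be annotated with a category that has
                      the given category as broader concept within two steps.
    """
    if expressive:
        path = "dct:subject/skos:broader?/skos:broader?"
    else:
        path = "dct:subject"
    return (
        PREFIXES
        + "SELECT DISTINCT ?item WHERE {\n"
        + f"  ?item {path} <{category_iri}> .\n"
        + "}\n"
        + f"LIMIT {int(limit)}\n"
    )


# Hypothetical category IRI, used only to show the generated query text.
print(subject_filter_query(
    "http://dbpedia.org/resource/Category:Nature_reserves", expressive=True))
```

Because the expressive path also matches the direct annotation (zero broader steps), its result set is a superset of the simple filter's result set, which is in line with the idea of result set expansion described above.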

When users had selected an entity type and stated their favorite travel destination and three associated POIs, the application generated a SKOSRec query from these parameters. A simple on-the-fly request was sent to the engine as well to retrieve baseline recommendations, with which the suggestions resulting from the advanced request were compared at a later stage of the experiment. The simple query only contained the user's favorite travel destination and the selected entity type option, thus omitting information on the POIs. During the travel experiment, the engine processed both the simple and the advanced query and produced two recommendation lists. As in the previous parts of the experiment, participants assessed the quality of the result sets, which appeared in random order on the screen.

Fig. 6. Travel - User profile generation, TC3.

The multimedia experiments also contained a section on rollup retrieval (TC3). Like the advanced travel queries, the multimedia requests aggregated similarity scores of sublevel entities through summation, which the engine joined with a postfilter. In addition to these features, the pattern contained a preference query part. This section identified a set of preferred items through a single user statement. Participants specified either their favorite director/actor (movie domain), music act (music domain) or author (book domain), and the SKOSRec engine generated recommendations based on the features of the works created by the favored artist (similar to Q5, Sect. 3). Participants also received suggestions from a simple on-the-fly query, which computed results according to the artist's characteristics. As in the travel experiment, recommendations from the advanced rollup request were compared with those of the on-the-fly query, without letting participants know which approach they were assessing and with random assignment of result set positions.

After participants of the multimedia experiments had evaluated the results of TC3, they were guided to test case 4 (TC4), which evaluated cross-domain queries. The experiment on travel RS ended with TC3 because no suitable graph patterns could be identified to facilitate these kinds of suggestions for the travel usage scenario. In the multimedia domain, however, the data from DBpedia was sufficient for this retrieval task. The entity type of the study (i.e., movie, music act or book) defined the source domain, based on which the engine generated suggestions for items from another multimedia target domain (similar to Q7, Sect. 3). Fig. 7 depicts the cross-domain web form of the music experiment.
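The summation-based rollup used in TC3 can be sketched as follows. This is a loose illustration under stated assumptions, not the engine's implementation: it presumes that similarity scores for sublevel entities (e.g., POIs or creative works) are already available, and the function name, the dictionary inputs and the predicate-style postfilter are hypothetical.

```python
from collections import defaultdict


def rollup_by_summation(sub_scores, parent_of, postfilter=None, top_n=10):
    """Aggregate sublevel similarity scores per parent entity and rank parents.

    sub_scores: similarity score per sublevel entity (e.g., POI -> score).
    parent_of:  parent entity per sublevel entity (e.g., POI -> city).
    postfilter: optional predicate applied to parent entities after ranking.
    """
    totals = defaultdict(float)
    for sub_entity, score in sub_scores.items():
        parent = parent_of.get(sub_entity)
        if parent is None:
            continue  # sublevel entity without a known parent is skipped
        totals[parent] += score  # aggregation through summation
    # Highest total first; ties are broken alphabetically for determinism.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    if postfilter is not None:
        ranked = [(p, s) for p, s in ranked if postfilter(p)]
    return ranked[:top_n]


scores = {"poi_a": 0.5, "poi_b": 0.25, "poi_c": 0.5}
parents = {"poi_a": "city_x", "poi_b": "city_x", "poi_c": "city_y"}
print(rollup_by_summation(scores, parents))
# [('city_x', 0.75), ('city_y', 0.5)]
```

Summation favors parent entities with many similar sublevel entities; other aggregates (e.g., the mean) would weight them differently.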

The web form assisted subjects in formulating a request, which then obtained movie suggestions based on the favorite music act of the user. Parameters from users were entered into the placeholders of the query pattern in the backend of the application. We formulated the query templates in such a way that the query engine matched any properties or graph patterns that could potentially connect two items from the specified target and source domains. For instance, such matches can occur when a user states a favorite movie that was written by a particular author. In case the same author has also written some books, they might be of interest to the user as well. Another example stems from the movie domain. Suppose a consumer has declared in his/her profile a preference for a movie that links to the corresponding soundtrack. It might well be the case that the user likes the soundtrack and the music acts that were involved in creating it just as much as s/he likes the movie. The practice of distinguishing homonymous LOD resources through the property dbo:wikiPageDisambiguates is another interesting linking pattern from DBpedia that was exploited for cross-domain requests in TC4, since it connects similar entities (i.e., the book edition of a certain movie).

Cross-domain SPARQL queries (similar to Q8, Sect. 3) were posed as baseline requests in this test case. While the SPARQL query only matched cross-domain relations for a single LOD resource, the SKOSRec request expanded the search space by considering links from more than one item. As in the previous test cases, TC4 ended with an evaluation screen where users assessed the SPARQL-based as well as the SKOSRec suggestions in random order without knowing which recommendation list belonged to which method. Table 10 gives an overview of the test cases and the methods that were applied in the experiments in each usage scenario.
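The blind, randomly ordered presentation of result lists used in the test cases can be sketched in a few lines. The function name, the neutral labels ("List 1", "List 2") and the dictionary layout are assumptions for illustration; the paper does not describe how the web application implemented the assignment.

```python
import random


def assign_display_order(lists_by_method, rng=None):
    """Shuffle result lists and attach neutral labels that hide the method.

    lists_by_method: recommendation list per method name.
    rng: optional random.Random instance (useful for reproducible sessions).
    The method name is kept only for logging and must not be shown to users.
    """
    rng = rng if rng is not None else random.Random()
    methods = list(lists_by_method)
    rng.shuffle(methods)  # random position per user session
    return [
        {"label": f"List {position}", "method": method,
         "items": lists_by_method[method]}
        for position, method in enumerate(methods, start=1)
    ]


session = assign_display_order(
    {"sparql_baseline": ["m1"], "skosrec": ["m1", "m2"]},
    rng=random.Random(42),
)
for entry in session:
    print(entry["label"], len(entry["items"]))
```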

4.2. Results

In total, 257 people took part in the study. The gathered samples from the experiments contained answers from both genders, with a slight overrepresentation of male participants (53.7%). The majority of study subjects were between 20 and 40 years of age (65.8%). The evaluation of the demographics section also revealed that the majority of participants already perceive the search in their domain of interest as either “manageable” (38.5%) or even as “easy” or “very easy” (44.0%).

Prior to analyzing the performance results of the novel retrieval approaches of the SKOSRec engine, it was checked whether the respective Likert items for Perceived Usefulness and user-based diversity (divU) reached acceptable reliability levels. For this purpose, Cronbach's α values were determined. Table 11 shows the outcome of this analysis for the Perceived Usefulness construct. Given the acceptable values throughout the domains (Cronbach's α > 0.7), it could be treated as an interval-scaled variable in the statistical tests.

Fig. 7. Music - User profile generation, TC4 (page 9).

Table 10. Test cases in the web experiments

Test Case  Evaluation Purpose                     Query Type  Travel  Movie  Music  Book
TC1        On-the-fly Recommendations (Baseline)  Q1          X       X      X      X
TC2        Constraint-based Recommendations       Q3          X       X      X      X
TC3        Advanced Queries (Rollup)              Q5, Q6      X       X      X      X
TC4        Advanced Queries (Cross-Domain)        Q7          –       X      X      X

Table 11. Cronbach's α values for the concept of Perceived Usefulness

Domain  Cronbach's α
Travel  0.7002
Movie   0.8579
Music   0.8515
Book    0.8949
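For reference, Cronbach's α for k Likert items follows the standard formula α = k/(k−1) · (1 − Σ var(item) / var(sum score)). The sketch below applies it to hypothetical responses; the function name and the input layout (one list of scores per item) are assumptions, and the paper does not state which software computed the values in Table 11.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for k items, each given as a list of respondent scores.

    item_scores[j][i] is the answer of respondent i to item j.
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("at least two items are required")
    n = len(item_scores[0])
    if n < 2 or any(len(scores) != n for scores in item_scores):
        raise ValueError("every item needs the same number (>= 2) of scores")
    sum_scores = [sum(answers) for answers in zip(*item_scores)]
    total_variance = variance(sum_scores)
    if total_variance == 0:
        raise ValueError("sum scores do not vary; alpha is undefined")
    item_variance_sum = sum(variance(scores) for scores in item_scores)
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Two perfectly consistent hypothetical items yield the maximum value.
print(cronbach_alpha([[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]]))  # 1.0
```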

Besides usefulness, user-based diversity scores were also measured by multiple Likert items. However, Cronbach's α values for this concept did not reach acceptable levels. To avoid leaving valuable user assessments unused, agreement with the statement “The recommendations of list [] are diverse” was taken into account as an ordinal variable (divU). Based on these results, subsequent tests could be carried out.

The first part of the analysis concerned the performance of the on-the-fly retrieval method (baseline) of the SKOSRec engine in TC1. The outcome of this analysis confirms that the on-the-fly approach represents a viable recommendation strategy. Table 12 shows the mean (M)¹², standard deviation (SD) and number of participants (N) for each performance dimension and metric. It can be seen that users assessed the engine's performance in almost every quality dimension to be above mediocrity. Aside from the positive user evaluations with regard to Perceived Usefulness, related quality aspects such as Accuracy, Novelty and Diversity received good assessments as well. For instance, in each domain the mean relevance score (mrs) was higher than 50 score points. This means that participants were often convinced of the relevance of the results. Given these positive user assessments with regard to item relevance, it seems justified to consider the mean of result list sizes as an indicator of recall. The engine was able to generate the required number of recommendations in almost every test case (see Table 12). The on-the-fly retrieval approach was also tested with regard to the Novelty dimension. Table 12 shows the mean novelty scores (nv) for the baseline recommendation approach. Novelty scores indicate that the majority of items in a result list were deemed to be new in the travel and book RS, whereas in the experiments on movie and music RS only about one third of the recommendation list was unfamiliar to users. In contrast, content-based Diversity scores (divC) were rather high in each domain. Hence, it can be concluded that items in a result set usually have a low rate of matching SKOS annotations. In addition to content-based scores, user assessments of recommendation list Diversity (divU) were also taken into account. The median level of agreement with the Diversity-related Likert item stating that recommendations are diverse lay at 3 (“neutral”) in the multimedia domains and at 4 (“agree”) in the travel domain.

In summary, the baseline approach achieved fairly good performance results in TC1. The findings demonstrate that LOD-enabled recommendations can potentially enhance a user's search experience. They also allow for subsequent tests of advanced retrieval techniques, which can be compared to this standard method.

¹² In the case of the user-based diversity score (divU), M denotes the median because of the ordinal scale of the variable.

In the analysis of performance results from TC2 (constraint-based recommendations), particular attention was paid to participants' agreement with the statement “The filter has improved the recommendation results of the list”. Fig. 8 shows the distribution of domain-wise agreement ratios. On average, they were fairly high. Throughout the domains, the vast majority of participants selected either 5 (“strongly agree”), 4 (“agree”) or 3 (“neutral”). Hence, a filter often had a positive impact on results.

The authors gathered responses to the question on improvement for recommendation lists resulting from both the simple and the expressive filtering approaches. Hence, they give clues about the general performance of constraint-based retrieval. However, since this section of the study revealed the experimental condition to participants (i.e., application of a filter), answers to the Likert statement have to be interpreted cautiously. Results may be slightly biased, as participants were able to guess the underlying agenda. However, these limitations do not exist for comparisons between simple and expressive filters. Since the web application randomly assigned the list order, participants did not know which approach they were assessing. This enables us to take a closer look at the differences between the two filtering approaches. It was hypothesized that advanced filters improve recommendation quality, as they may identify more relevant resources, thereby overcoming problems of data quality or data sparsity in LOD repositories. Table 13 lists the response rates of the two methods. The rate measures the ratio of non-zero result sets among all result sets. Throughout the domains, recommendation lists resulting from expressive filtering achieved higher response rates.

Table 12. Performance results in TC1 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max. Score]  Domain  M      SD     N
Accuracy (size) [10]             Travel  10.00  0.00   103
                                 Movie   10.00  0.00   50
                                 Music   10.00  0.00   53
                                 Book    9.80   1.40   51
Accuracy (mrs) [1]               Travel  56.49  0.00   103
                                 Movie   62.25  19.15  50
                                 Music   55.70  0.10   53
                                 Book    57.47  15.55  50
Novelty (nv) [1]                 Travel  0.60   0.34   101
                                 Movie   0.36   0.31   50
                                 Music   0.36   0.39   52
                                 Book    0.61   0.29   47
Diversity (divU) [5]             Travel  4      -      102
                                 Movie   3      -      50
                                 Music   3      -      53
                                 Book    3      -      50
Diversity (divC) [1]             Travel  0.80   0.16   103
                                 Movie   0.87   0.12   50
                                 Music   0.86   0.10   50
                                 Book    0.92   0.08   50
Usefulness [5]                   Travel  3.34   0.85   103
                                 Movie   3.55   0.73   50
                                 Music   3.18   0.85   53
                                 Book    3.43   0.81   50

Table 13. Response rates (TC2)

Domain  Constr.-based (Regular) [%]  Constr.-based (Expanded) [%]  N
Travel  23                           57                            103
Movie   86                           88                            50
Music   72                           75                            53
Book    73                           76                            51
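The response rate defined above (the ratio of non-zero result sets among all result sets) amounts to a simple count. The following sketch uses a hypothetical function name and input layout (one recommendation list per user session).

```python
def response_rate(result_lists):
    """Ratio of non-empty result lists among all result lists.

    Returns None when no request was issued, since the ratio is undefined.
    """
    if not result_lists:
        return None
    non_empty = sum(1 for result in result_lists if len(result) > 0)
    return non_empty / len(result_lists)


# Four hypothetical sessions, two of which returned no recommendation.
print(response_rate([["a"], [], ["b", "c"], []]))  # 0.5
```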


[Figure 8: chart titled “The filter has improved the recommendations of the list”; x-axis: Domain (Digital Library, Travel, Movie, Music, Book); y-axis: Percentage of Agreement (0.00–1.00); legend: Degree of Agreement (1–5)]

Fig. 8. Participants' agreement with the statement that the filter has improved the result list (domain-wise) (TC2).

In addition to response rates, user assessments and scores in the defined quality dimensions were compared. Table 14 depicts the outcome of this analysis. Significantly higher scores are marked in bold. In case statistical tests identified no meaningful variations, the results of the better performing approach are underlined. α-levels were adjusted with the false discovery rate (FDR) in order to decrease the probability of making a type I error due to multiple testing.
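The text does not name the concrete FDR procedure. As an illustration only, the sketch below assumes the widely used Benjamini-Hochberg step-up procedure; the function name, the level q = 0.05 and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure (assumed here, not confirmed
    by the source). Returns one boolean per p-value: True = rejected.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank whose p-value lies below its threshold rank/m * q.
    max_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * q:
            max_rank = rank
    rejected = [False] * m
    # All hypotheses up to that rank are rejected, even if an intermediate
    # p-value exceeded its own threshold.
    for rank, index in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[index] = True
    return rejected


print(benjamini_hochberg([0.001, 0.04, 0.03, 0.5]))
# [True, False, False, False]
```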

According to mean scores, expressive requests generated more comprehensive recommendation lists (size). This finding is in line with the increased response rates shown in Table 13. However, subsequently conducted t-tests confirmed significant differences only for the travel experiment (t(102) = −7.89, p < 0.001). In contrast, precision scores were higher in most of the domains (travel, movie, music) when the SKOSRec engine processed a simple constraint-based query. However, statistical tests did not confirm any systematic differences. The authors applied Wilcoxon tests for mrs scores and the remaining performance metrics because of the small sample sizes resulting from the low response rates of the simple constraint-based approach.

Regarding Novelty, neither approach was superior. In the Diversity dimension, expressive filters generated suggestions that were at least as topically diversified as the results from simple constraint-based retrieval. Even though no significant differences were identified between the two approaches, increased mean scores (divU in the travel domain and divC in each domain) indicate a slight superiority of expanded filter requests in this quality dimension. The same applies to usefulness scores (mean values were higher in 3 out of 4 domains when expressive filtering was applied). However, these statements are speculative at best because statistical tests were not significant.

In summary, it can be concluded that expressive filters improve recall, potentially leading to a slight loss in precision scores (see Table 14). It may also be the case that expanded user constraints diversify recommendation lists as well as increase their usefulness. However, these claims are unverified. In contrast, it is safe to say that expressive query patterns generate results of at least the same quality as simple patterns while increasing the number of relevant suggestions. The few differences may be explained by the high Jaccard scores (JI) (see Table 15) of recommendation lists. Throughout the domains, result sets were much alike. Given these similarities and the increases in list sizes for expressive filters, it is supposed that an expanded constraint produces an enhanced version of the recommendation list resulting from simple filtering. Therefore, expanded filters should be the default setting in a constraint-based retrieval context.
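The Jaccard index used to compare two recommendation lists is the size of their intersection divided by the size of their union. A minimal sketch with a hypothetical function name and made-up item identifiers:

```python
def jaccard_index(list_a, list_b):
    """Jaccard index of two recommendation lists, treated as sets.

    Returns None when both lists are empty, since the index is undefined.
    """
    set_a, set_b = set(list_a), set(list_b)
    union = set_a | set_b
    if not union:
        return None
    return len(set_a & set_b) / len(union)


# Two of four distinct items are shared.
print(jaccard_index(["a", "b", "c"], ["b", "c", "d"]))  # 0.5
```

A value close to 1 means that both methods returned nearly the same items, whereas a value close to 0 indicates largely disjoint result sets.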

In TC3 (i.e., evaluation of roll-up query patterns), the SKOSRec engine almost always generated non-empty recommendation lists (see Table 16). In the travel and the book experiment, the response rate was higher for aggregation-based queries, whereas in the movie domain the engine more often provided recommendations for regular requests. However, these differences are only marginal and do not indicate a clear superiority of one method.

Table 14. Performance results in TC2 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max. Score]  Domain  Simple M  Simple SD  Expressive M  Expressive SD  N
Accuracy (size) [10]             Travel  1.17      2.70       4.52          4.62           103
                                 Movie   6.72      4.08       7.20          3.92           50
                                 Music   5.96      4.64       6.36          4.51           53
                                 Book    4.59      4.50       5.00          4.51           51
Accuracy (mrs) [1]               Travel  66.73     20.51      65.34         22.36          23
                                 Movie   60.95     22.98      60.29         22.79          43
                                 Music   63.77     16.05      62.96         16.44          34
                                 Book    56.17     18.18      56.81         16.81          37
Novelty (nv) [1]                 Travel  0.48      0.41       0.61          0.49           21
                                 Movie   0.44      0.36       0.55          0.80           42
                                 Music   0.55      0.39       0.52          0.37           38
                                 Book    0.77      0.34       0.76          0.33           33
Diversity (divU) [5]             Travel  3         -          3.5           -              24
                                 Movie   4         -          4             -              43
                                 Music   4         -          4             -              38
                                 Book    3         -          3             -              37
Diversity (divC) [1]             Travel  0.68      0.18       0.72          0.21           19
                                 Movie   0.70      0.24       0.73          0.23           41
                                 Music   0.86      0.13       0.87          0.12           34
                                 Book    0.84      0.24       0.95          0.63           29
Usefulness [5]                   Travel  3.33      0.93       3.50          0.77           24
                                 Movie   3.32      0.85       3.31          0.86           43
                                 Music   3.55      0.81       3.64          0.78           38
                                 Book    3.39      0.83       3.62          1.21           37

Table 15. Jaccard indices (TC2)

Domain  Mean Jaccard Index (JI)  SD    N
Travel  0.57                     0.47  25
Movie   0.71                     0.38  43
Music   0.82                     0.35  37
Book    0.92                     0.26  35

Fig. 9 depicts participants' general agreement with Likert statements concerning the Perceived Usefulness of the approaches in TC3. The diagram shows similar results for regular as well as advanced requests. On average, satisfaction scores reached levels that lay slightly above neutral agreement. A subsequent t-test confirmed no significant differences between the two methods. In fact, the approach with the highest average score of Perceived Usefulness varied across the experiments. While mean values were higher for regular requests in the studies on travel and music RS, advanced queries produced higher scores in the movie and book experiments.

Table 16. Response rates (TC3)

Domain  Regular [%]  Aggr.-based [%]  N
Travel  99           100              103
Movie   96           92               50
Music   100          100              53
Book    96           100              51

Fig. 9. Participants' overall satisfaction with regular and roll-up recommendation retrieval (TC3). [Plot "Perceived Usefulness (Regular vs. Roll-up Queries)"; x-axis: Domain (Travel, Movie, Music, Book); y-axis: Agreement; Method: Regular, Roll-up.]

Content-based Diversity (divC) was the only metric with systematic differences. Here, scores went up for advanced retrieval patterns in the travel (t(98) = −5.50, p < 0.001), the music (t(52) = −4.51, p < 0.001) and the book domain (t(47) = −4.90, p < 0.001) (see Table 17). This finding is not surprising, given that the aggregation-based approach explores the wider semantic network of each item in the profile.
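The t statistics reported for divC have N − 1 degrees of freedom (e.g., t(98) for the 99 travel participants with divC scores in Table 17), which is consistent with a paired, within-subjects comparison. The following minimal sketch shows how such a statistic can be computed, assuming a paired t-test is the variant that was used. The function name `paired_t` and the scores are purely illustrative and are not taken from the experiments.

```python
import math
from statistics import mean, stdev


def paired_t(scores_a, scores_b):
    """Return (t, degrees of freedom) for a paired t-test on two score lists.

    Each position holds the two scores of one participant, e.g. the divC value
    of the regular list and of the advanced list shown to the same person.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both lists need one score per participant")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two participants are required")
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Illustrative (made-up) divC scores of five participants.
regular = [0.70, 0.80, 0.75, 0.60, 0.85]
advanced = [0.90, 0.85, 0.95, 0.80, 0.90]
t_value, df = paired_t(regular, advanced)
print(f"t({df}) = {t_value:.2f}")
```

A negative t value, as in the reported results, means that the second list (here the advanced one) scored higher on average.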

Hence, it can be assumed that both approaches produce recommendation lists of similar quality and can provide benefits to potential consumers. It cannot be clearly stated which approach is superior, because neither of them outperformed the other in every domain. While the regular retrieval strategy was better in terms of usefulness in the travel and the music domains, the aggregation-based approach achieved higher scores in the domains of movie and book recommendations.

This conclusion is especially noteworthy given the low mean Jaccard scores for recommendation lists resulting from regular and aggregation-based retrieval (see Table 18). The mean score never exceeded 0.1 in any domain, which indicates a low concordance between result sets. It is remarkable that participants perceived the recommendations as equally good given the high dissimilarity of the lists. Thus, advanced queries have an added value, as they provide additional interesting results. They can help to explore LOD repositories in different ways than a regular retrieval approach. Therefore, it is worthwhile to offer aggregation-based query patterns as an alternative retrieval strategy when the regular approach has not produced any helpful recommendations.
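To make the notion of concordance concrete, the Jaccard index of two recommendation lists can be sketched as follows. This is a minimal illustration, not the evaluation code of the experiments: the function name, the item identifiers and the treatment of two empty lists (score 0) are assumptions.

```python
def jaccard_index(list_a, list_b):
    """Return |A ∩ B| / |A ∪ B| for two recommendation lists (order is ignored)."""
    set_a, set_b = set(list_a), set(list_b)
    union = set_a | set_b
    if not union:
        return 0.0  # assumed convention: two empty lists have no overlap
    return len(set_a & set_b) / len(union)


# Hypothetical result lists of one participant.
regular = ["item1", "item2", "item3", "item4"]
roll_up = ["item4", "item5", "item6", "item7"]
print(jaccard_index(regular, roll_up))
```

In this example only one of seven distinct items is shared, so the score is 1/7. Mean values below 0.1, as in Table 18, therefore imply that the two lists of a participant rarely had items in common.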

In addition to TC3, user assessments for TC4 (cross-domain recommendations) were also evaluated. However, since the authors conducted experiments for TC4 only in the multimedia domains, the samples were considerably smaller. Nevertheless, it is assumed that the number of completed user sessions is just sufficient to conduct further statistical analyses. Table 19 shows the response rates for the two retrieval methods. Cross-domain patterns had a higher success rate of generating non-empty result sets throughout the domains. Participants almost always received a recommendation for this query type, whereas they often did not receive a single suggestion when a regular SPARQL query was executed.

Table 17. Performance results in TC3 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max. Score]
Domain   Regular M   Regular SD   Aggr.-based M   Aggr.-based SD   N

Accuracy (size) [10]
Travel   9.90    0.99    9.42    2.07    103
Movie    2.88    0.59    2.76    0.82    50
Music    10.00   0.00    10.00   0.00    53
Book     2.88    0.59    3.00    0.00    51

Accuracy (mrs) [1]
Travel   48.21   24.98   44.93   25.59   102
Movie    54.58   22.88   56.69   25.42   44
Music    53.95   18.31   58.24   16.19   52
Book     51.53   23.90   49.25   22.30   49

Novelty (nv) [1]
Travel   0.43    0.39    0.38    0.40    98
Movie    0.35    0.38    0.28    0.41    41
Music    0.49    0.36    0.46    0.39    50
Book     0.61    0.42    0.62    0.44    47

Diversity (divU) [5]
Travel   3.5     -       3       -       100
Movie    4       -       3       -       43
Music    4       -       3       -       50
Book     3       -       3       -       48

Diversity (divC) [1]
Travel   0.79    0.17    0.89    0.13    99
Movie    0.75    0.20    0.66    0.27    44
Music    0.83    0.17    0.93    0.07    53
Book     0.74    0.27    0.93    0.06    48

Usefulness [5]
Travel   3.36    0.77    3.22    0.74    101
Movie    3.20    0.92    3.26    0.94    44
Music    3.45    0.84    3.44    0.85    51
Book     2.95    0.87    3.03    0.96    48

Table 18. Jaccard Indices (TC3)

Domain   Mean Jaccard Index (JI)   SD     N
Travel   0.07                      0.11   101
Movie    0.03                      0.07   44
Music    0.02                      0.05   53
Book     0.02                      0.06   50

Table 19. Response rates (TC4)

Domain   Regular (SPARQL)   SKOSRec   N
Movie    32                 96        50
Music    62                 98        53
Book     53                 100       51

We applied non-parametric tests for the performance metrics mrs, nv, divU, divC and Perceived Usefulness, because the low response rates of the SPARQL-based approach led to a low number of comparable data points. Overall, the FDR-adjusted pairwise Wilcoxon tests detected no significant differences except for user-based Diversity scores (divU) in the movie (Z = −2.69, p < 0.05) and the book domain (Z = −2.56, p < 0.05). The results for the content-based Diversity (divC) metric seem to point in the same direction because of the high mean values for cross-domain queries in each domain, but the significance of these results was not verified. The mrs scores indicate a better performance of regular SPARQL queries, but the within-subjects test did not confirm this assumption statistically (see Table 20).
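The text does not state which FDR procedure was applied to the pairwise Wilcoxon tests. As a hedged illustration, the sketch below assumes the common Benjamini-Hochberg step-up adjustment; the function name and the p-values are hypothetical and do not reproduce the reported analysis.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value so that adjusted values
    # stay monotone: each one is capped by the adjusted value ranked above it.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values of four pairwise comparisons.
raw = [0.01, 0.04, 0.03, 0.20]
for p, q in zip(raw, benjamini_hochberg(raw)):
    print(f"raw p = {p:.2f}, adjusted p = {q:.4f}")
```

A comparison would then be reported as significant only if its adjusted value falls below the chosen level (e.g., 0.05).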

However, the most important finding of the analysis was that SKOSRec cross-domain queries generated significantly more suggestions throughout the domains (movie: t(49) = −17.92, p < 0.001; music: t(52) = −25.73, p < 0.001; book: t(50) = −12.75, p < 0.001) (see Table 20). Fig. 10 illustrates this graphically.

Whenever the SPARQL request provided recommendations, which was the case in less than half of the user sessions (see Table 19), the SKOSRec suggestions were assessed to be of almost equal quality as the SPARQL recommendations (see Table 20, Usefulness). Hence, in the remaining sessions, in which SPARQL querying did not produce any results, the SKOSRec approach may still have provided useful suggestions.

Table 20. Performance results in TC4 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max. Score]
Domain   Regular (SPARQL) M   Regular (SPARQL) SD   Cross-Domain M   Cross-Domain SD   N

Accuracy (size) [10]
Movie    0.82     2.09     8.66     2.73     50
Music    0.89     1.59     9.30     2.03     53
Book     2.90     3.98     10.00    0.00     51

Accuracy (mrs) [1]
Movie    67.41    27.62    53.65    14.71    16
Music    62.36    31.76    48.32    22.49    20
Book     66.42    21.28    55.59    16.84    27

Novelty (nv) [1]
Movie    0.59     0.49     0.41     0.34     16
Music    0.72     0.44     0.59     0.40     19
Book     0.54     0.41     0.63     0.30     27

Diversity (divU) [5]
Movie    3        -        4        -        16
Music    3        -        3        -        20
Book     3        -        4        -        27

Diversity (divC) [1]
Movie    (0.64)   (0.35)   (0.99)   (0.02)   (6)
Music    (0.90)   (0.13)   (0.98)   (0.03)   (10)
Book     (0.84)   (0.12)   (0.87)   (0.1)    (8)

Usefulness [5]
Movie    3.57     0.83     3.35     1.57     15
Music    3.53     1.17     3.11     0.98     20
Book     3.41     1.04     3.41     0.94     27

Fig. 10. Mean recommendation list sizes for SPARQL and SKOSRec cross-domain requests (TC4). [Plot "Recommendation List Size"; x-axis: Domain (Movie, Music, Book); y-axis: No. of items; Method: Regular, Cross-Domain.]

Table 21. Jaccard Indices (TC4)

Domain   Mean Jaccard Index (JI)   SD     N
Movie    0.03                      0.07   44
Music    0.01                      0.02   21
Book     0.16                      0.27   33

Therefore, it is reasonable to assume that the capability of the SKOSRec engine to flexibly switch between processing steps of similarity calculation and pattern matching can facilitate an improved exploration of LOD repositories. Table 21 also shows that the items in the two sets often did not match, as is indicated by low Jaccard indices. This finding further demonstrates that the execution of an additional step of similarity calculation can help to retrieve other relevant items.

5. Future Work & Conclusion

This paper has presented the query-based SKOSRec engine, which facilitates new types of recommendation requests for LOD repositories. The evaluations have shown that the SKOSRec system can often generate relevant and useful suggestions when certain query templates are utilized. In many cases, the application of expressive or advanced query patterns (i.e., in the context of constraint-based retrieval or cross-domain requests) helped to significantly improve recall values while still providing the same level of quality in the remaining performance dimensions. Additionally, advanced recommendation requests can be used to generate diversified recommendation lists in case users are not satisfied with the results of regular recommendation requests (i.e., roll-up or cross-domain retrieval vs. regular requests). While the novel recommendation approaches of the SKOSRec engine are not superior in each domain and test case, it is at least safe to say that they considerably extend common retrieval methods and provide an alternative search strategy when conventional methods fail to provide useful results. Additionally, the increase in diversity and recall is in line with findings from previous research on LOD-enabled RS [45].

It is a strength of the developed approaches that they facilitate combinations of graph-based and similarity-based retrieval at different stages of the recommendation workflow. This feature extends general search capabilities for semantic networks. Future research needs to investigate whether, in addition to SKOS, further RDF vocabularies can be used for ad-hoc item-to-item similarity computation to increase the range of potential usage scenarios, since not every LOD dataset contains SKOS annotations [53]. The application of additional properties could also boost the recommendation quality of some query patterns (e.g., the query types used in TC3). In this context, it also needs to be determined whether the representation of the user profile in the SKOSRec query language can be substituted with free-text expressions to enable search-like functionalities. LOD researchers have already developed engines that can process natural language queries over RDF data [59], [34], [67]. It will have to be investigated how the SKOSRec engine profitably fits into this landscape of existing retrieval tools.

Another open research question concerns the aspect of interfaces to other applications. In its standard configuration, the system is not an end-user retrieval engine but a backend application with which an administrator who has domain knowledge (e.g., of RDF vocabularies and data models of a particular LOD repository and usage scenario) creates the respective query patterns. Although the websites of the online experiments represent possible implementations of suitable end-user applications for the SKOSRec engine, there are still many other possibilities and extensions to design such a user interface (UI). For instance, the selection of LOD items for the profile was made possible by previously storing the data in an index that could then be accessed and searched before issuing a recommendation query. However, this runs against the idea of ad-hoc retrieval. In this context, future research will have to determine whether it is feasible to generate an on-the-fly mapping from items to the respective LOD resources. In the case of a commercial application, the engine needs to match items with the corresponding entries in the product catalog. Even if it is not yet clear which UI components can make the best possible use of the SKOSRec engine's features, the results from the user experiments can give at least a few suggestions.

The following retrieval options should be the default setting in an interface that contains a LOD-enabled RS component to assist users in finding interesting items. Since the regular approach of on-the-fly retrieval achieved good results throughout the domains, the display of simple recommendations generated from previously stated user preferences represents a suitable starting point for retrieval. On-the-fly queries could be refined by providing filter options. Since the evaluations have shown that expressive constraints often lead to increased recall values, the application of such a filter might be the best option for assisted retrieval.

As in the preparations for the experiments, appropriate graph-based query patterns for the usage scenario in question would also have to be determined by a domain expert before setting up the application. The same applies to advanced retrieval patterns, which should be available to the end user to refine the query when both the simple and the constraint-based approach have failed to provide relevant items. This suggestion is made because roll-up queries were competitive with regular recommendations in the web-based experiments. Therefore, they represent a viable alternative retrieval strategy. For some domains (e.g., for multimedia retrieval), it is also possible to use an optional selection field to enable the formulation of cross-domain requests. Evaluations for this query type suggest that users might receive interesting results (i.e., diversified and comprehensive recommendation lists) from such a request.

In summary, the contributions of the paper are as follows:

– Development and implementation of novel retrieval strategies in the SKOSRec system, which facilitate personalized requests and leverage the potential of knowledge graphs in order to address the research gap in state-of-the-art recommendation methods with regard to customized queries.

– Realization of web-based user experiments in four usage scenarios with the following outcomes:

  • Successful demonstration that knowledge graph data is applicable in the context of live recommendation scenarios.

  • Creation of suitable query patterns that can be applied to represent individual information needs in different application contexts. The set of query patterns can be extended with the help of the SKOSRec query language to meet the requirements of additional usage scenarios.

  • Evidence that the novel retrieval strategies can be successfully applied across various application contexts and recommendation tasks and that they outperform conventional content-based retrieval approaches, especially in the performance dimensions of diversity and recall.

The prototypical implementation and the evaluation of the SKOSRec engine in different usage scenarios have demonstrated that the freely available knowledge sources on the LOD cloud can be successfully applied to advance state-of-the-art methods in IR and recommender systems. The proposed personalized and customized retrieval strategies can leverage the potential of knowledge graphs by better matching information needs of users with the help of semantic search options.

Appendix A. The SKOSRecommender Query Syntax

SKOSRecQuery: A SKOSRec query can comprise up to six elements, of which the parts SimProjection and ItemPart are obligatory. Hence, users need to state at least their preferences and how many recommendations they would like to receive. Advanced retrieval patterns can be formulated as well. For instance, similar resource retrieval can be combined with prefiltering (RecWhereClause) and/or postfiltering (SelectPart) of LOD resources. As in SPARQL, the query can start with a prologue that abbreviates IRIs with namespace declarations [21].

SelectPart: The SelectPart enables subquerying with recommendation results. The surrounding query is a SELECT query with a slightly simplified WHERE clause (i.e., RecWhereClause) with which postfilter conditions can be formulated. In this section, users have the option to formulate aggregation-based retrieval requests (Aggregation).

Aggregation: This query part handles aggregation-based postfiltering. Users specify the IRI (IRIref) of the resource for which similarity scores of sublevel entities are summarized. Additionally, it is stated how the data should be aggregated (i.e., SUM, MAX or AVG) as well as which variable (Var) represents the aggregation-based recommendations in the postfilter section. The compiler makes sure that this variable is actually contained in the RecWhereClause of the SelectPart.

SimProjection: This part of a SKOSRec query refers to the list of generated suggestions. It defines the recommendation variable to which all similar LOD resources are mapped. In case pre- and postfilter conditions are set in the other sections, the compiler checks whether the variable (Var) appears in these sections as well. Otherwise, the required join operations cannot be executed. It is specified how many recommendations should be displayed (Integer) and whether the request will be issued against a default repository or as a cross-repository request (ServiceIntegration).

ServiceIntegration: This section indicates that a user intends to receive recommendations from a different SPARQL endpoint than the default endpoint that is stated in the standard configuration of the SKOSRec engine. It specifies the target SPARQL endpoint (IRIref) from which a user wants to receive recommendations. Upon extraction of SKOS annotations from the default source endpoint, a subsequent request is sent to the specified target endpoint.

ItemPart: This section of a SKOSRec query represents a single preference in the user profile. However, recommendation queries can contain several preference statements. A preference is either expressed as a LOD resource (IRIref) or as a variable which is bound to a preference query (VarPart). In the latter case, the resources are obtained by matching the graph pattern in the VarPart with the triple statements in the repository. The ItemPart statement can be additionally supplemented with a concept-to-concept similarity threshold (Sim) that triggers concept expansion with the approach of flexible similarity detection, which is extensively described in one of our previous works [64].

VarPart: This construct initiates preference querying. Individual likings are expressed as a graph pattern in a RecWhereClause. The variable (Var) is a placeholder for the LOD resources that will be used to set up the user profile. The compiler has to make sure that the specified variable occurs in the RecWhereClause as well. Otherwise, the extraction of LOD resources cannot be carried out correctly.

Sim: The Sim part of a SKOSRec query denotes the threshold of concept-to-concept scores for flexible similarity detection. Even though knowledge-based similarity values usually range between 0 and 1, the specification allows a broader range of digits in the Decimal datatype to enable application of other similarity metrics in case they are needed.

Relation: This section specifies how concept-to-concept similarity scores should be determined. Users can state whether scores should be “larger”, “larger than” or “equal to” the threshold value.

RecWhereClause: This clause can occur in three different sections of a SKOSRec query, i.e., either in the prefilter (ItemPart), the postfilter (SelectPart) or the preference filter section (VarPart). The RecWhereClause closely resembles the “WhereClause” of the SPARQL 1.1 specification with all its subsequent parts [21]. Hence, it enables different combinations of graph pattern matching. However, the RecWhereClause of the SKOSRec language allows fewer pattern matching expressions than the “WhereClause” of the regular SPARQL syntax specification [21]. For instance, it is neither permitted to formulate subqueries nor to direct filter requests to more than one repository (see RecGraphPatternNotTriples for further explanations). This modification has been made to simplify similarity calculation. After parsing this section, the compiler makes sure that recommendation, preference or postfilter variables occur at least once in the RecWhereClause to guarantee that subsequent steps of the recommendation workflow can be applied on existing LOD resources. In case the RecWhereClause comprises a RecMinusGraphPattern, it also has to be checked that the variable is not only contained in the MINUS part of the group.

RecGroupGraphPattern: A RecGroupGraphPattern can contain one to many basic graph patterns, which can be combined in different ways according to RecGroupGraphPatternSub.

RecGroupGraphPatternSub: This section defines the graph patterns that are applied to identify suitable LOD resources. User filters can be expressed either as basic graph patterns (TriplesBlock) or as combinations of them (RecGraphPatternNotTriples). The sequence of different kinds of patterns can be flexibly specified.

RecGraphPatternNotTriples: This query part closely resembles the similarly named “GraphPatternNotTriples” of the SPARQL 1.1 specification [21]. The only difference to SPARQL is that the RecGraphPatternNotTriples section is slightly more restricted in terms of admissible graph patterns than the “GraphPatternNotTriples” section of the regular SPARQL specification. For instance, it does not allow retrieving LOD resources from graphs other than the default graph that is specified in the configuration of the SKOSRec engine, since filter patterns have to be applied to datasets that contain SKOS annotations. Hence, these datasets need to be specified before runtime execution to ensure that similarity calculation can be carried out correctly. This would not be possible if a user were able to formulate filter conditions that referred to different RDF graphs and endpoints.

RecGroupOrUnionGraphPattern: This query section marks an alternative graph pattern in the style of the “GroupOrUnionGraphPattern” of the SPARQL syntax specification [21]. However, it allows neither SPARQL-like subqueries nor federated queries for filtered resources, because this would complicate similarity calculation.

RecOptionalGraphPattern: In this part of a SKOSRecQuery, users can specify additional graph patterns which might extend the query solution but do not necessarily have to match the data.

RecMinusGraphPattern: This section handles exclusion of LOD resources for cases when users want to omit certain triple statements from the solution.
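Two of the rules described in this appendix can be illustrated with a short sketch: the aggregation of similarity scores of sublevel entities (SUM, MAX or AVG) and the compiler check that a variable does not occur only in the MINUS part of a RecWhereClause. The code below is a simplified illustration under assumptions, not the implementation of the SKOSRec compiler: the function names, the representation of triple patterns as Python tuples and the example identifiers are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Aggregation functions named in the appendix (SUM, MAX or AVG).
AGGREGATORS = {"SUM": sum, "MAX": max, "AVG": mean}


def roll_up(scores, parent_of, how="SUM"):
    """Summarize similarity scores of sublevel entities per parent resource."""
    grouped = defaultdict(list)
    for entity, score in scores.items():
        parent = parent_of.get(entity)
        if parent is not None:  # entities without a known parent are skipped
            grouped[parent].append(score)
    aggregate = AGGREGATORS[how]
    totals = {parent: aggregate(values) for parent, values in grouped.items()}
    # Parents with the highest aggregated score come first.
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)


def check_variable(var, patterns, minus_patterns=()):
    """Classify where a variable occurs in a (simplified) where clause.

    Both pattern arguments are sequences of (subject, predicate, object) tuples.
    """
    def mentions(triples):
        return any(var in triple for triple in triples)

    if mentions(patterns):
        return "ok"
    if mentions(minus_patterns):
        return "only in MINUS"
    return "missing"


# Hypothetical similarity scores of movies, rolled up to their directors.
scores = {"movie_a": 0.5, "movie_b": 0.25, "movie_c": 0.625}
director_of = {"movie_a": "director_x", "movie_b": "director_x", "movie_c": "director_y"}
print(roll_up(scores, director_of, how="SUM"))
print(roll_up(scores, director_of, how="MAX"))

# Hypothetical graph patterns of a postfilter section.
patterns = [("?rec", "dct:subject", "?concept")]
minus = [("?other", "rdf:type", "dbo:Film")]
print(check_variable("?rec", patterns, minus))
print(check_variable("?other", patterns, minus))
```

The example shows why the choice of aggregation function matters: with SUM, the director with two moderately similar movies ranks first, whereas MAX favors the director with the single most similar movie.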

References

[1] G. Adomavicius, A. Tuzhilin, Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions, In: IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.
[2] G. Adomavicius, A. Tuzhilin, R. Zheng, REQUEST: A query language for customizing recommendations, In: Information Systems Research, vol. 22, no. 1, pp. 99–117, 2011.
[3] C. Aggarwal, Recommender systems, Springer, 2016.
[4] M. Arenas, B. Cuenca Grau, E. Kharlamov, S. Marciuska, D. Zheleznyakov, Faceted search over ontology-enhanced RDF data, In: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 939–948, 2014.
[5] V. Ayala, M. Przyjaciel-Zablocki, T. Hornung, A. Schätzle, G. Lausen, Extending SPARQL for recommendations, In: Semantic Web Information Management, 2004.
[6] J. Beel, C. Breitinger, S. Langer, A. Lommatzsch, B. Gipp, Towards reproducibility in recommender systems research, In: User Modeling and User-adapted Interaction, vol. 26, no. 1, pp. 69–101, 2016.
[7] C. Boutilier, R. Brafman, C. Domshlak, H. Hoos, D. Poole, CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements, In: Journal of Artificial Intelligence Research (JAIR), vol. 21, pp. 135–191, 2004.
[8] P. Castells, S. Vargas, J. Wang, Novelty and diversity metrics for recommender systems: choice, discovery and relevance, 2011.
[9] G. Cheng, W. Ge, Y. Qu, Falcons: searching and browsing entities on the Semantic Web, In: Proceedings of the 17th ACM International Conference on World Wide Web, pp. 1101–1102, 2008.
[10] J. Davies, R. Weeks, QuizRDF: Search technology for the Semantic Web, In: Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004.
[11] DBpedia, URL: https://wiki.dbpedia.org, 2017.
[12] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, Exploiting the web of data in model-based recommender systems, In: Proceedings of the 6th ACM Conference on Recommender Systems (RecSys), pp. 253–256, 2012.
[13] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, M. Zanker, Linked Open Data to support content-based recommender systems, In: Proceedings of the 8th International Conference on Semantic Systems, 2012.
[14] D. Dillman, D. Robert, D. Bowker, Principles for constructing web surveys, In: Joint Meetings of the American Statistical Association, 1998.
[15] DCMI metadata terms, URL: http://dublincore.org/documents/dcmi-terms, 2004.
[16] I. Fernandez-Tobias, I. Cantador, M. Kaminskas, F. Ricci, A generic semantic-based framework for cross-domain recommendation, In: Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pp. 25–32, 2011.

[17] A. Freitas, J. Oliveira, S. O'Riain, E. Curry, J. Da Silva, C. Pereira, Querying Linked Data using semantic relatedness: a vocabulary independent approach, In: Natural Language Processing and Information Systems, pp. 40–51, 2011.
[18] T. Grainger, T. Potter, Y. Seeley, Solr in Action, Cherry Hill: Manning, 2014.
[19] A. Gunawardana, G. Shani, Evaluating recommender systems, In: Recommender Systems Handbook, Springer, pp. 265–308, 2015.
[20] R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Bürgle, H. Düwiger, U. Scheel, Faceted Wikipedia search, In: International Conference on Business Information Systems, pp. 1–11, 2010.
[21] S. Harris, A. Seaborne, E. Prud'hommeaux, SPARQL 1.1 query language, In: W3C recommendation, 2013.
[22] B. Heitmann, C. Hayes, Using Linked Data to build open, collaborative recommender systems, In: AAAI Spring Symposium: Linked Data Meets Artificial Intelligence, pp. 76–81, 2010.
[23] J. Herlocker, J. Konstan, L. Terveen, J. Riedl, Evaluating collaborative filtering recommender systems, In: ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.
[24] Y. Hu, Z. Wang, W. Wu, J. Guo, M. Zhang, Recommendation for movies and stars using YAGO and IMDB, In: 12th International Asia-Pacific Web Conference (APWEB), pp. 123–129, 2010.
[25] D. Huynh, D. Karger, Parallax and companion: Set-based browsing for the data web, In: Proceedings of the 18th ACM International Conference on World Wide Web, 2009.
[26] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich, Recommender systems: an introduction, Cambridge University Press, 2010.
[27] Y. Kabutoya, R. Sumi, T. Iwata, T. Uchiyama, T. Uchiyama, In: IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 625–630, 2012.
[28] M. Kaminskas, I. Fernandez-Tobias, F. Ricci, I. Cantador, Knowledge-based music retrieval for places of interest, In: Proceedings of the 2nd International ACM Workshop on Music Information Retrieval with User-centered and Multimodal Strategies, pp. 19–24, 2012.
[29] H. Khrouf, R. Troncy, Hybrid event recommendation using Linked Data and user diversity, In: Proceedings of the 7th ACM Conference on Recommender Systems, pp. 185–192, 2010.
[30] P. Kirk, Experimental design, John Wiley & Sons, 1982.
[31] B. Knijnenburg, M. Willemsen, Z. Gantner, H. Soncu, C. Newell, Explaining the user experience of recommender systems, In: User Modeling and User-Adapted Interaction, vol. 22, no. 4-5, pp. 441–504, 2012.
[32] G. Koutrika, B. Bercovitz, H. Garcia-Molina, FlexRecs: expressing and combining flexible recommendations, In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 2009.


[33] M. Lee, W. Kim, Semantic association search and rank method based on spreading activation for the Semantic Web, In: IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1523–1527, 2009.
[34] J. Lehmann, L. Bühmann, Autosparql: Let users query your knowledge base, In: 8th Extended Semantic Web Conference (ESWC), 2011.
[35] N. Marie, F. Gandon, M. Ribiere, F. Rodio, Discovery hub: on-the-fly Linked Data exploratory search, In: Proceedings of the 9th International Conference on Semantic Systems, pp. 17–24, 2013.
[36] S. McNee, J. Riedl, J. Konstan, Being accurate is not enough: how accuracy metrics have hurt recommender systems, In: CHI'06 Extended Abstracts on Human Factors in Computing Systems, 2006.
[37] R. Meymandpour, J. Davis, Recommendations using Linked Data, In: Proceedings of the 5th PhD Workshop on Information and Knowledge, 2012.
[38] A. Miles, S. Bechhofer, SKOS Simple Knowledge Organization System Reference, In: W3C Recommendation, 2009.
[39] B. Mobasher, X. Jin, Y. Zhou, Semantically enhanced collaborative filtering on the web, In: Web Mining: From Web to Semantic Web, pp. 57–76, 2004.
[40] A. Musetti, A. Nuzzolese, F. Draicchio, V. Presutti, E. Blomqvist, A. Gangemi, P. Ciancarini, Aemoo: Exploratory search based on knowledge patterns over the Semantic Web, In: Semantic Web Challenge, 2012.
[41] T. Neumann, G. Weikum, The RDF-3X engine for scalable management of RDF data, In: The VLDB Journal, vol. 19, no. 1, pp. 91–113, 2010.
[42] J. Pérez, M. Arenas, C. Gutierrez, Semantics and Complexity of SPARQL, In: 5th International Semantic Web Conference, pp. 30–43, 2006.
[43] S. Policarpio, S. Brunk, G. Tummarello, Implementation of a SPARQL integrated recommendation engine for Linked Data with hybrid capabilities, In: AImWD, 2012.
[44] P. Pu, L. Chen, R. Hu, A user-centric evaluation framework for recommender systems, In: Proceedings of the 5th ACM Conference on Recommender Systems, 2011.
[45] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, E. Di Sciascio, Sound and music recommendation with knowledge graphs, In: ACM Transactions on Intelligent Systems and Technology (TIST), vol. 8, no. 2, article 21, 2017.
[46] E. Oren, R. Delbru, S. Decker, Extending faceted navigation for RDF data, In: 5th International Semantic Web Conference, pp. 559–572, 2006.
[47] G. Pirro, Explaining and suggesting relatedness in knowledge graphs, In: International Semantic Web Conference, pp. 622–639, 2015.
[48] G. Rizzo, R. Troncy, NERD: A framework for evaluating named entity recognition tools in the Web of data, In: 10th International Semantic Web Conference (ISWC'11), Demo Session, Bonn, Germany, 2011.
[49] J. Rosati, T. Di Noia, T. Lukasiewicz, R. De Leone, A. Maurino, Preference queries with Ceteris Paribus semantics for Linked Data, In: OTM Confederated International Conferences, pp. 423–442, 2015.
[50] T. Ruotsalo, K. Haav, A. Stoyanov, S. Roche, E. Fani, R. Deliai, E. Mäkelä, T. Kauppinen, E. Hyvönen, SMARTMUSEUM: A mobile recommender system for the Web of Data, In: Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 50–67, 2013.

[51] I. Ruthven, M. Lalmas, K. van Rijsbergen, Incorporating user search behavior into relevance feedback, In: Journal of the Association for Information Science and Technology, vol. 54, no. 6, pp. 529–549, 2003.
[52] B. Sarwar, G. Karypis, J. Konstan, J. Riedl, Item-based collaborative filtering recommendation algorithms, In: Proceedings of the 10th International Conference on World Wide Web, 2001.
[53] M. Schmachtenberg, C. Bizer, H. Paulheim, Adoption of the Linked Data best practices in different topical domains, In: International Semantic Web Conference (ISWC), pp. 245–260, 2014.
[54] H. Schütze, C. Manning, P. Raghavan, Introduction to information retrieval, Cambridge University Press, 2008.
[55] SparqlImplementations, URL: https://www.w3.org/wiki/SparqlImplementations, 2016.
[56] M. Stankovic, W. Breitfuss, P. Laublet, Discovering relevant topics using DBPedia, In: Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 219–222, 2011.
[57] K. Swearingen, R. Sinha, Beyond algorithms: An HCI perspective on recommender systems, In: ACM SIGIR 2001 Workshop on Recommender Systems, vol. 13, no. 5-6, 2001.
[58] D. Tunkelang, Faceted search, In: Synthesis Lectures on Information Concepts, Retrieval, and Services, pp. 1–80, 2009.
[59] C. Unger, L. Bühmann, J. Lehmann, A. Ngonga Ngomo, D. Gerber, P. Cimiano, Template-based question answering over RDF data, In: Proceedings of the 21st ACM International Conference on World Wide Web, 2012.
[60] C. Varnhagen, M. Gushta, J. Daniels, T. Peters, N. Parmar, D. Law, T. Johnson, How informed is online informed consent?, In: Ethics & Behavior, vol. 15, no. 1, pp. 37–48, 2005.
[61] J. Waitelonis, H. Sack, Towards exploratory video search using Linked Data, In: 11th IEEE International Symposium on Multimedia, pp. 540–545, 2009.
[62] L. Wenige, J. Ruhland, Flexible on-the-fly recommendations from Linked Open Data repositories, In: International Conference on Business Information Systems, pp. 43–54, 2016.
[63] L. Wenige, Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations.
[64] L. Wenige, J. Ruhland, Retrieval by recommendation: using LOD technologies to improve digital library search, In: International Journal on Digital Libraries, 2017.
[65] F. Zarrinkalam, M. Kahani, A multi-criteria hybrid citation recommendation system based on Linked Data, In: 2nd International eConference on Computer and Knowledge Engineering (ICCKE), pp. 283–288, 2012.
[66] C. Ziegler, S. McNee, J. Konstan, G. Lausen, Improving recommendation lists through topic diversification, In: Proceedings of the 14th International Conference on World Wide Web, pp. 22–32, 2005.
[67] L. Zou, R. Huang, H. Wang, J. Yu, W. He, D. Zhao, Natural language question answering over RDF: a graph data driven approach, In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 313–324, 2014.


16 L Wenige et al Similarity-based Graph Queries

$$div_C = \frac{2}{k(k-1)} \sum_{l<m} d(i_l, i_m) \qquad (22)$$
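As a minimal sketch of Eq. (22), the following Python function averages the pairwise distances within a recommendation list of size k. The distance function d is defined elsewhere in the paper; purely as an illustrative assumption, d is taken here to be the Jaccard distance between hypothetical sets of SKOS annotations. The names `div_c` and `jaccard_distance` and the sample data are not part of the SKOSRec engine.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Assumed stand-in for d(i_l, i_m): 1 - |a & b| / |a | b|."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def div_c(items, d=jaccard_distance):
    """Average pairwise distance: 2 / (k (k - 1)) * sum over l < m of d(i_l, i_m)."""
    k = len(items)
    if k < 2:
        # Eq. (22) is undefined for fewer than two items; 0.0 is a convention chosen here.
        return 0.0
    total = sum(d(a, b) for a, b in combinations(items, 2))
    return 2.0 * total / (k * (k - 1))


# Hypothetical annotation sets for a list of three recommendations
recs = [{"jazz", "blues"}, {"jazz", "soul"}, {"metal"}]
print(div_c(recs))
```

Because `combinations` enumerates each unordered pair exactly once, the sum runs over l < m as in the equation, and the factor 2 / (k (k - 1)) turns it into a mean over all pairs.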

The content-based diversity scores (divC) were calculated in the backend of the web application and saved in a log file for each recommendation list that the SKOSRec engine generated at runtime. In addition to measuring divC scores, we asked participants directly for their opinions on recommendation list diversity. Following the RS evaluation framework by Pu et al. [44], two Likert items were posed for this purpose:

– The recommendations of list [] are diverse.
– The recommendations of list [] are similar to each other.

Besides the algorithmic performance metrics listed above, the authors gathered user assessments of the overall usefulness of the suggestions. This approach is also in line with the empirically verified evaluation framework by Pu et al. [44], which proposes measuring the psychometric construct of Perceived Usefulness to gain a comprehensive understanding of user opinions on recommendation results. Thus, participants were instructed to state their agreement with positive statements regarding the relevance of the suggestions. The statements were adapted to the specificities of each usage scenario. Fig. 4 shows an example collection of Likert questions for the domain of music RS.

After participants had provided the required information, their answers were saved in a log file and they were navigated to test case 2 (TC2), which evaluated the SKOSRec engine's performance for constraint-based queries (similar to Q3, Sect. 3). We tried to find out whether the engine was able to improve user satisfaction through filter conditions. Even though IR systems already successfully apply facet filters to result sets, the experiments still had to show whether LOD-enabled constraints have the same effect on the Perceived Usefulness of recommendations. For the comparison of non-filtered and filtered suggestions, we applied a within-subjects design in which participants rated two recommendation lists resulting from regular (non-filtered) and filtered requests in TC2. We favored this setup over a between-subjects design, where subjects would have been assigned to only one result list. While the between-subjects design more closely resembles a real-world setting, since it does not require participants to interact with more than one approach at a time, differences among subjects (e.g., regarding demographics or topic expertise) can have a considerable impact on quality judgments [19]. Hence, when applying a between-subjects design, one does not know whether differences in performance are caused by the approaches or by the variability among participants. Researchers therefore need to conduct between-subjects experiments with a sufficient number of study subjects to level out these differences. Within-subjects studies, on the other hand, require fewer participants while still achieving the same level of statistical power, since between-subject variability is not an issue [30, 31]. Beel et al. have shown that even small variations in an RS test setting can cause substantial differences in performance outcomes [6]. To keep the extent of variation even lower, comparisons between non-filtered and constraint-based recommendations were based on the same profile for each user in TC2. The user profile from TC1 therefore served as the starting point for filtered retrieval. The web application showed users their recently generated profiles and asked them to provide an additional filter condition. Fig. 5 depicts an example web form for a constraint-based recommendation request in the music domain. It comprises the user profile and an additional text field in which users stated their constraint.

Afterwards, the engine applied this filter condition to the set of potential recommendation results before similarity calculation started. Different types of constraints were available for each usage scenario. Regarding suitable filter options, the authors sought to strike a balance between assisting users in finding filter conditions that would adequately represent their information needs and keeping the variability among users as low as possible. Therefore, the specificities of each scenario and the availability of the respective data sources were determined. For instance, for the multimedia domains (movies, music and books), it was assumed that users would like to filter recommendations according to the genre of the item. Fortunately, the DBpedia dataset contains this kind of information for the music domain, where the genre can be retrieved through the property dbo:genre. In the movie and book domains, however, DBpedia does not provide sufficient information. Many LOD resources that represent books were not assigned a genre. In the movie domain, a genre property is missing entirely, and genre-specific information can often only be found in the subject categories describing the movie. Hence, in contrast to the experiment on music recommendations, participants in the domains of books and films did not filter their recommendation lists by genre. On the other hand, the property dc:subject was applied as a recommendation filter in each multimedia domain. Since the subject property often links to many informative features of multimedia items (e.g., release period, geographic information or content information), the authors decided that this feature represents a powerful filter dimension.
In the experiment on travel destination search, the web interface offered two filters: a subject filter and a location-specific filter. The subject filter enabled users to state their preferences regarding the characteristics of a location (e.g., being a nature reserve). The location-specific filter, on the other hand, gave participants the option to specify where the desired destination should be located.¹¹

Fig. 4. Evaluation screen for recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain).

The second fundamental research issue of TC2 concerned the question of whether expressive constraints can boost recommendation quality even further. Hence, apart from executing a constraint-based workflow with a simple filter (i.e., a direct attribute of an item), the SKOSRec engine also generated suggestions with the help of an expressive filter. This second procedure, while still applying the same user constraint as the simple filter, utilized an expressive graph pattern to include additional LOD resources in the result set. For instance, when a subject filter was selected, the result set was expanded by exploring the wider semantic space of the DBpedia category graph through a property path declaration in the graph pattern (e.g., skos:broader[2]). In the travel experiment, an additional expressive filter retrieved place-related information (dul:hasLocation) for location-specific subproperty relations that are not materialized in DBpedia. Hence, the expressive graph pattern expanded the filter such that additional properties were matched (e.g., dbo:country, dbo:city or dbo:state) while still applying the specified user constraint to the set of LOD resources. Upon execution of constraint-based recommendation requests, users received two recommendation lists (resulting from simple and expressive filtering, respectively) in a separate evaluation screen. Participants evaluated the result sets in the same manner as in TC1. For each result set, the screen contained an additional Likert item asking subjects to state their agreement with the statement "The filter has improved the recommendation results of list". The sequence of suggestions resulting from the different methods was randomly assigned in each user session to avoid order effects.
After study subjects had completed TC2, they were navigated to test case 3 (TC3). In the TC3 section of the travel experiment, participants could choose between three options for rollup query patterns (similar to Q6, Sect. 3). Users were able to obtain recommendations for travel destinations based on POIs they had visited during a stay in another destination. They could select either a city (option 1), a region (option 2) or a country (option 3) as destination type. Upon entity type selection, the SKOSRec engine generated appropriate suggestions. Fig. 6 shows the web form from the travel experiment that facilitated the formulation of advanced rollup requests in TC3.

¹¹ http://dbpedia.org

Fig. 5. Web form for a constraint-based recommendation request (music domain).

When users had selected an entity type and stated their favorite travel destination and three POIs belonging to it, the application generated a SKOSRec query from these parameters. A simple on-the-fly request was sent to the engine as well, to retrieve baseline recommendations with which the suggestions resulting from the advanced request were compared at a later stage of the experiment. The simple query only contained the user's favorite travel destination and the selected entity type option, thus omitting information on the POIs. During the travel experiment, the engine processed both the simple and the advanced query and produced two recommendation lists. As in the previous parts of the experiment, participants assessed the quality of the result sets, which appeared in random order on the screen.
The multimedia experiments also contained a section on rollup retrieval (TC3). Like the advanced travel queries, the multimedia requests aggregated similarity scores of sublevel entities through summation, which the engine joined with a postfilter. In addition to these features, the pattern contained a preference query part. This part identified a set of preferred items through a single user statement. Participants specified their favorite director/actor (movie domain), music act (music domain) or author (book domain), and the SKOSRec engine generated recommendations based on the features of the creative works of the favored artist (similar to Q5, Sect. 3). Participants also received suggestions from a simple on-the-fly query, which computed results according to the artist's characteristics. As in the travel experiment, recommendations from the advanced rollup request were compared with those of the on-the-fly query; participants were not told which approach they were assessing, and result set positions were assigned randomly.
After participants of the multimedia experiments had evaluated the results of TC3, they were guided to test case 4 (TC4), which evaluated cross-domain queries. The experiment on travel RS ended with TC3 because no suitable graph patterns could be identified to facilitate these kinds of suggestions for the travel usage scenario. In the multimedia domains, however, the data from DBpedia was sufficient for this retrieval task. The entity type of the study (i.e., movie, music act or book) defined the source domain, based on which the engine generated suggestions for items from another multimedia target domain (similar to Q7, Sect. 3). Fig. 7 depicts the cross-domain web form of the music experiment.

Fig. 6. Travel - User profile generation, TC3.

The web form assisted subjects in formulating a request, which then obtained movie suggestions based on the user's favorite music act. User parameters were entered into the placeholders of the query pattern in the backend of the application. We formulated the query templates in such a way that the query engine matched any properties or graph patterns that could potentially connect two items from the specified target and source domains. For instance, such matches can occur when a user states a favorite movie that was written by a particular author. If the same author has also written books, they might be of interest to the user as well. Another example stems from the movie domain. Suppose a consumer has declared in his/her profile a preference for a movie that links to the corresponding soundtrack. It might well be the case that the user likes the soundtrack and the music acts involved in creating it just as much as s/he likes the movie. The practice of distinguishing homonymous LOD resources through the property dbo:wikiPageDisambiguates is another interesting linking pattern from DBpedia that was exploited for cross-domain requests in TC4, since it connects similar entities (i.e., the book edition of a certain movie).
Cross-domain SPARQL queries (similar to Q8, Sect. 3) were posed as baseline requests in this test case. While the SPARQL query only matched cross-domain relations for a single LOD resource, the SKOSRec request expanded the search space by considering links from more than one item. As in the previous test cases, TC4 ended with an evaluation screen where users assessed the SPARQL-based as well as the SKOSRec suggestions in random order, without knowing which recommendation list belonged to which method. Table 10 gives an overview of the test cases and the methods that were applied in the experiments in each usage scenario.

4.2. Results

In total, 257 people took part in the study. The gathered samples from the experiments contained answers from both genders, with a slight overrepresentation of male participants (53.7%). Most study subjects were between 20 and 40 years of age (65.8%). The evaluation of the demographics section also revealed that the majority of participants already perceived the search in their domain of interest as either "manageable" (38.5%) or even as "easy" or "very easy" (44.0%).
Prior to analyzing the performance results of the novel retrieval approaches of the SKOSRec engine, we checked whether the respective Likert items for Perceived Usefulness and user-based diversity (divU) reached acceptable reliability levels. For this purpose, Cronbach's α values were determined. Table 11 shows the outcome of this analysis for the Perceived Usefulness construct. Given the acceptable values throughout the domains (Cronbach's α > 0.7), it could be treated as an interval-scaled variable in the statistical tests.

Fig. 7. Music - User profile generation, TC4 (page 9).

Table 10. Test cases in the web experiments (X = conducted, - = not conducted)

Test Case  Evaluation Purpose                        Query Type  Travel  Movie  Music  Book
TC1        On-the-fly Recommendations (Baseline)     Q1          X       X      X      X
TC2        Constraint-based Recommendations          Q3          X       X      X      X
TC3        Advanced Queries (Rollup)                 Q5, Q6      X       X      X      X
TC4        Advanced Queries (Cross-Domain)           Q7          -       X      X      X

Table 11. Cronbach's α values for the concept of Perceived Usefulness

Domain  Cronbach's α
Travel  0.7002
Movie   0.8579
Music   0.8515
Book    0.8949
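For readers who want to reproduce a reliability check of this kind, the sketch below computes Cronbach's α with the standard textbook formula α = k/(k-1) · (1 - Σ item variances / variance of the summed scores). It is an illustration only: the function name `cronbach_alpha` and the sample ratings are hypothetical, and the paper does not state which software the authors used.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: one list of scores per Likert item, aligned by participant."""
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    # Sum score of each participant across all items
    totals = [sum(scores) for scores in zip(*items)]
    # Sum of the sample variances of the individual items
    item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


# Hypothetical 5-point ratings of three usefulness items by five participants
ratings = [
    [4, 3, 5, 2, 4],
    [5, 3, 4, 2, 4],
    [4, 2, 5, 3, 3],
]
print(round(cronbach_alpha(ratings), 4))
```

Values above roughly 0.7, the threshold the text applies, are conventionally read as acceptable internal consistency.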

Besides usefulness, user-based diversity scores were also measured by multiple Likert items. However, Cronbach's α values for this concept did not reach acceptable levels. So that valuable user assessments did not remain unused, agreement with the statement "The recommendations of list [] are diverse" was taken into account as an ordinal variable (divU).
Based on these results, the subsequent tests could be carried out. The first part of the analysis concerned the performance of the on-the-fly retrieval method (baseline) of the SKOSRec engine in TC1. The outcome of this analysis confirms that the on-the-fly approach represents a viable recommendation strategy. Table 12 shows the mean (M)¹², standard deviation (SD) and number of participants (N) for each performance dimension and metric. Users assessed the engine's performance as above average in almost every quality dimension. Aside from the positive user evaluations with regard to Perceived Usefulness, related quality aspects such as Accuracy, Novelty and Diversity received good assessments as well. For instance, the mean relevance score (mrs) was higher than 50 score points in each domain, which means that participants were often convinced of the relevance of the results. Given these positive user assessments with regard to item relevance, it seems justified to consider the mean of result list sizes as an indicator of recall. The engine was able to generate the required number of recommendations in almost every test case (see Table 12). The on-the-fly retrieval approach was also tested with regard to the Novelty dimension. Table 12 shows the mean novelty scores (nv) for the baseline recommendation approach. The novelty scores indicate that the majority of items in a result list were deemed new in the travel and book RS, whereas in the experiments on movie and music RS only about one third of the recommendation list was unfamiliar to users. In contrast, content-based Diversity scores (divC) were rather high in each domain. Hence, it can be concluded that items in a result set usually have a low rate of matching SKOS annotations. In addition to content-based scores, user assessments of recommendation list Diversity (divU) were also taken into account. The median level of agreement with the Diversity-related Likert item stating that recommendations are diverse lay at 3 ("neutral") in the multimedia domains and at 4 ("agree") in the travel domain.
In summary, the baseline approach achieved fairly good performance results in TC1. The findings demonstrate that LOD-enabled recommendations can potentially enhance a user's search experience. They also allow for subsequent tests of advanced retrieval techniques, which can be compared to this standard method.

¹² In the case of the user-based diversity score (divU), M denotes the median because of the ordinal scale of the variable.

In the analysis of performance results from TC2 (constraint-based recommendations), particular attention was paid to participants' agreement with the statement "The filter has improved the recommendation results of the list". Fig. 8 shows the distribution of domain-wise agreement ratios. On average, they were fairly high. Throughout the domains, the vast majority of participants selected either 5 ("strongly agree"), 4 ("agree") or 3 ("neutral"). Hence, a filter often had a positive impact on results.

The authors gathered responses to the question on improvement for recommendation lists resulting from both the simple and the expressive filtering approaches. Hence, they give clues about the general performance of constraint-based retrieval. However, since this section of the study revealed the experimental condition to participants (i.e., the application of a filter), answers to the Likert statement have to be interpreted cautiously. Results may be slightly biased, as participants were able to guess the underlying agenda. These limitations do not exist, however, for comparisons between simple and expressive filters. Since the web application randomly assigned the list order, participants did not know which approach they were assessing. This enables us to take a closer look at the differences between the two filtering approaches. It was hypothesized that advanced filters improve recommendation quality, as they may identify more relevant resources, thereby overcoming problems of data quality or data sparsity in LOD repositories. Table 13 lists the response rates of the two methods. The rate measures the ratio of non-empty result sets among all result sets. Throughout the domains, recommendation lists resulting from expressive filtering achieved higher response rates.

Table 12. Performance results in TC1 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]  Domain  M      SD     N
Accuracy (size) [10]            Travel  10.00  0.00   103
                                Movie   10.00  0.00   50
                                Music   10.00  0.00   53
                                Book    9.80   1.40   51
Accuracy (mrs) [1]              Travel  56.49  0.00   103
                                Movie   62.25  19.15  50
                                Music   55.70  0.10   53
                                Book    57.47  15.55  50
Novelty (nv) [1]                Travel  0.60   0.34   101
                                Movie   0.36   0.31   50
                                Music   0.36   0.39   52
                                Book    0.61   0.29   47
Diversity (divU) [5]            Travel  4      -      102
                                Movie   3      -      50
                                Music   3      -      53
                                Book    3      -      50
Diversity (divC) [1]            Travel  0.80   0.16   103
                                Movie   0.87   0.12   50
                                Music   0.86   0.10   50
                                Book    0.92   0.08   50
Usefulness [5]                  Travel  3.34   0.85   103
                                Movie   3.55   0.73   50
                                Music   3.18   0.85   53
                                Book    3.43   0.81   50

Table 13. Response rates in % (TC2)

Domain  Constr.-based (Regular)  Constr.-based (Expanded)  N
Travel  23                       57                        103
Movie   86                       88                        50
Music   72                       75                        53
Book    73                       76                        51
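The response rate is described in the text as the ratio of non-empty result sets among all result sets. A minimal sketch of that computation follows; the function name `response_rate` and the sample sessions are hypothetical, and returning 0.0 for an empty log is a convention chosen here, not something the paper specifies.

```python
def response_rate(result_lists):
    """Share of requests that returned at least one recommendation."""
    if not result_lists:
        return 0.0  # convention for an empty log (assumption)
    non_empty = sum(1 for recs in result_lists if recs)
    return non_empty / len(result_lists)


# Hypothetical result lists of four user sessions
sessions = [["Oslo"], [], ["Bergen", "Tromso"], []]
print(response_rate(sessions))  # 0.5
```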


Fig. 8. Participants' agreement with the statement that the filter has improved the result list (domain-wise) (TC2). [Plot "The filter has improved the recommendations of the list": x-axis Domain; y-axis Percentage of Agreement; legend Degree of Agreement (1 to 5).]

In addition to response rates, user assessments and scores in the defined quality dimensions were compared. Table 14 depicts the outcome of this analysis. Significantly higher scores are marked in bold. Where statistical tests identified no meaningful variations, the results of the better performing approach are underlined. α-levels were adjusted with the false discovery rate (FDR) in order to decrease the probability of making a type I error due to multiple testing.
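The text does not name the specific FDR procedure. The sketch below assumes the widely used Benjamini-Hochberg step-up method and returns adjusted p-values that can be compared against the nominal α; the function name `benjamini_hochberg` and the sample p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from four pairwise tests
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

A test is then called significant at level α if its adjusted p-value is at most α, which controls the expected share of false discoveries rather than the chance of a single one.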

According to mean scores, expressive requests generated more comprehensive recommendation lists (size). This finding is in line with the increased response rates shown in Table 13. However, subsequently conducted t-tests confirmed significant differences only for the travel experiment (t(102) = -7.89, p < 0.001). In contrast, precision scores were higher in most of the domains (travel, movie, music) when the SKOSRec engine processed a simple constraint-based query. However, statistical tests did not confirm any systematic differences. The authors applied Wilcoxon tests for mrs scores and the remaining performance metrics because of the small sample sizes resulting from the low response rates of the simple constraint-based approach.
Regarding Novelty, neither approach was superior. In the Diversity dimension, expressive filters generated suggestions that were at least as topically diversified as the results from simple constraint-based retrieval. Even though no significant differences were identified between the two approaches, increased mean scores (divU in the travel domain and divC in each domain) indicate a slight superiority of expanded filter requests in this quality dimension. The same applies to usefulness scores (mean values were higher in 3 out of 4 domains when expressive filtering was applied). However, these statements are speculative at best because the statistical tests were not significant.
In summary, it can be concluded that expressive filters improve recall, potentially at the cost of a slight loss in precision (see Table 14). It may also be the case that expanded user constraints diversify recommendation lists and increase their usefulness. However, these claims are unverified. In contrast, it is safe to say that expressive query patterns generate results of at least the same quality as simple patterns while increasing the number of relevant suggestions. The few differences may be explained by the high Jaccard indices (JI) of the recommendation lists (see Table 15). Throughout the domains, the result sets were much alike. Given these similarities and the increases in list sizes for expressive filters, we suppose that an expanded constraint produces an enhanced version of the recommendation list resulting from simple filtering. Therefore, expanded filters should be the default setting in a constraint-based retrieval context.
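The overlap argument rests on the Jaccard index of two recommendation lists, i.e., the size of their intersection divided by the size of their union. The following sketch illustrates the measure; `jaccard_index` and the sample lists are hypothetical, and treating two empty lists as identical (index 1.0) is an assumption, since the text does not say how empty result sets were handled.

```python
def jaccard_index(list_a, list_b):
    """|A & B| / |A | B| for the item sets of two recommendation lists."""
    a, b = set(list_a), set(list_b)
    if not a and not b:
        return 1.0  # assumption: two empty lists count as identical
    return len(a & b) / len(a | b)


# Hypothetical lists: the expanded filter returns the simple list plus one item
simple = ["Amelie", "Delicatessen", "Micmacs"]
expanded = ["Amelie", "Delicatessen", "Micmacs", "Alien Resurrection"]
print(jaccard_index(simple, expanded))  # 0.75
```

In this toy case the expanded list is a superset of the simple one, which is the "enhanced version" reading suggested above: a high index combined with a longer list.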

In TC3 (i.e., the evaluation of roll-up query patterns), the SKOSRec engine almost always generated non-empty recommendation lists (see Table 16). In the travel and the book experiment, the response rate was higher for aggregation-based queries, whereas in the movie domain the engine more often provided recommendations for regular requests. However, these differences are only marginal and do not indicate a clear superiority of one method.

Table 14. Performance results in TC2 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]  Domain  Simple M  Simple SD  Expressive M  Expressive SD  N
Accuracy (size) [10]            Travel  1.17      2.70       4.52          4.62           103
                                Movie   6.72      4.08       7.20          3.92           50
                                Music   5.96      4.64       6.36          4.51           53
                                Book    4.59      4.50       5.00          4.51           51
Accuracy (mrs) [1]              Travel  66.73     20.51      65.34         22.36          23
                                Movie   60.95     22.98      60.29         22.79          43
                                Music   63.77     16.05      62.96         16.44          34
                                Book    56.17     18.18      56.81         16.81          37
Novelty (nv) [1]                Travel  0.48      0.41       0.61          0.49           21
                                Movie   0.44      0.36       0.55          0.80           42
                                Music   0.55      0.39       0.52          0.37           38
                                Book    0.77      0.34       0.76          0.33           33
Diversity (divU) [5]            Travel  3         -          3.5           -              24
                                Movie   4         -          4             -              43
                                Music   4         -          4             -              38
                                Book    3         -          3             -              37
Diversity (divC) [1]            Travel  0.68      0.18       0.72          0.21           19
                                Movie   0.70      0.24       0.73          0.23           41
                                Music   0.86      0.13       0.87          0.12           34
                                Book    0.84      0.24       0.95          0.63           29
Usefulness [5]                  Travel  3.33      0.93       3.50          0.77           24
                                Movie   3.32      0.85       3.31          0.86           43
                                Music   3.55      0.81       3.64          0.78           38
                                Book    3.39      0.83       3.62          1.21           37

Table 15. Jaccard indices (TC2)

Domain  Mean Jaccard Index (JI)  SD    N
Travel  0.57                     0.47  25
Movie   0.71                     0.38  43
Music   0.82                     0.35  37
Book    0.92                     0.26  35

Fig. 9 depicts participants' general agreement with the Likert statements concerning the Perceived Usefulness of the approaches in TC3. The diagram shows similar results for regular and advanced requests. On average, satisfaction scores reached levels slightly above neutral agreement. A subsequent t-test confirmed no significant differences between the two methods. In fact, the approach with the highest average Perceived Usefulness score differed between experiments: while mean values were higher for regular requests in the studies on travel and music RS, advanced queries produced higher scores in the movie and book experiments.

Table 16. Response rates in % (TC3)

Domain  Regular  Aggr.-based  N
Travel  99       100          103
Movie   96       92           50
Music   100      100          53
Book    96       100          51

Content-based Diversity (divC) was the only metric with systematic differences. Here, scores went up for advanced retrieval patterns in the travel (t(98) = -5.50, p < 0.001), the music (t(52) = -4.51, p < 0.001) and the book domain (t(47) = -4.90, p < 0.001) (see Table 17). This finding is not surprising, given that the aggregation-based approach explores the wider semantic network of each item in the profile.

Fig. 9. Participants' overall satisfaction with regular and roll-up recommendation retrieval (TC3). [Plot "Perceived Usefulness (Regular vs. Roll-up Queries)": x-axis Domain (Travel, Movie, Music, Book); y-axis Agreement; legend Method (Regular, Roll-up).]

Hence, it can be assumed that both approaches produce recommendation lists of similar quality and can provide benefits to potential consumers. It cannot be clearly stated which approach is superior, because neither outperformed the other in every domain. While the regular retrieval strategy was better in terms of usefulness in the travel and music domains, the aggregation-based approach achieved higher scores in the domains of movie and book recommendations.
This conclusion is especially notable given the low mean Jaccard indices for recommendation lists resulting from regular and aggregation-based retrieval (see Table 18). The mean score never exceeded 0.1 in any domain, which indicates a low concordance between result sets. It is remarkable that participants perceived the recommendations as equally good given the high dissimilarity of the lists. Thus, advanced queries add value, as they provide additional interesting results. They can help to explore LOD repositories in different ways than a regular retrieval approach. Therefore, it is worthwhile to offer aggregation-based query patterns as an alternative retrieval strategy when the regular approach has not produced any helpful recommendations.

In addition to TC3, user assessments for TC4 (cross-domain recommendations) were also evaluated. However, since the authors conducted experiments for TC4 only in the multimedia domains, the samples were considerably smaller. Nevertheless, we assume that the number of completed user sessions is just sufficient to conduct further statistical analyses. Table 19 shows the response rates for the two retrieval methods. Cross-domain patterns had a higher success rate in generating non-empty result sets throughout the domains. Participants almost always received a recommendation for this query type, whereas a regular SPARQL query often did not return a single suggestion.

We applied non-parametric tests for the performance metrics mrs, nv, divU, divC and Perceived Usefulness because the low response rates of the SPARQL-based approach led to a correspondingly low number of comparable data points. Overall, the FDR-adjusted pairwise Wilcoxon tests detected no significant differences except for user-based Diversity scores (divU) in the movie (Z = -2.69, p < 0.05) and the book domain (Z = -2.56, p < 0.05). The results for the content-based Diversity (divC) metric seem to point in the same direction, given the high mean values for cross-domain queries in each domain, but the significance of these results was not verified. The mrs scores indicate a better performance of regular SPARQL queries, but the within-subjects test did not confirm this assumption statistically (see Table 20).

Table 17. Performance results in TC3 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]  Domain  Regular M  Regular SD  Aggr.-based M  Aggr.-based SD  N
Accuracy (size) [10]            Travel  9.90       0.99        9.42           2.07            103
                                Movie   2.88       0.59        2.76           0.82            50
                                Music   10.00      0.00        10.00          0.00            53
                                Book    2.88       0.59        3.00           0.00            51
Accuracy (mrs) [1]              Travel  48.21      24.98       44.93          25.59           102
                                Movie   54.58      22.88       56.69          25.42           44
                                Music   53.95      18.31       58.24          16.19           52
                                Book    51.53      23.90       49.25          22.30           49
Novelty (nv) [1]                Travel  0.43       0.39        0.38           0.40            98
                                Movie   0.35       0.38        0.28           0.41            41
                                Music   0.49       0.36        0.46           0.39            50
                                Book    0.61       0.42        0.62           0.44            47
Diversity (divU) [5]            Travel  3.5        -           3              -               100
                                Movie   4          -           3              -               43
                                Music   4          -           3              -               50
                                Book    3          -           3              -               48
Diversity (divC) [1]            Travel  0.79       0.17        0.89           0.13            99
                                Movie   0.75       0.20        0.66           0.27            44
                                Music   0.83       0.17        0.93           0.07            53
                                Book    0.74       0.27        0.93           0.06            48
Usefulness [5]                  Travel  3.36       0.77        3.22           0.74            101
                                Movie   3.20       0.92        3.26           0.94            44
                                Music   3.45       0.84        3.44           0.85            51
                                Book    2.95       0.87        3.03           0.96            48

Table 18. Jaccard indices (TC3)

Domain  Mean Jaccard Index (JI)  SD    N
Travel  0.07                     0.11  101
Movie   0.03                     0.07  44
Music   0.02                     0.05  53
Book    0.02                     0.06  50

Table 19. Response rates in % (TC4)

Domain  Regular (SPARQL)  SKOSRec  N
Movie   32                96       50
Music   62                98       53
Book    53                100      51

However, the most important finding of the analysis was that SKOSRec cross-domain queries generated significantly more suggestions throughout the domains (movie: t(49) = -17.92, p < 0.001; music: t(52) = -25.73, p < 0.001; book: t(50) = -12.75, p < 0.001) (see Table 20). Fig. 10 illustrates this graphically.
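The reported degrees of freedom (e.g., t(49) for N = 50) are consistent with a paired t-test on per-participant differences, which fits the within-subjects design. The sketch below computes that statistic with the standard formula t = mean(d) / (sd(d) / sqrt(n)) with n - 1 degrees of freedom. The function name `paired_t` and the sample list sizes are hypothetical, and the sign of t depends on the order of subtraction, which the text does not state.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x, y):
    """Paired t statistic and degrees of freedom for two aligned samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two aligned samples with at least two pairs")
    # Per-participant differences (first method minus second method)
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical list sizes per participant: SPARQL baseline vs. cross-domain request
sparql_sizes = [1, 0, 2, 1]
skosrec_sizes = [9, 10, 8, 10]
t, df = paired_t(sparql_sizes, skosrec_sizes)
print(f"t({df}) = {t:.2f}")
```

With the baseline as the first argument, longer cross-domain lists yield a negative t, matching the direction of the values reported above under that assumed ordering.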

Whenever the SPARQL request provided recommendations, which was the case in less than half of the user sessions (see Table 19), the SKOSRec suggestions were assessed to be of almost equal quality to the SPARQL recommendations (see Table 20, Usefulness). Hence, in the more than half of the test cases in which SPARQL querying did not produce any results, the SKOSRec approach may still have provided useful suggestions.

Therefore it is reasonable to assume that the capa-bility of the SKOSRec engine to flexibly switch be-tween processing steps of similarity calculation andpattern matching can facilitate an improved explo-ration of LOD repositories Table 21 also shows thatthe items in the two sets often did not match as isindicated by low Jaccard indices This finding furtherdemonstrates that the execution of an additional step

26 L Wenige et al Similarity-based Graph Queries

Table 20
Performance results in TC4 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]   Domain   Regular (SPARQL)     Cross-Domain         N
                                          M        SD          M        SD

Accuracy (size) [10]             Movie    0.82     2.09        8.66     2.73        50
                                 Music    0.89     1.59        9.30     2.03        53
                                 Book     2.90     3.98        10.00    0.00        51

Accuracy (mrs) [1]               Movie    6741     2762        5365     1471        16
                                 Music    6236     3176        4832     2249        20
                                 Book     6642     2128        5559     1684        27

Novelty (nv) [1]                 Movie    0.59     0.49        0.41     0.34        16
                                 Music    0.72     0.44        0.59     0.40        19
                                 Book     0.54     0.41        0.63     0.30        27

Diversity (divU) [5]             Movie    3        -           4        -           16
                                 Music    3        -           3        -           20
                                 Book     3        -           4        -           27

Diversity (divC) [1]             Movie    (0.64)   (0.35)      (0.99)   (0.02)      (6)
                                 Music    (0.90)   (0.13)      (0.98)   (0.03)      (10)
                                 Book     (0.84)   (0.12)      (0.87)   (0.1)       (8)

Usefulness [5]                   Movie    3.57     0.83        3.35     1.57        15
                                 Music    3.53     1.17        3.11     0.98        20
                                 Book     3.41     1.04        3.41     0.94        27

[Plot "Recommendation List Size": No. of items by Domain (Movie, Music, Book) and Method (Regular, Cross-Domain)]

Fig. 10. Mean recommendation list sizes for SPARQL and SKOSRec cross-domain requests (TC4)


Table 21
Jaccard Indices (TC4)

Domain   Mean Jaccard Index (JI)   SD     N
Movie    0.03                      0.07   44
Music    0.01                      0.02   21
Book     0.16                      0.27   33

of similarity calculation can help to retrieve other relevant items.
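The Jaccard index used in Tables 18 and 21 measures the overlap of two recommendation lists as the size of their intersection divided by the size of their union. The sketch below is illustrative only: the function name, the item identifiers and the convention of returning 0.0 for two empty lists are assumptions, not details taken from the experiments.

```python
def jaccard_index(list_a, list_b):
    """Overlap of two recommendation lists: |A ∩ B| / |A ∪ B|."""
    a, b = set(list_a), set(list_b)
    union = a | b
    if not union:
        # Assumed convention: two empty lists share nothing.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical result sets of two methods for the same user profile.
regular = ["dbr:Item_A", "dbr:Item_B", "dbr:Item_C", "dbr:Item_D"]
cross_domain = ["dbr:Item_D", "dbr:Item_E", "dbr:Item_F", "dbr:Item_G"]

ji = jaccard_index(regular, cross_domain)  # one shared item out of seven distinct items
print(round(ji, 2))
```

A value close to 0, as reported for most domains, means that the two methods returned almost disjoint sets of items.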

5. Future Work & Conclusion

This paper has presented the query-based SKOSRec engine that facilitates new types of recommendation requests for LOD repositories. The evaluations have shown that the SKOSRec system can often generate relevant and useful suggestions when certain query templates are utilized. In many cases, the application of expressive or advanced query patterns (i.e., in the context of constraint-based retrieval or cross-domain requests) helped to significantly improve recall values while still providing the same level of quality in the remaining performance dimensions. Additionally, advanced recommendation requests can be used to generate diversified recommendation lists in case users are not satisfied with the results of regular recommendation requests (i.e., roll-up or cross-domain retrieval vs. regular requests). While not being superior in each domain and test case, it is at least safe to say that the novel recommendation approaches of the SKOSRec engine considerably extend common retrieval methods and provide an alternative search strategy when conventional methods fail to provide useful results. Additionally, the increase in diversity and recall is in line with findings from previous research on LOD-enabled RS [45].

It is a strength of the developed approaches that they facilitate combinations of graph-based and similarity-based retrieval at different stages of the recommendation workflow. This feature extends general search capabilities for semantic networks. Future research needs to investigate whether, in addition to SKOS, further RDF vocabularies can be used for ad-hoc item-to-item similarity computation to increase the range of potential usage scenarios, since not every LOD dataset contains SKOS annotations [53]. The application of additional properties could also boost the recommendation quality of some query patterns (e.g., the query types used in TC3). In this context, it also needs to be determined whether the representation of the user profile in the SKOSRec query language can be substituted with free-text expressions to enable search-like functionalities. LOD researchers have already developed engines that can process natural language queries over RDF data [59], [34], [67]. It will have to be investigated how the SKOSRec engine profitably fits into this landscape of existing retrieval tools.

Another open research question concerns the aspect of interfaces to other applications. In its standard configuration, the system is not an end-user retrieval engine but a backend application with which an administrator who has domain knowledge (e.g., of RDF vocabularies and data models of a particular LOD repository and usage scenario) creates the respective query patterns. Although the websites of the online experiments represent possible implementations of suitable end-user applications for the SKOSRec engine, there are still many other possibilities and extensions to design such a user interface (UI). For instance, the selection of LOD items for the profile was made possible by previously storing the data in an index that could then be accessed and searched before issuing a recommendation query. However, this runs against the idea of ad-hoc retrieval. In this context, future research will have to determine whether it is feasible to generate an on-the-fly mapping from items to the respective LOD resources. In the case of a commercial application, the engine needs to match items with the corresponding entries in the product catalog. Even if it is not yet clear which UI components can make the best possible use of the SKOSRec engine's features, the results from the user experiments can give at least a few suggestions.

The following retrieval options should be the default setting in an interface that contains a LOD-enabled RS component to assist users in finding interesting items. Since the regular approach of on-the-fly retrieval achieved good results throughout the domains, the display of simple recommendations generated from previously stated user preferences represents a suitable starting point for retrieval. On-the-fly queries could be refined by providing filter options. Since the evaluations have shown that expressive constraints often lead to increased recall values, the application of such a filter might be the best option for assisted retrieval. As in the preparations for the experiments, appropriate graph-based query patterns for the usage scenario in question would also have to be determined by a domain expert before setting up the application. The same applies to advanced retrieval patterns, which should be available to the end user to refine the query when both the simple and the constraint-based approach have failed to provide relevant items. This suggestion is made because roll-up queries were competitive with regular recommendations in the web-based experiments. Therefore, they represent a viable alternative retrieval strategy. For some domains (e.g., for multimedia retrieval), it is also possible to use an optional selection field to enable the formulation of cross-domain requests. Evaluations for this query type suggest that users might receive interesting results (i.e., diversified and comprehensive recommendation lists) from such a request.

In summary, the contributions of the paper are as follows:

– Development and implementation of novel retrieval strategies in the SKOSRec system, which facilitate personalized requests and leverage the potential of knowledge graphs in order to address the research gap in state-of-the-art recommendation methods with regard to customized queries.

– Realization of web-based user experiments in four usage scenarios with the following outcomes:

  • Successful demonstration that knowledge graph data is applicable in the context of live recommendation scenarios.

  • Creation of suitable query patterns that can be applied to represent individual information needs in different application contexts. The set of query patterns can be extended with the help of the SKOSRec query language to meet the requirements of additional usage scenarios.

  • Proof that the novel retrieval strategies can be successfully applied across various application contexts and recommendation tasks and that they outperform conventional content-based retrieval approaches, especially in the performance dimensions of diversity and recall.

The prototypical implementation and the evaluation of the SKOSRec engine in different usage scenarios have demonstrated that the freely available knowledge sources on the LOD cloud can be successfully applied to advance state-of-the-art methods in IR and recommender systems. The proposed personalized and customized retrieval strategies can leverage the potential of knowledge graphs by better matching information needs of users with the help of semantic search options.

Appendix A. The SKOSRecommender Query Syntax

SKOSRecQuery: A SKOSRec query can be comprised of up to six elements, of which the parts SimProjection and ItemPart are obligatory. Hence, users need to state at least their preferences and how many recommendations they would like to receive. Advanced retrieval patterns can be formulated as well. For instance, similar resource retrieval can be combined with prefiltering (RecWhereClause) and/or postfiltering (SelectPart) of LOD resources. As in SPARQL, the query can start with a prologue that abbreviates IRIs with namespace declarations [21].

SelectPart: The SelectPart enables subquerying with recommendation results. The surrounding query is a SELECT query with a slightly simplified WHERE clause (i.e., RecWhereClause) with which postfilter conditions can be formulated. In this section, users have the option to formulate aggregation-based retrieval requests (Aggregation).

Aggregation: This query part handles aggregation-based postfiltering. Users specify the resource (IRIref) for which similarity scores of sublevel entities are summarized. Additionally, it is stated how the data should be aggregated (i.e., SUM, MAX or AVG) as well as which variable (Var) represents the aggregation-based recommendations in the postfilter section. The compiler makes sure that this variable is actually contained in the RecWhereClause of the SelectPart.

SimProjection: This part of a SKOSRec query refers to the list of generated suggestions. It defines the recommendation variable to which all similar LOD resources are mapped. In case pre- and postfilter conditions are set in the other sections, the compiler checks whether the variable (Var) appears in these sections as well. Otherwise, the required join operations cannot be executed. It is specified how many recommendations should be displayed (Integer) and whether the request will be issued against a default repository or as a cross-repository request (ServiceIntegration).

ServiceIntegration: This section indicates that a user intends to receive recommendations from a different SPARQL endpoint than the default endpoint that is stated in the standard configuration of the SKOSRec engine. It specifies the target SPARQL endpoint (IRIref) from which a user wants to receive recommendations. Upon extraction of SKOS annotations from the default source endpoint, a subsequent request is sent to the specified target endpoint.

ItemPart: This section of a SKOSRec query represents a single preference in the user profile. However, recommendation queries can contain several preference statements. A preference is either expressed as a LOD resource (IRIref) or as a variable which is bound to a preference query (VarPart). In the latter case, the resources are obtained by matching the graph pattern in the VarPart with the triple statements in the repository. The ItemPart statement can be additionally supplemented with a concept-to-concept similarity threshold (Sim) that triggers concept expansion with the approach of flexible similarity detection, which is extensively described in one of our previous works [64].

VarPart: The construct initiates preference querying. Individual likings are expressed as a graph pattern in a RecWhereClause. The variable (Var) is a placeholder for the LOD resources that will be used to set up the user profile. The compiler has to make sure that the specified variable occurs in the RecWhereClause as well. Otherwise, the extraction of LOD resources cannot be carried out correctly.

Sim: The Sim part of a SKOSRec query denotes the threshold of concept-to-concept scores for flexible similarity detection. Even though knowledge-based similarity values usually range between 0 and 1, the specification allows a broader range of digits in the Decimal datatype to enable application of other similarity metrics in case they are needed.

Relation: This section specifies how concept-to-concept similarity scores should be determined. Users can state whether scores should be “larger”, “larger than” or “equal to” the threshold value.

RecWhereClause: This clause can occur in three different sections of a SKOSRec query, i.e., either in the prefilter (ItemPart), the postfilter (SelectPart) or the preference filter section (VarPart). The RecWhereClause closely resembles the “WhereClause” of the SPARQL 1.1 specification with all its subsequent parts [21]. Hence, it enables different combinations of graph pattern matching. However, the RecWhereClause of the SKOSRec language allows fewer pattern matching expressions than the “WhereClause” of the regular SPARQL syntax specification [21]. For instance, it is neither permitted to formulate subqueries nor to direct filter requests to more than one repository (see RecGraphPatternNotTriples for further explanations). This modification has been made to simplify similarity calculation. After parsing this section, the compiler makes sure that recommendation, preference or postfilter variables occur at least once in the RecWhereClause to guarantee that subsequent steps of the recommendation workflow can be applied on existing LOD resources. In case the RecWhereClause comprises a RecMinusGraphPattern, it also has to be checked that the variable is not only contained in the MINUS part of the group.

RecGroupGraphPattern: A RecGroupGraphPattern can contain one to many basic graph patterns, which can be differently combined according to RecGroupGraphPatternSub.

RecGroupGraphPatternSub: This section defines the graph patterns that are applied to identify suitable LOD resources. User filters can be either expressed as basic graph patterns (TriplesBlock) or as combinations of them (RecGraphPatternNotTriples). The sequence of different kinds of patterns can be flexibly specified.

RecGraphPatternNotTriples: This query part closely resembles the similarly named “GraphPatternNotTriples” of the SPARQL 1.1 specification [21]. The only difference to SPARQL is that the RecGraphPatternNotTriples section is a little more restricted in terms of admissible graph patterns than the “GraphPatternNotTriples” section of the regular SPARQL specification. For instance, it does not allow the retrieval of LOD resources from graphs other than the default graph that is specified in the configuration of the SKOSRec engine, since filter patterns have to be applied to datasets that contain SKOS annotations. Hence, these datasets need to be specified before runtime execution to ensure that similarity calculation can be carried out correctly. If a user was able to formulate filter conditions that referred to different RDF graphs and endpoints accordingly, this would not be possible.

RecGroupOrUnionGraphPattern: This query section marks an alternative graph pattern in the style of the “GroupOrUnionGraphPattern” of the SPARQL syntax specification [21]. However, it neither allows SPARQL-like subqueries nor federated queries for filtered resources because this would complicate similarity calculation.

RecOptionalGraphPattern: In this part of a SKOSRecQuery, users can specify additional graph patterns which might extend the query solution but do not necessarily have to match the data.

RecMinusGraphPattern: This section handles exclusion of LOD resources for cases when users want to omit certain triple statements from the solution.
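The variable check that the compiler performs on a RecWhereClause (the recommendation, preference or postfilter variable must occur at least once, and not only inside a MINUS group) can be sketched as follows. This is an illustration under stated assumptions and not the engine's actual compiler code: triple patterns are modelled as plain 3-tuples of strings, variables are recognized by a leading "?", and the function names are hypothetical.

```python
def variables(patterns):
    """Collect all variable names (terms starting with '?') from triple patterns."""
    return {term for triple in patterns for term in triple if term.startswith("?")}


def occurs_outside_minus(var, positive_patterns, minus_patterns=()):
    """True if `var` appears in at least one pattern that is not part of MINUS.

    A variable that only appears in `minus_patterns` is rejected, because it
    would never be bound to an existing LOD resource.
    """
    # `minus_patterns` is accepted for clarity; only positive patterns can bind.
    return var in variables(positive_patterns)


# Hypothetical prefilter: jazz items, excluding those from a given country.
positive = [("?rec", "dbo:genre", "dbr:Jazz")]
minus = [("?rec", "dbo:country", "dbr:France")]

print(occurs_outside_minus("?rec", positive, minus))                   # valid clause
print(occurs_outside_minus("?rec", [("?x", "dbo:genre", "?y")], minus))  # only in MINUS
```

In a real parser the check would walk the syntax tree of the clause; the flat lists used here merely keep the example short.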

References

[1] G. Adomavicius, A. Tuzhilin, Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. In IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.

[2] G. Adomavicius, A. Tuzhilin, R. Zheng, REQUEST: A query language for customizing recommendations. In Information Systems Research, vol. 22, no. 1, pp. 99–117, 2011.

[3] C. Aggarwal, Recommender systems. Springer, 2016.

[4] M. Arenas, B. Cuenca Grau, E. Kharlamov, S. Marciuska, D. Zheleznyakov, Faceted search over ontology-enhanced RDF data. In Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 939–948, 2014.

[5] V. Ayala, M. Przyjaciel-Zablocki, T. Hornung, A. Schätzle, G. Lausen, Extending SPARQL for recommendations. In Semantic Web Information Management, 2004.

[6] J. Beel, C. Breitinger, S. Langer, A. Lommatzsch, B. Gipp, Towards reproducibility in recommender systems research. In User Modeling and User-adapted Interaction, vol. 26, no. 1, pp. 69–101, 2016.

[7] C. Boutilier, R. Brafman, C. Domshlak, H. Hoos, D. Poole, CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements. In Journal of Artificial Intelligence Research (JAIR), vol. 21, pp. 135–191, 2004.

[8] P. Castells, S. Vargas, J. Wang, Novelty and diversity metrics for recommender systems: choice, discovery and relevance, 2011.

[9] G. Cheng, W. Ge, Y. Qu, Falcons: searching and browsing entities on the Semantic Web. In Proceedings of the 17th ACM International Conference on World Wide Web, pp. 1101–1102, 2008.

[10] J. Davies, R. Weeks, QuizRDF: Search technology for the Semantic Web. In Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004.

[11] DBpedia, URL: https://wiki.dbpedia.org, 2017.

[12] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, Exploiting the web of data in model-based recommender systems. In Proceedings of the 6th ACM Conference on Recommender Systems (RecSys), pp. 253–256, 2012.

[13] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, M. Zanker, Linked Open Data to support content-based recommender systems. In Proceedings of the 8th International Conference on Semantic Systems, 2012.

[14] D. Dillman, D. Robert, D. Bowker, Principles for constructing web surveys. In Joint Meetings of the American Statistical Association, 1998.

[15] DCMI metadata terms, URL: http://dublincore.org/documents/dcmi-terms, 2004.

[16] I. Fernandez-Tobias, I. Cantador, M. Kaminskas, F. Ricci, A generic semantic-based framework for cross-domain recommendation. In Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pp. 25–32, 2011.

[17] A. Freitas, J. Oliveira, S. O'Riain, E. Curry, J. Da Silva, C. Pereira, Querying Linked Data using semantic relatedness: a vocabulary independent approach. In Natural Language Processing and Information Systems, pp. 40–51, 2011.

[18] T. Grainger, T. Potter, Y. Seeley, Solr in action. Cherry Hill: Manning, 2014.

[19] A. Gunawardana, G. Shani, Evaluating recommender systems. In Recommender Systems Handbook, Springer, pp. 265–308, 2015.

[20] R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Bürgle, H. Düwiger, U. Scheel, Faceted Wikipedia search. In International Conference on Business Information Systems, pp. 1–11, 2010.

[21] S. Harris, A. Seaborne, E. Prud'hommeaux, SPARQL 1.1 query language. In W3C recommendation, 2013.

[22] B. Heitmann, C. Hayes, Using Linked Data to build open, collaborative recommender systems. In AAAI Spring Symposium: Linked Data Meets Artificial Intelligence, pp. 76–81, 2010.

[23] J. Herlocker, J. Konstan, L. Terveen, J. Riedl, Evaluating collaborative filtering recommender systems. In ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.

[24] Y. Hu, Z. Wang, W. Wu, J. Guo, M. Zhang, Recommendation for movies and stars using YAGO and IMDB. In 12th International Asia-Pacific Web Conference (APWEB), pp. 123–129, 2010.

[25] D. Huynh, D. Karger, Parallax and companion: Set-based browsing for the data web. In Proceedings of the 18th ACM International Conference on World Wide Web, 2009.

[26] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich, Recommender systems: an introduction. Cambridge University Press, 2010.

[27] Y. Kabutoya, R. Sumi, T. Iwata, T. Uchiyama, T. Uchiyama. In IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 625–630, 2012.

[28] M. Kaminskas, I. Fernandez-Tobias, F. Ricci, I. Cantador, Knowledge-based music retrieval for places of interest. In Proceedings of the 2nd International ACM Workshop on Music Information Retrieval with User-centered and Multimodal Strategies, pp. 19–24, 2012.

[29] H. Khrouf, R. Troncy, Hybrid event recommendation using Linked Data and user diversity. In Proceedings of the 7th ACM Conference on Recommender Systems, pp. 185–192, 2010.

[30] P. Kirk, Experimental design. John Wiley & Sons, 1982.

[31] B. Knijnenburg, M. Willemsen, Z. Gantner, H. Soncu, C. Newell, Explaining the user experience of recommender systems. In User Modeling and User-Adapted Interaction, vol. 22, no. 4-5, pp. 441–504, 2012.

[32] G. Koutrika, B. Bercovitz, H. Garcia-Molina, FlexRecs: expressing and combining flexible recommendations. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 2009.


[33] M. Lee, W. Kim, Semantic association search and rank method based on spreading activation for the Semantic Web. In IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1523–1527, 2009.

[34] J. Lehmann, L. Bühmann, Autosparql: Let users query your knowledge base. In 8th Extended Semantic Web Conference (ESWC), 2011.

[35] N. Marie, F. Gandon, M. Ribiere, F. Rodio, Discovery hub: on-the-fly Linked Data exploratory search. In Proceedings of the 9th International Conference on Semantic Systems, pp. 17–24, 2013.

[36] S. McNee, J. Riedl, J. Konstan, Being accurate is not enough: how accuracy metrics have hurt recommender systems. In CHI'06 Extended Abstracts on Human Factors in Computing Systems, 2006.

[37] R. Meymandpour, J. Davis, Recommendations using Linked Data. In Proceedings of the 5th PhD Workshop on Information and Knowledge, 2012.

[38] A. Miles, S. Bechhofer, SKOS Simple Knowledge Organization System Reference. In W3C Recommendation, 2009.

[39] B. Mobasher, X. Jin, Y. Zhou, Semantically enhanced collaborative filtering on the web. In Web Mining: From Web to Semantic Web, pp. 57–76, 2004.

[40] A. Musetti, A. Nuzzolese, F. Draicchio, V. Presutti, E. Blomqvist, A. Gangemi, P. Ciancarini, Aemoo: Exploratory search based on knowledge patterns over the Semantic Web. In Semantic Web Challenge, 2012.

[41] T. Neumann, G. Weikum, The RDF-3X engine for scalable management of RDF data. In The VLDB Journal, vol. 19, no. 1, pp. 91–113, 2010.

[42] J. Pérez, M. Arenas, C. Gutierrez, Semantics and Complexity of SPARQL. In 5th International Semantic Web Conference, pp. 30–43, 2006.

[43] S. Policarpio, S. Brunk, G. Tummarello, Implementation of a SPARQL integrated recommendation engine for Linked Data with hybrid capabilities. In AImWD, 2012.

[44] P. Pu, L. Chen, R. Hu, A user-centric evaluation framework for recommender systems. In Proceedings of the 5th ACM Conference on Recommender Systems, 2011.

[45] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, E. Di Sciascio, Sound and music recommendation with knowledge graphs. In ACM Transactions on Intelligent Systems and Technology (TIST), vol. 8, no. 2, 21, 2017.

[46] E. Oren, R. Delbru, S. Decker, Extending faceted navigation for RDF data. In 5th International Semantic Web Conference, pp. 559–572, 2006.

[47] G. Pirro, Explaining and suggesting relatedness in knowledge graphs. In International Semantic Web Conference, pp. 622–639, 2015.

[48] G. Rizzo, R. Troncy, NERD: A framework for evaluating named entity recognition tools in the Web of data. In 10th International Semantic Web Conference (ISWC'11), Demo Session, Bonn, Germany, 2011.

[49] J. Rosati, T. Di Noia, T. Lukasiewicz, R. De Leone, A. Maurino, Preference queries with Ceteris Paribus semantics for Linked Data. In OTM Confederated International Conferences, pp. 423–442, 2015.

[50] T. Ruotsalo, K. Haav, A. Stoyanov, S. Roche, E. Fani, R. Deliai, E. Mäkelä, T. Kauppinen, E. Hyvönen, SMARTMUSEUM: A mobile recommender system for the Web of Data. In Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 50–67, 2013.

[51] I. Ruthven, M. Lalmas, K. van Rijsbergen, Incorporating user search behavior into relevance feedback. In Journal of the Association for Information Science and Technology, vol. 54, no. 6, pp. 529–549, 2003.

[52] B. Sarwar, G. Karypis, J. Konstan, J. Riedl, Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, 2001.

[53] M. Schmachtenberg, C. Bizer, H. Paulheim, Adoption of the Linked Data best practices in different topical domains. In International Semantic Web Conference (ISWC), pp. 245–260, 2014.

[54] H. Schütze, C. Manning, P. Raghavan, Introduction to information retrieval. Cambridge University Press, 2008.

[55] SparqlImplementations, URL: https://www.w3.org/wiki/SparqlImplementations, 2016.

[56] M. Stankovic, W. Breitfuss, P. Laublet, Discovering relevant topics using DBPedia. In Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 219–222, 2011.

[57] K. Swearingen, R. Sinha, Beyond algorithms: An HCI perspective on recommender systems. In ACM SIGIR 2001 Workshop on Recommender Systems, vol. 13, no. 5-6, 2001.

[58] D. Tunkelang, Faceted search. In Synthesis Lectures on Information Concepts, Retrieval, and Services, pp. 1–80, 2009.

[59] C. Unger, L. Bühmann, J. Lehmann, A. Ngonga Ngomo, D. Gerber, P. Cimiano, Template-based question answering over RDF data. In Proceedings of the 21st ACM International Conference on World Wide Web, 2012.

[60] C. Varnhagen, M. Gushta, J. Daniels, T. Peters, N. Parmar, D. Law, T. Johnson, How informed is online informed consent? In Ethics & Behavior, vol. 15, no. 1, pp. 37–48, 2005.

[61] J. Waitelonis, H. Sack, Towards exploratory video search using Linked Data. In 11th IEEE International Symposium on Multimedia, pp. 540–545, 2009.

[62] L. Wenige, J. Ruhland, Flexible on-the-fly recommendations from Linked Open Data repositories. In International Conference on Business Information Systems, pp. 43–54, 2016.

[63] L. Wenige, Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations.

[64] L. Wenige, J. Ruhland, Retrieval by recommendation: using LOD technologies to improve digital library search. In International Journal on Digital Libraries, 2017.

[65] F. Zarrinkalam, M. Kahani, A multi-criteria hybrid citation recommendation system based on Linked Data. In 2nd International EConference on Computer and Knowledge Engineering (ICCKE), pp. 283–288, 2012.

[66] C. Ziegler, S. McNee, J. Konstan, G. Lausen, Improving recommendation lists through topic diversification. In Proceedings of the 14th International Conference on World Wide Web, pp. 22–32, 2005.

[67] L. Zou, R. Huang, H. Wang, J. Yu, W. He, D. Zhao, Natural language question answering over RDF: a graph data driven approach. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 313–324, 2014.


Fig. 4. Evaluation screen for recommendations resulting from on-the-fly recommendation retrieval, TC1 (music domain)

music recommendations, participants in the domains of books and films did not filter their recommendation lists by genre. On the other hand, the property dc:subject was applied as recommendation filter in each multimedia domain. Since the subject property often links to many informative features of multimedia items (e.g., release period, geographic information or content information), the author decided that this feature represents a powerful filter dimension.

In the experiment on travel destination search, the web interface offered two filters: a subject and a location-specific filter. The subject filter enabled users to state their preferences regarding the characteristics of a location (e.g., being a nature reserve). The location-specific filter, on the other hand, gave participants the option to specify where the desired destination should be located.¹¹

The second fundamental research issue of TC2 concerned the question whether expressive constraints can boost recommendation quality even further. Hence, apart from executing a constraint-based workflow with a simple filter (i.e., a direct attribute of an item), the SKOSRec engine generated suggestions with the help of an expressive filter as well. This second procedure, while still applying the same user constraint as the simple filter, utilized an expressive graph pattern to include additional LOD resources in the result set. For instance, in case a subject filter was selected, result set expansion was facilitated by exploring the wider semantic space of the DBpedia category graph through a property path declaration in the graph pattern (e.g., skos:broader[2]). In the travel experiment, an additional expressive filter retrieved place-related information (dul:hasLocation) for location-specific subproperty relations that are not materialized in DBpedia. Hence, the expressive graph pattern expanded the filter such that additional properties were matched (e.g., dbo:country, dbo:city or dbo:state), while still applying the specified user constraint on the set of LOD resources. Upon execution of constraint-based recommendation requests, users received two recommendation lists (resulting from simple and expressive filtering accordingly) in a separate evaluation screen. Participants evaluated result sets in the same manner as in TC1. For each result set, the screen contained an additional Likert item asking subjects to give their agreement to the statement “The filter has improved the recommendation results of list”. The sequence of suggestions resulting from the different methods was randomly assigned in each user session to avoid order effects.

¹¹ http://dbpedia.org

Fig. 5. Web form for a constraint-based recommendation request (music domain)

After study subjects had completed TC2, they were navigated to test case 3 (TC3). In the TC3 section of the travel experiment, participants could choose between three options for roll-up query patterns (similar to Q6, Sect. 3). Users were able to obtain recommendations for travel destinations based on POIs they had visited during the stay in another destination. They could either select a city (option 1), a region (option 2) or a country (option 3) as destination type. Upon entity type selection, the SKOSRec engine generated appropriate suggestions. Fig. 6 shows the web form from the travel experiment that facilitated the formulation of advanced roll-up requests in TC3.

When users had selected an entity type and stated their favorite travel destination and three associated POIs, the application generated a SKOSRec query from these parameters. A simple on-the-fly request was sent to the engine as well to retrieve baseline recommendations, with which the suggestions resulting from the advanced request were compared at a later stage of the experiment. The simple query only contained the user's favorite travel destination and the selected entity type option, thus omitting information on the POIs. During the travel experiment, the engine processed both the simple and the advanced query and produced two recommendation lists. As in the previous parts of the experiment, participants assessed the quality of the result sets, which appeared in random order on the screen.

Fig. 6. Travel - User profile generation, TC3

The multimedia experiments also contained a section on roll-up retrieval (TC3). Like the advanced travel queries, the multimedia requests aggregated similarity scores of sublevel entities through summation, which the engine joined with a postfilter. In addition to these features, the pattern contained a preference query part. This section identified a set of preferred items through a single user statement. Participants either specified their favorite director/actor (movie domain), music act (music domain) or author (book domain) and received recommendations based on the features of the works created by the artist (similar to Q5, Sect. 3). Participants also received suggestions from a simple on-the-fly query, which computed results according to the artist's characteristics. As in the travel experiment, recommendations from the advanced roll-up request were compared with the on-the-fly query without letting participants know which approach they were assessing and through random assignment of result set positions.

After participants of the multimedia experiments had evaluated the results of TC3, they were guided to test case 4 (TC4), which evaluated cross-domain queries. The experiment on travel RS ended with TC3 because no suitable graph patterns could be identified to facilitate these kinds of suggestions for the travel usage scenario. In the multimedia domain, however, the data from DBpedia was sufficient for this retrieval task. The entity type of the study (i.e., movie, music act or book) defined the source domain, based on which the engine generated suggestions for items from another multimedia target domain (similar to Q7, Sect. 3). Fig. 7 depicts the cross-domain web form of the music experiment.
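The roll-up step used in TC3, in which similarity scores of sublevel entities are aggregated for their parent entity, can be sketched as follows. This is a simplified illustration and not the engine's implementation: the function name `roll_up`, the dictionaries and all identifiers are hypothetical. Summation is the aggregation described for TC3; MAX and AVG are the other options named in the query syntax (Appendix A).

```python
from collections import defaultdict

# Aggregation functions corresponding to the SUM, MAX and AVG keywords.
AGGREGATORS = {
    "SUM": sum,
    "MAX": max,
    "AVG": lambda scores: sum(scores) / len(scores),
}


def roll_up(scores, parent_of, how="SUM", top_n=10):
    """Aggregate similarity scores of sublevel entities per parent entity.

    scores:    sublevel entity -> similarity score
    parent_of: sublevel entity -> parent entity (e.g. POI -> city)
    Returns the top_n (parent, aggregated score) pairs, best first.
    """
    grouped = defaultdict(list)
    for entity, score in scores.items():
        parent = parent_of.get(entity)
        if parent is not None:  # entities without a parent are skipped
            grouped[parent].append(score)
    aggregate = AGGREGATORS[how]
    totals = {parent: aggregate(values) for parent, values in grouped.items()}
    # Sort by descending score; ties are broken alphabetically by parent name.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Hypothetical similarity scores of POIs and the cities they belong to.
scores = {"poi:a": 0.4, "poi:b": 0.3, "poi:c": 0.6}
parent_of = {"poi:a": "city:X", "poi:b": "city:X", "poi:c": "city:Y"}

print(roll_up(scores, parent_of, "SUM"))  # city:X ranks first (two moderately similar POIs)
print(roll_up(scores, parent_of, "MAX"))  # city:Y ranks first (single most similar POI)
```

The example shows why the choice of aggregation matters: summation favours parents with many similar sublevel entities, whereas MAX favours the parent of the single best match.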

The web form assisted subjects in formulating a request, which then obtained movie suggestions based on the favorite music act of the user. Parameters from users were entered in the placeholders of the query pattern in the backend of the application. We formulated the query templates in such a way that the query engine matched any properties or graph patterns that could potentially connect two items from the specified target and source domains. For instance, such matchings can occur when a user states a favorite movie that is written by a particular author. In case the same author has also written some books, they might be of interest to the user as well. Another example stems from the movie domain. Suppose a consumer has declared a preference in his/her profile for a movie that links to the corresponding soundtrack. It might well be the case that the user likes the soundtrack and the music acts that were involved in creating it just as much as s/he likes the movie. The practice of distinguishing homonymous LOD resources through the property dbo:wikiPageDisambiguates is another interesting linking pattern from DBpedia that was exploited for cross-domain requests in TC4, since it connects similar entities (e.g., the book edition of a certain movie).

Cross-domain SPARQL queries (similar to Q8, Sect. 3) were posed as baseline requests in this test case. While the SPARQL query only matched cross-domain relations for a single LOD resource, the SKOSRec request expanded the search space by considering links from more than one item. As in the previous test cases, TC4 ended with an evaluation screen where users assessed the SPARQL-based as well as the SKOSRec suggestions in random order without knowing which recommendation list belonged to which method. Table 10 gives an overview of the test cases and the methods that were applied in the experiments in each usage scenario.
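The idea of matching any property path that connects a source-domain item with a target-domain item can be illustrated with a small in-memory sketch. This is not the SKOSRec implementation and it does not query DBpedia: the triples, the `ex:` identifiers, the type labels and the function name `cross_domain_candidates` are all made up for illustration, and the search is limited to direct links or links via one shared node.

```python
# Toy triples (subject, predicate, object); every identifier is hypothetical.
TRIPLES = {
    ("ex:MovieA", "ex:writer", "ex:AuthorX"),
    ("ex:BookB", "ex:author", "ex:AuthorX"),
    ("ex:MovieC", "ex:soundtrack", "ex:AlbumD"),
    ("ex:BookE", "ex:author", "ex:AuthorY"),
}
# Hypothetical domain label for each item.
TYPES = {
    "ex:MovieA": "movie",
    "ex:MovieC": "movie",
    "ex:BookB": "book",
    "ex:BookE": "book",
    "ex:AlbumD": "music",
}


def cross_domain_candidates(sources, target_type):
    """Return target-domain items linked to any source item,
    either directly or via one shared intermediate node."""
    # Build an undirected adjacency map, ignoring the property name,
    # so that any connecting property counts as a match.
    neighbours = {}
    for s, _, o in TRIPLES:
        neighbours.setdefault(s, set()).add(o)
        neighbours.setdefault(o, set()).add(s)

    found = set()
    for src in sources:
        for mid in neighbours.get(src, set()):
            for cand in {mid} | neighbours.get(mid, set()):
                if cand != src and TYPES.get(cand) == target_type:
                    found.add(cand)
    return found


# A single source item (comparable to the baseline request) ...
print(cross_domain_candidates(["ex:MovieA"], "book"))
# ... versus several source items, which widens the search space.
print(cross_domain_candidates(["ex:MovieA", "ex:MovieC"], "music"))
```

Passing more than one source item mirrors, in a very reduced form, how the expanded request considers links from several items rather than from a single resource.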

4.2. Results

In total, 257 people took part in the study. The gathered samples from the experiments contained answers


Fig. 7. Music - User profile generation TC4 (page 9)

Table 10. Test cases in the web experiments (X = conducted, x = not conducted)

Test Case   Evaluation Purpose                      Query Type   Travel   Movie   Music   Book
TC1         On-the-fly Recommendations (Baseline)   Q1           X        X       X       X
TC2         Constraint-based Recommendations        Q3           X        X       X       X
TC3         Advanced Queries (Rollup)               Q5, Q6       X        X       X       X
TC4         Advanced Queries (Cross-Domain)         Q7           x        X       X       X

from both genders, with a slight overrepresentation of male participants (53.7%). Most study subjects were between 20 and 40 years of age (65.8%). The evaluation of the demographics section also revealed that the majority of participants already perceived the search in their domain of interest as either “manageable” (38.5%) or even as “easy” or “very easy” (44.0%).

Prior to analyzing the performance results of the novel retrieval approaches of the SKOSRec engine, it was checked whether the respective Likert items for Perceived Usefulness and user-based diversity (divU) reached acceptable reliability levels. For this purpose, Cronbach’s α values were determined. Table 11 shows the outcome of this analysis for the Perceived Usefulness construct. Given the acceptable values throughout the domains (Cronbach’s α > 0.7), it could be treated as an interval-scaled variable in the statistical tests.

Table 11. Cronbach’s α values for the concept of Perceived Usefulness

Domain   Cronbach’s α
Travel   0.7002
Movie    0.8579
Music    0.8515
Book     0.8949
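The reliability check reported in Table 11 can be reproduced in principle with the standard formula for Cronbach's α, α = k/(k−1) · (1 − Σ item variances / variance of the summed scale). The following is a minimal sketch using only the Python standard library; the response values are hypothetical and are not taken from the study.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a multi-item scale.

    items: one list of scores per Likert item; all lists cover the
    same respondents in the same order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    # Sum score of each respondent across all items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(col) for col in items)
    total_variance = variance(totals)
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical answers of five participants to three usefulness items (1-5).
responses = [
    [4, 3, 5, 2, 4],
    [4, 2, 5, 3, 4],
    [5, 3, 4, 2, 3],
]
print(round(cronbach_alpha(responses), 3))
```

A value above the 0.7 threshold mentioned in the text would support averaging the items into one interval-scaled score.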

Besides usefulness, user-based diversity scores were also measured by multiple Likert items. However, Cronbach’s α values for this concept did not reach acceptable levels. In order to avoid leaving valuable user assessments unused, agreements to the statement “The recommendations of list [...] are diverse” were taken into account as an ordinal variable (divU).

Based on these results, subsequent tests could be carried out. The first part of the analysis concerned the performance of the on-the-fly retrieval method (baseline) of the SKOSRec engine in TC1. The outcome of this analysis confirms that the on-the-fly approach represents a viable recommendation strategy. Table 12 shows the mean (M; see footnote 12), standard deviation (SD) and number of participants (N) for each performance dimension and metric. It can be seen that users assessed the engine’s performance in almost every quality dimension to be above average. Aside from the positive user evaluations with regard to Perceived Usefulness, related quality aspects such as Accuracy, Novelty and Diversity received good assessments as well. For instance, the mean relevance score (mrs) of each domain was higher than 50 score points. This means that participants were often convinced of the relevance of the results. Given these positive user assessments with regard to item relevance, it seems justified to consider the mean of result list sizes as an indicator of recall. The engine was able to generate the required number of recommendations in almost every test case (see Table 12).

The on-the-fly retrieval approach was also tested with regard to the Novelty dimension. Table 12 shows the mean novelty scores (nv) for the baseline recommendation approach. Novelty scores indicate that the majority of items in a result list were deemed to be new in the travel and book RS, whereas in the experiments on movie and music RS only about one third of the recommendation list was unfamiliar to users. In contrast, content-based Diversity scores (divC) were rather high in each domain. Hence, it can be concluded that items in a result set usually have a low rate of matching SKOS annotations. In addition to content-based scores, user assessments of recommendation list Diversity (divU) were also taken into account. The median level of agreement with the Diversity-related Likert item stating that recommendations are diverse was 3 (“neutral”) in the multimedia domains and 4 (“agree”) in the travel domain.

In summary, the baseline approach achieved fairly good performance results in TC1. The findings demonstrate that LOD-enabled recommendations can potentially enhance a user’s search experience. They also allow for subsequent tests of advanced retrieval techniques, which can be compared to this standard method.

12 In case of the user-based diversity score (divU), M denotes the median because of the ordinal scale of the variable.

In the analysis of performance results from TC2 (constraint-based recommendations), particular attention was paid to participants’ agreement with the statement “The filter has improved the recommendation results of the list”. Fig. 8 shows the distribution of domain-wise agreement ratios. On average, they were fairly high. Throughout the domains, the vast majority of participants selected either 5 (“strongly agree”), 4 (“agree”) or 3 (“neutral”). Hence, a filter often had a positive impact on results.

The authors gathered responses to the question on improvement for recommendation lists resulting from both the simple and the expressive filtering approaches. Hence, they give clues about the general performance of constraint-based retrieval. However, since this section of the study revealed the experimental condition to participants (i.e., application of a filter), answers to the Likert statement have to be interpreted cautiously. Results may be slightly biased, as participants were able to guess the underlying agenda. These limitations do not exist for comparisons between simple and expressive filters. Since the web application randomly assigned the list order, participants did not know which approach they were assessing. This enables us to take a closer look at the differences between the two filtering approaches. It was

Table 12. Performance results in TC1 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]   Domain   M       SD      N
Accuracy (size) [10]             Travel   10.00   0.00    103
                                 Movie    10.00   0.00    50
                                 Music    10.00   0.00    53
                                 Book     9.80    1.40    51
Accuracy (mrs) [1]               Travel   56.49   0.00    103
                                 Movie    62.25   19.15   50
                                 Music    55.70   0.10    53
                                 Book     57.47   15.55   50
Novelty (nv) [1]                 Travel   0.60    0.34    101
                                 Movie    0.36    0.31    50
                                 Music    0.36    0.39    52
                                 Book     0.61    0.29    47
Diversity (divU) [5]             Travel   4       -       102
                                 Movie    3       -       50
                                 Music    3       -       53
                                 Book     3       -       50
Diversity (divC) [1]             Travel   0.80    0.16    103
                                 Movie    0.87    0.12    50
                                 Music    0.86    0.10    50
                                 Book     0.92    0.08    50
Usefulness [5]                   Travel   3.34    0.85    103
                                 Movie    3.55    0.73    50
                                 Music    3.18    0.85    53
                                 Book     3.43    0.81    50

hypothesized that advanced filters improve recommendation quality, as they may identify more relevant resources, thereby overcoming problems of data quality or data sparsity in LOD repositories. Table 13 lists response rates of the two methods. The rate measures the ratio of non-empty result sets among all result sets. Throughout the domains, recommendation lists resulting from expressive filtering achieved higher response rates.

Table 13. Response rates in % (TC2)

Domain   Constr.-based (Regular)   Constr.-based (Expanded)   N
Travel   23                        57                         103
Movie    86                        88                         50
Music    72                        75                         53
Book     73                        76                         51
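The response rate used in Table 13 is simply the share of user sessions in which a method returned at least one recommendation. A minimal sketch, with hypothetical session data and a hypothetical function name `response_rate`, could look as follows.

```python
def response_rate(result_lists):
    """Share of non-empty result sets among all result sets."""
    if not result_lists:
        raise ValueError("no result sets given")
    non_empty = sum(1 for items in result_lists if len(items) > 0)
    return non_empty / len(result_lists)


# Hypothetical sessions: the list of recommendations each participant received.
sessions = [["Rome", "Paris"], [], ["Vienna"], []]
print(f"{response_rate(sessions):.0%}")
```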


Fig. 8. Participants’ agreement with the statement that the filter has improved the result list (domain-wise) (TC2). [Figure: stacked bar chart titled “The filter has improved the recommendations of the list”; x-axis: Domain (Digital Library, Travel, Movie, Music, Book); y-axis: Percentage of Agreement; legend: Degree of Agreement (1 to 5).]

In addition to response rates, user assessments and scores in the defined quality dimensions were compared. Table 14 depicts the outcome of this analysis. Significantly higher scores are marked in bold. In case statistical tests identified no meaningful variations, the results of the better performing approach are underlined. α-levels were adjusted with the false discovery rate (FDR) in order to decrease the probability of making a type I error due to multiple testing.
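The text does not state which FDR procedure was applied. A common choice is the Benjamini-Hochberg step-up method, so the sketch below assumes that procedure; the p-values are hypothetical and the function name is chosen for illustration only.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the
    # adjusted values stay monotone in the rank.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values from four pairwise comparisons.
raw = [0.001, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adjusted]
print(adjusted, significant)
```

Comparing the adjusted p-values with the nominal α is equivalent to adjusting the α-levels per rank, which is how the procedure is often described.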

According to mean scores, expressive requests generated more comprehensive recommendation lists (size). This finding is in line with the increased response rates shown in Table 13. However, subsequently conducted t-tests confirmed significant differences only for the travel experiment (t(102) = −7.89, p < 0.001). In contrast, precision scores were higher in most of the domains (travel, movie, music) when the SKOSRec engine processed a simple constraint-based query. However, statistical tests did not confirm any systematic differences. The authors applied Wilcoxon tests for mrs scores and the remaining performance metrics because of the small sample sizes resulting from the low response rates of the simple constraint-based approach.

Regarding Novelty, none of the approaches was superior. In the Diversity dimension, expressive filters generated suggestions that were at least as topically diversified as the results from simple constraint-based retrieval. Even though no significant differences were identified between the two approaches, increased mean scores (divU in the travel domain and divC in each domain) indicate a slight superiority of expanded filter requests in this quality dimension. The same applies to usefulness scores (mean values were higher in 3 out of 4 domains when expressive filtering was applied). However, these statements are speculative at best because statistical tests were not significant.

In summary, it can be concluded that expressive filters improve recall, potentially leading to a slight loss in precision scores (see Table 14). It may also be the case that expanded user constraints diversify recommendation lists and increase their usefulness. However, these claims are unverified. In contrast, it is safe to say that expressive query patterns generate results of at least the same quality as simple patterns while increasing the number of relevant suggestions. The few differences may be explained by the high Jaccard indices (JI) of the recommendation lists (see Table 15). Throughout the domains, result sets were much alike. Given these similarities and the increases in list sizes for expressive filters, it is assumed that an expanded constraint produces an enhanced version of the recommendation list resulting from simple filtering. Therefore, expanded filters should be the default setting in a constraint-based retrieval context.
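The overlap measure behind Table 15 is the Jaccard index, i.e., the size of the intersection of two result sets divided by the size of their union. The sketch below uses hypothetical lists; how the study treated pairs of empty lists is not stated, so returning 0.0 in that case is an assumption.

```python
def jaccard_index(list_a, list_b):
    """Jaccard index of two recommendation lists, treated as sets."""
    a, b = set(list_a), set(list_b)
    if not a and not b:
        # Assumption: two empty lists are counted as having no overlap.
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical result lists of a simple and an expanded filter request.
simple_filter = ["A", "B", "C"]
expanded_filter = ["A", "B", "C", "D"]
print(jaccard_index(simple_filter, expanded_filter))
```

In this toy case the expanded list contains every item of the simple list plus one more, which is the "enhanced version" situation described in the text.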

In TC3 (i.e., evaluation of roll-up query patterns), the SKOSRec engine almost always generated non-empty recommendation lists (see Table 16). In the


Table 14. Performance results in TC2 (M = Mean, SD = Standard Deviation, N = No. of participants)

                                          Simple            Expressive
Dimension (Metric) [Max Score]   Domain   M       SD        M       SD       N
Accuracy (size) [10]             Travel   1.17    2.70      4.52    4.62     103
                                 Movie    6.72    4.08      7.20    3.92     50
                                 Music    5.96    4.64      6.36    4.51     53
                                 Book     4.59    4.50      5.00    4.51     51
Accuracy (mrs) [1]               Travel   66.73   20.51     65.34   22.36    23
                                 Movie    60.95   22.98     60.29   22.79    43
                                 Music    63.77   16.05     62.96   16.44    34
                                 Book     56.17   18.18     56.81   16.81    37
Novelty (nv) [1]                 Travel   0.48    0.41      0.61    0.49     21
                                 Movie    0.44    0.36      0.55    0.80     42
                                 Music    0.55    0.39      0.52    0.37     38
                                 Book     0.77    0.34      0.76    0.33     33
Diversity (divU) [5]             Travel   3       -         3.5     -        24
                                 Movie    4       -         4       -        43
                                 Music    4       -         4       -        38
                                 Book     3       -         3       -        37
Diversity (divC) [1]             Travel   0.68    0.18      0.72    0.21     19
                                 Movie    0.70    0.24      0.73    0.23     41
                                 Music    0.86    0.13      0.87    0.12     34
                                 Book     0.84    0.24      0.95    0.63     29
Usefulness [5]                   Travel   3.33    0.93      3.50    0.77     24
                                 Movie    3.32    0.85      3.31    0.86     43
                                 Music    3.55    0.81      3.64    0.78     38
                                 Book     3.39    0.83      3.62    1.21     37

Table 15. Jaccard Indices (TC2)

Domain   Mean Jaccard Index (JI)   SD     N
Travel   0.57                      0.47   25
Movie    0.71                      0.38   43
Music    0.82                      0.35   37
Book     0.92                      0.26   35

travel and the book experiment, the response rate was higher for aggregation-based queries, whereas in the movie domain the engine more often provided recommendations for regular requests. However, these differences are only marginal and do not indicate a clear superiority of one method.

Fig. 9 depicts participants’ general agreement with Likert statements concerning the Perceived Usefulness of the approaches in TC3. The diagram shows similar results for regular as well as advanced requests. On average, satisfaction scores reached levels which lay slightly above neutral agreement. A subsequent t-test

Table 16. Response rates in % (TC3)

Domain   Regular   Aggr.-based   N
Travel   99        100           103
Movie    96        92            50
Music    100       100           53
Book     96        100           51

confirmed no significant differences between the two methods. In fact, the approach with the highest average score of Perceived Usefulness was different in each experiment. While mean values were higher for regular requests in the studies on travel and music RS, advanced queries produced higher scores in the movie and book experiments.

Content-based Diversity (divC) was the only metric with systematic differences. Here, scores went up for advanced retrieval patterns in the travel (t(98) = −5.50, p < 0.001), the music (t(52) = −4.51, p < 0.001) and the book domain (t(47) = −4.90, p <


Fig. 9. Participants’ overall satisfaction with regular and roll-up recommendation retrieval (TC3). [Figure: bar chart titled “Perceived Usefulness (Regular vs. Roll-up Queries)”; x-axis: Domain (Travel, Movie, Music, Book); y-axis: Agreement; legend: Method (Regular, Roll-up).]

0.001) (see Table 17). This finding is not surprising, given that the aggregation-based approach explores the wider semantic network of each item in the profile.
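The reported statistics of the form t(df) compare two lists that were rated by the same participant, which corresponds to a paired (within-subjects) t-test. The sketch below computes the test statistic and the degrees of freedom with the standard library only; the scores are hypothetical, and computing differences as regular minus aggregation-based is an assumption that merely matches the negative signs reported in the text. Deriving the p-value would additionally require the t-distribution, which is omitted here.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(scores_a, scores_b):
    """Paired t statistic and degrees of freedom for two matched samples."""
    if len(scores_a) != len(scores_b):
        raise ValueError("samples must have the same length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all differences are identical")
    t = mean(diffs) / (sd / sqrt(n))
    return t, n - 1


# Hypothetical divC scores of four participants for both list types.
regular = [0.70, 0.80, 0.75, 0.60]
aggregated = [0.90, 0.85, 0.95, 0.80]
t_value, df = paired_t(regular, aggregated)
print(f"t({df}) = {t_value:.2f}")
```

A negative t value in this setup means that the second list type (here the aggregation-based one) received the higher scores.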

Hence, it can be assumed that both approaches produce recommendation lists of similar quality and can provide benefits to potential consumers. It cannot be clearly stated which approach is superior because neither of them outperformed the other in each domain. While the regular retrieval strategy was better in terms of usefulness in the travel and the music domains, the aggregation-based approach achieved higher scores in the domains of movie and book recommendations.

This conclusion is especially notable given the low mean Jaccard scores for recommendation lists resulting from regular and aggregation-based retrieval (see Table 18). The mean score never exceeded 0.1 throughout the domains, which indicates a low concordance between result sets. It is remarkable that participants perceived the recommendations as equally good given the high dissimilarity of the lists. Thus, advanced queries have an added value, as they provide additional interesting results. They can help to explore LOD repositories in different ways than a regular retrieval approach. Therefore, it is worthwhile to offer aggregation-based query patterns as an alternative retrieval strategy when the regular approach has not produced any helpful recommendations.

In addition to TC3, user assessments for TC4 (cross-domain recommendations) were also evaluated. However, since the authors conducted experiments for TC4 only in the multimedia domains, the samples were considerably smaller. Nevertheless, it is assumed that the number of completed user sessions is just sufficient to conduct further statistical analyses. Table 19 shows the response rates for the two retrieval methods. Cross-domain patterns had a higher success rate of generating non-empty result sets throughout the domains. Participants almost always received a recommendation for this query type, whereas a regular SPARQL query often did not return a single suggestion.

We applied non-parametric tests for the performance metrics mrs, nv, divU, divC and Perceived Usefulness due to the low response rates of the SPARQL-based approach, which led to a correspondingly low number of comparable data points. Overall, the FDR-adjusted pairwise Wilcoxon tests detected no significant differences, except for user-based Diversity scores (divU) in the movie (Z = −2.69, p < 0.05) and the book domain (Z = −2.56, p < 0.05). The results for the content-based Diversity (divC) metric seem to point in the same direction because of the high mean values for cross-domain queries in each domain. However, the significance of these results was not verified. The mrs scores indicate a better performance of regular


Table 17. Performance results in TC3 (M = Mean, SD = Standard Deviation, N = No. of participants)

                                          Regular           Aggr.-based
Dimension (Metric) [Max Score]   Domain   M       SD        M       SD       N
Accuracy (size) [10]             Travel   9.90    0.99      9.42    2.07     103
                                 Movie    2.88    0.59      2.76    0.82     50
                                 Music    10.00   0.00      10.00   0.00     53
                                 Book     2.88    0.59      3.00    0.00     51
Accuracy (mrs) [1]               Travel   48.21   24.98     44.93   25.59    102
                                 Movie    54.58   22.88     56.69   25.42    44
                                 Music    53.95   18.31     58.24   16.19    52
                                 Book     51.53   23.90     49.25   22.30    49
Novelty (nv) [1]                 Travel   0.43    0.39      0.38    0.40     98
                                 Movie    0.35    0.38      0.28    0.41     41
                                 Music    0.49    0.36      0.46    0.39     50
                                 Book     0.61    0.42      0.62    0.44     47
Diversity (divU) [5]             Travel   3.5     -         3       -        100
                                 Movie    4       -         3       -        43
                                 Music    4       -         3       -        50
                                 Book     3       -         3       -        48
Diversity (divC) [1]             Travel   0.79    0.17      0.89    0.13     99
                                 Movie    0.75    0.20      0.66    0.27     44
                                 Music    0.83    0.17      0.93    0.07     53
                                 Book     0.74    0.27      0.93    0.06     48
Usefulness [5]                   Travel   3.36    0.77      3.22    0.74     101
                                 Movie    3.20    0.92      3.26    0.94     44
                                 Music    3.45    0.84      3.44    0.85     51
                                 Book     2.95    0.87      3.03    0.96     48

Table 18. Jaccard Indices (TC3)

Domain   Mean Jaccard Index (JI)   SD     N
Travel   0.07                      0.11   101
Movie    0.03                      0.07   44
Music    0.02                      0.05   53
Book     0.02                      0.06   50

Table 19. Response rates in % (TC4)

Domain   Regular (SPARQL)   SKOSRec   N
Movie    32                 96        50
Music    62                 98        53
Book     53                 100       51

SPARQL queries, but the within-subjects test did not confirm this assumption statistically (see Table 20).

However, the most important finding of the analysis was that SKOSRec cross-domain queries generated significantly more suggestions throughout the domains (movie: t(49) = −17.92, p < 0.001; music: t(52) = −25.73, p < 0.001; book: t(50) = −12.75, p < 0.001) (see Table 20). Fig. 10 illustrates this graphically.

Whenever the SPARQL request provided recommendations, which was the case in less than half of the user sessions (see Table 19), the SKOSRec suggestions were assessed to be of almost equal quality as the SPARQL recommendations (see Table 20, Usefulness). Hence, in the remaining sessions, in which SPARQL querying did not produce any results, the SKOSRec approach may still have provided useful suggestions.

Therefore, it is reasonable to assume that the capability of the SKOSRec engine to flexibly switch between processing steps of similarity calculation and pattern matching can facilitate an improved exploration of LOD repositories. Table 21 also shows that the items in the two sets often did not match, as is indicated by low Jaccard indices. This finding further demonstrates that the execution of an additional step


Table 20. Performance results in TC4 (M = Mean, SD = Standard Deviation, N = No. of participants)

                                          Regular (SPARQL)    Cross-Domain
Dimension (Metric) [Max Score]   Domain   M        SD         M        SD        N
Accuracy (size) [10]             Movie    0.82     2.09       8.66     2.73      50
                                 Music    0.89     1.59       9.30     2.03      53
                                 Book     2.90     3.98       10.00    0.00      51
Accuracy (mrs) [1]               Movie    67.41    27.62      53.65    14.71     16
                                 Music    62.36    31.76      48.32    22.49     20
                                 Book     66.42    21.28      55.59    16.84     27
Novelty (nv) [1]                 Movie    0.59     0.49       0.41     0.34      16
                                 Music    0.72     0.44       0.59     0.40      19
                                 Book     0.54     0.41       0.63     0.30      27
Diversity (divU) [5]             Movie    3        -          4        -         16
                                 Music    3        -          3        -         20
                                 Book     3        -          4        -         27
Diversity (divC) [1]             Movie    (0.64)   (0.35)     (0.99)   (0.02)    (6)
                                 Music    (0.90)   (0.13)     (0.98)   (0.03)    (10)
                                 Book     (0.84)   (0.12)     (0.87)   (0.1)     (8)
Usefulness [5]                   Movie    3.57     0.83       3.35     1.57      15
                                 Music    3.53     1.17       3.11     0.98      20
                                 Book     3.41     1.04       3.41     0.94      27

Fig. 10. Mean recommendation list sizes for SPARQL and SKOSRec cross-domain requests (TC4). [Figure: bar chart titled “Recommendation List Size”; x-axis: Domain (Movie, Music, Book); y-axis: No. of items; legend: Method (Regular, Cross-Domain).]


Table 21. Jaccard Indices (TC4)

Domain   Mean Jaccard Index (JI)   SD     N
Movie    0.03                      0.07   44
Music    0.01                      0.02   21
Book     0.16                      0.27   33

of similarity calculation can help to retrieve other relevant items.

5. Future Work & Conclusion

This paper has presented the query-based SKOSRec engine that facilitates new types of recommendation requests for LOD repositories. The evaluations have shown that the SKOSRec system can often generate relevant and useful suggestions when certain query templates are utilized. In many cases, the application of expressive or advanced query patterns (i.e., in the context of constraint-based retrieval or cross-domain requests) helped to significantly improve recall values while still providing the same level of quality in the remaining performance dimensions. Additionally, advanced recommendation requests can be used to generate diversified recommendation lists in case users are not satisfied with the results of regular recommendation requests (i.e., roll-up or cross-domain retrieval vs. regular requests). While the novel recommendation approaches of the SKOSRec engine are not superior in each domain and test case, it is safe to say that they considerably extend common retrieval methods and at least provide an alternative search strategy when conventional methods fail to provide useful results. Additionally, the increase in diversity and recall is in line with findings from previous research on LOD-enabled RS [45].

It is a strength of the developed approaches that they facilitate combinations of graph-based and similarity-based retrieval at different stages of the recommendation workflow. This feature extends general search capabilities for semantic networks. Future research needs to investigate whether, in addition to SKOS, further RDF vocabularies can be used for ad-hoc item-to-item similarity computation to increase the range of potential usage scenarios, since not every LOD dataset contains SKOS annotations [53]. The application of additional properties could also boost the recommendation quality of some query patterns (e.g., the query types used in TC3). In this context, it also needs to be determined whether the representation of the user

profile in the SKOSRec query language can be substituted with free-text expressions to enable search-like functionalities. LOD researchers have already developed engines that can process natural language queries over RDF data [59], [34], [67]. It will have to be investigated how the SKOSRec engine fits into this landscape of existing retrieval tools.

Another open research question concerns the aspect of interfaces to other applications. In its standard configuration, the system is not an end-user retrieval engine but a backend application with which an administrator who has domain knowledge (e.g., of RDF vocabularies and data models of a particular LOD repository and usage scenario) creates the respective query patterns. Although the websites of the online experiments represent possible implementations of suitable end-user applications for the SKOSRec engine, there are still many other possibilities and extensions to design such a user interface (UI). For instance, the selection of LOD items for the profile was made possible by previously storing the data in an index that could then be accessed and searched before issuing a recommendation query. However, this runs against the idea of ad-hoc retrieval. In this context, future research will have to determine whether it is feasible to generate an on-the-fly mapping from items to the respective LOD resources. In the case of a commercial application, the engine needs to match items with the corresponding entries in the product catalog. Even if it is not yet clear which UI components can make the best possible use of the SKOSRec engine’s features, the results from the user experiments can give at least a few suggestions.

The following retrieval options should be the default setting in an interface that contains a LOD-enabled RS component to assist users in finding interesting items. Since the regular approach of on-the-fly retrieval achieved good results throughout the domains, the display of simple recommendations generated from previously stated user preferences represents a suitable starting point for retrieval. On-the-fly queries could be refined by providing filter options. Since the evaluations have shown that expressive constraints often lead to increased recall values, the application of such a filter might be the best option for assisted retrieval.

As in the preparations for the experiments, appropriate graph-based query patterns for the usage scenario in question would also have to be determined by a domain expert before setting up the application. The same applies to advanced retrieval patterns, which should be available to the end user to refine the query when both the simple as well as the constraint-based


approach have failed to provide relevant items. This suggestion is made because roll-up queries were competitive with regular recommendations in the web-based experiments. Therefore, they represent a viable alternative retrieval strategy. For some domains (e.g., for multimedia retrieval), it is also possible to use an optional selection field to enable the formulation of cross-domain requests. Evaluations for this query type suggest that users might receive interesting results (i.e., diversified and comprehensive recommendation lists) from such a request.

In summary, the contributions of the paper are as follows:

– Development and implementation of novel retrieval strategies in the SKOSRec system, which facilitate personalized requests and leverage the potential of knowledge graphs in order to address the research gap in state-of-the-art recommendation methods with regard to customized queries.

– Realization of web-based user experiments in four usage scenarios with the following outcomes:

  • Successful demonstration that knowledge graph data is applicable in the context of live recommendation scenarios.

  • Creation of suitable query patterns that can be applied to represent individual information needs in different application contexts. The set of query patterns can be extended with the help of the SKOSRec query language to meet the requirements of additional usage scenarios.

  • Proof that the novel retrieval strategies can be successfully applied across various application contexts and recommendation tasks and that they outperform conventional content-based retrieval approaches, especially in the performance dimensions of diversity and recall.

The prototypical implementation and the evaluation of the SKOSRec engine in different usage scenarios have demonstrated that the freely available knowledge sources on the LOD cloud can be successfully applied to advance state-of-the-art methods in IR and recommender systems. The proposed personalized and customized retrieval strategies can leverage the potential of knowledge graphs by better matching information needs of users with the help of semantic search options.

Appendix A. The SKOSRecommender Query Syntax

SKOSRecQuery A SKOSRec query can be com-prised of up to six elements of which the partsSimProjection and ItemPart are obligatory Henceusers need to state at least their preferencesand how many recommendations they wouldlike to receive Advanced retrieval patterns canbe formulated as well For instance similar re-source retrieval can be combined with prefilter-ing (RecWhereClause) andor postfiltering (Se-lectPart) of LOD resources As in SPARQL thequery can start with a prologue that abbreviatesIRIs with namespace declarations [21]

SelectPart The SelectPart part enables subqueryingwith recommendation results The surroundingquery is a SELECT query with a slightly simpli-fied WHERE clause (ie RecWhereClause) withwhich postfilter conditions can be formulatedIn this section users have the option to formu-late aggregation-based retrieval requests (Aggre-gation)

Aggregation This query part handles aggregation-based postfiltering Users specify the IRI (IRIref )resource for which similarity scores of sublevelentities are summarized Additionally it is statedhow the data should be aggregated (ie SUMMAX or AVG) as well as which variable (Var)represents the aggregation-based recommenda-tions in the postfilter section The compiler makessure that this variable is actually contained in theRecWhereClause of the SelectPart

SimProjection This part of a SKOSRec query refersto the list of generated suggestions It definesthe recommendation variable to which all simi-lar LOD resources are mapped In case pre- andpostfilter conditions are set in the other sectionsthe compiler checks whether the variable (Var)appears in these sections as well Otherwise therequired join operations cannot be executed Itis specified how many recommendations shouldbe displayed (Integer) and whether the requestwill be issued against a default repository or as across-repository request (ServiceIntegration)

ServiceIntegration This section indicates that a userintends to receive recommendations from a dif-ferent SPARQL endpoint than the default end-point that is stated in the standard configurationof the SKOSRec engine It specifies the targetSPARQL endpoint (IRIref ) from which a user

L Wenige et al Similarity-based Graph Queries 29

wants to receive recommendations Upon extrac-tion of SKOS annotations from the default sourceendpoint a subsequent request is sent to the spec-ified target endpoint

ItemPart This section of a SKOSRec query repre-sents a single preference in the user profile How-ever recommendation queries can contain a cou-ple of preference statements A preference is ei-ther expressed as a LOD resource (IRIref ) or asa variable which is bound to a preference query(VarPart) In the latter case the resources areobtained by matching the graph pattern in theVarPart with the triple statements in the reposi-tory The ItemPart statement can be additionallysupplemented with a concept-to-concept similar-ity threshold (Sim) that triggers concept expan-sion with the approach of flexible similarity de-tection which is extensively described in one ofour previous works [64]

VarPart The construct initiates preference queryingIndividual likings are expressed as a graph pat-tern in a RecWhereClause The variable (Var) isa placeholder for the LOD resources that will beused to set up the user profile The compiler hasto make sure that the specified variable occurs inthe RecWhereClause as well Otherwise the ex-traction of LOD resources cannot be carried outcorrectly

Sim The Sim part of a SKOSRec query denotes thethreshold of concept-to-concept scores for flexi-ble similarity detection Even though knowledge-based similarity values usually range between 0and 1 the specification allows a broader range ofdigits in the Decimal datatype to enable applica-tion of other similarity metrics in case they areneeded

Relation: This section specifies how concept-to-concept similarity scores should be determined. Users can state whether scores should be “larger”, “larger than” or “equal to” the threshold value.

RecWhereClause: This clause can occur in three different sections of a SKOSRec query, i.e., either in the prefilter (ItemPart), the postfilter (SelectPart) or the preference filter section (VarPart). The RecWhereClause closely resembles the “WhereClause” of the SPARQL 1.1 specification with all its subsequent parts [21]. Hence, it enables different combinations of graph pattern matching. However, the RecWhereClause of the SKOSRec language allows fewer pattern matching expressions than the “WhereClause” of the regular SPARQL syntax specification [21]. For instance, it is neither permitted to formulate subqueries nor to direct filter requests to more than one repository (see RecGraphPatternNotTriples for further explanations). This modification has been made to simplify similarity calculation. After parsing this section, the compiler makes sure that recommendation, preference or postfilter variables occur at least once in the RecWhereClause to guarantee that subsequent steps of the recommendation workflow can be applied on existing LOD resources. In case the RecWhereClause comprises a RecMinusGraphPattern, it also has to be checked that the variable is not only contained in the MINUS part of the group.
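The variable check described for the RecWhereClause can be illustrated with a minimal Python sketch. It is not the SKOSRec compiler: the clause representation (a list of `(kind, triples)` pairs), the function names and the resource `dbr:Example_Director` are hypothetical and only serve to show the rule that a variable must occur at least once and must not occur exclusively inside a MINUS part.

```python
def variables_in(triples):
    """Collect all SPARQL-style variables (terms starting with '?')."""
    return {term for triple in triples for term in triple if term.startswith("?")}


def check_variable(var, groups):
    """Classify how `var` occurs in a simplified where clause.

    `groups` is a list of (kind, triples) pairs. The kind "minus" marks a
    MINUS part; every other kind counts as a regular (positive) pattern.
    This representation is an assumption made for illustration only.
    """
    positive, minus = set(), set()
    for kind, triples in groups:
        target = minus if kind == "minus" else positive
        target.update(variables_in(triples))
    if var in positive:
        return "ok"          # bound by at least one regular pattern
    if var in minus:
        return "minus-only"  # only mentioned in a MINUS part: reject
    return "missing"         # does not occur at all: reject


# Hypothetical clause: movies by some director, minus those with a cast entry.
clause = [
    ("triples", [("?movie", "dbo:director", "dbr:Example_Director")]),
    ("minus", [("?movie", "dbo:starring", "?actor")]),
]
print(check_variable("?movie", clause))
print(check_variable("?actor", clause))
print(check_variable("?book", clause))
```

In this sketch, only the result "ok" would let the recommendation workflow continue; the other two outcomes correspond to the error cases named in the clause description.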

RecGroupGraphPattern: A RecGroupGraphPattern can contain one to many basic graph patterns, which can be combined in different ways according to RecGroupGraphPatternSub.

RecGroupGraphPatternSub: This section defines the graph patterns that are applied to identify suitable LOD resources. User filters can be either expressed as basic graph patterns (TriplesBlock) or as combinations of them (RecGraphPatternNotTriples). The sequence of different kinds of patterns can be flexibly specified.

RecGraphPatternNotTriples: This query part closely resembles the similarly named “GraphPatternNotTriples” of the SPARQL 1.1 specification [21]. The only difference to SPARQL is that the RecGraphPatternNotTriples section is a little more restricted in terms of admissible graph patterns than the “GraphPatternNotTriples” section of the regular SPARQL specification. For instance, it does not allow retrieving LOD resources from graphs other than the default graph that is specified in the configuration of the SKOSRec engine, since filter patterns have to be applied to datasets that contain SKOS annotations. Hence, these datasets need to be specified before runtime execution to ensure that similarity calculation can be carried out correctly. If a user were able to formulate filter conditions that referred to different RDF graphs and endpoints, this would not be possible.

RecGroupOrUnionGraphPattern: This query section marks an alternative graph pattern in the style of the “GroupOrUnionGraphPattern” of the SPARQL syntax specification [21]. However, it neither allows SPARQL-like subqueries nor federated queries for filtered resources because this would complicate similarity calculation.

RecOptionalGraphPattern: In this part of a SKOSRec query, users are enabled to specify additional graph patterns, which might extend the query solution but do not necessarily have to match the data.

RecMinusGraphPattern: This section handles exclusion of LOD resources for cases when users want to omit certain triple statements from the solution.

References

[1] G. Adomavicius, A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. In: IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.

[2] G. Adomavicius, A. Tuzhilin, R. Zheng. REQUEST: A query language for customizing recommendations. In: Information Systems Research, vol. 22, no. 1, pp. 99–117, 2011.

[3] C. Aggarwal. Recommender systems. Springer, 2016.

[4] M. Arenas, B. Cuenca Grau, E. Kharlamov, S. Marciuska, D. Zheleznyakov. Faceted search over ontology-enhanced RDF data. In: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 939–948, 2014.

[5] V. Ayala, M. Przyjaciel-Zablocki, T. Hornung, A. Schätzle, G. Lausen. Extending SPARQL for recommendations. In: Semantic Web Information Management, 2004.

[6] J. Beel, C. Breitinger, S. Langer, A. Lommatzsch, B. Gipp. Towards reproducibility in recommender systems research. In: User Modeling and User-adapted Interaction, vol. 26, no. 1, pp. 69–101, 2016.

[7] C. Boutilier, R. Brafman, C. Domshlak, H. Hoos, D. Poole. CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements. In: Journal of Artificial Intelligence Research (JAIR), vol. 21, pp. 135–191, 2004.

[8] P. Castells, S. Vargas, J. Wang. Novelty and diversity metrics for recommender systems: choice, discovery and relevance. 2011.

[9] G. Cheng, W. Ge, Y. Qu. Falcons: searching and browsing entities on the Semantic Web. In: Proceedings of the 17th ACM International Conference on World Wide Web, pp. 1101–1102, 2008.

[10] J. Davies, R. Weeks. QuizRDF: Search technology for the Semantic Web. In: Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004.

[11] DBpedia. URL: https://wiki.dbpedia.org, 2017.

[12] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito. Exploiting the web of data in model-based recommender systems. In: Proceedings of the 6th ACM Conference on Recommender Systems (RecSys), pp. 253–256, 2012.

[13] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, M. Zanker. Linked Open Data to support content-based recommender systems. In: Proceedings of the 8th International Conference on Semantic Systems, 2012.

[14] D. Dillman, D. Robert, D. Bowker. Principles for constructing web surveys. In: Joint Meetings of the American Statistical Association, 1998.

[15] DCMI metadata terms. URL: http://dublincore.org/documents/dcmi-terms, 2004.

[16] I. Fernandez-Tobias, I. Cantador, M. Kaminskas, F. Ricci. A generic semantic-based framework for cross-domain recommendation. In: Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pp. 25–32, 2011.

[17] A. Freitas, J. Oliveira, S. O'Riain, E. Curry, J. Da Silva, C. Pereira. Querying Linked Data using semantic relatedness: a vocabulary independent approach. In: Natural Language Processing and Information Systems, pp. 40–51, 2011.

[18] T. Grainger, T. Potter, Y. Seeley. Solr in Action. Cherry Hill: Manning, 2014.

[19] A. Gunawardana, G. Shani. Evaluating recommender systems. In: Recommender Systems Handbook, Springer, pp. 265–308, 2015.

[20] R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Bürgle, H. Düwiger, U. Scheel. Faceted Wikipedia search. In: International Conference on Business Information Systems, pp. 1–11, 2010.

[21] S. Harris, A. Seaborne, E. Prud'hommeaux. SPARQL 1.1 query language. In: W3C recommendation, 2013.

[22] B. Heitmann, C. Hayes. Using Linked Data to build open, collaborative recommender systems. In: AAAI Spring Symposium: Linked Data Meets Artificial Intelligence, pp. 76–81, 2010.

[23] J. Herlocker, J. Konstan, L. Terveen, J. Riedl. Evaluating collaborative filtering recommender systems. In: ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.

[24] Y. Hu, Z. Wang, W. Wu, J. Guo, M. Zhang. Recommendation for movies and stars using YAGO and IMDB. In: 12th International Asia-Pacific Web Conference (APWEB), pp. 123–129, 2010.

[25] D. Huynh, D. Karger. Parallax and companion: Set-based browsing for the data web. In: Proceedings of the 18th ACM International Conference on World Wide Web, 2009.

[26] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich. Recommender systems: an introduction. Cambridge University Press, 2010.

[27] Y. Kabutoya, R. Sumi, T. Iwata, T. Uchiyama, T. Uchiyama. In: IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 625–630, 2012.

[28] M. Kaminskas, I. Fernandez-Tobias, F. Ricci, I. Cantador. Knowledge-based music retrieval for places of interest. In: Proceedings of the 2nd International ACM Workshop on Music Information Retrieval with User-centered and Multimodal Strategies, pp. 19–24, 2012.

[29] H. Khrouf, R. Troncy. Hybrid event recommendation using Linked Data and user diversity. In: Proceedings of the 7th ACM Conference on Recommender Systems, pp. 185–192, 2010.

[30] P. Kirk. Experimental design. John Wiley & Sons, 1982.

[31] B. Knijnenburg, M. Willemsen, Z. Gantner, H. Soncu, C. Newell. Explaining the user experience of recommender systems. In: User Modeling and User-Adapted Interaction, vol. 22, no. 4-5, pp. 441–504, 2012.

[32] G. Koutrika, B. Bercovitz, H. Garcia-Molina. FlexRecs: expressing and combining flexible recommendations. In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 2009.

[33] M. Lee, W. Kim. Semantic association search and rank method based on spreading activation for the Semantic Web. In: IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1523–1527, 2009.

[34] J. Lehmann, L. Bühmann. AutoSPARQL: Let users query your knowledge base. In: 8th Extended Semantic Web Conference (ESWC), 2011.

[35] N. Marie, F. Gandon, M. Ribiere, F. Rodio. Discovery hub: on-the-fly Linked Data exploratory search. In: Proceedings of the 9th International Conference on Semantic Systems, pp. 17–24, 2013.

[36] S. McNee, J. Riedl, J. Konstan. Being accurate is not enough: how accuracy metrics have hurt recommender systems. In: CHI'06 Extended Abstracts on Human Factors in Computing Systems, 2006.

[37] R. Meymandpour, J. Davis. Recommendations using Linked Data. In: Proceedings of the 5th PhD Workshop on Information and Knowledge, 2012.

[38] A. Miles, S. Bechhofer. SKOS Simple Knowledge Organization System Reference. In: W3C Recommendation, 2009.

[39] B. Mobasher, X. Jin, Y. Zhou. Semantically enhanced collaborative filtering on the web. In: Web Mining: From Web to Semantic Web, pp. 57–76, 2004.

[40] A. Musetti, A. Nuzzolese, F. Draicchio, V. Presutti, E. Blomqvist, A. Gangemi, P. Ciancarini. Aemoo: Exploratory search based on knowledge patterns over the Semantic Web. In: Semantic Web Challenge, 2012.

[41] T. Neumann, G. Weikum. The RDF-3X engine for scalable management of RDF data. In: The VLDB Journal, vol. 19, no. 1, pp. 91–113, 2010.

[42] J. Pérez, M. Arenas, C. Gutierrez. Semantics and Complexity of SPARQL. In: 5th International Semantic Web Conference, pp. 30–43, 2006.

[43] S. Policarpio, S. Brunk, G. Tummarello. Implementation of a SPARQL integrated recommendation engine for Linked Data with hybrid capabilities. In: AImWD, 2012.

[44] P. Pu, L. Chen, R. Hu. A user-centric evaluation framework for recommender systems. In: Proceedings of the 5th ACM Conference on Recommender Systems, 2011.

[45] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, E. Di Sciascio. Sound and music recommendation with knowledge graphs. In: ACM Transactions on Intelligent Systems and Technology (TIST), vol. 8, no. 2, article 21, 2017.

[46] E. Oren, R. Delbru, S. Decker. Extending faceted navigation for RDF data. In: 5th International Semantic Web Conference, pp. 559–572, 2006.

[47] G. Pirro. Explaining and suggesting relatedness in knowledge graphs. In: International Semantic Web Conference, pp. 622–639, 2015.

[48] G. Rizzo, R. Troncy. NERD: A framework for evaluating named entity recognition tools in the Web of data. In: 10th International Semantic Web Conference (ISWC'11), Demo Session, Bonn, Germany, 2011.

[49] J. Rosati, T. Di Noia, T. Lukasiewicz, R. De Leone, A. Maurino. Preference queries with Ceteris Paribus semantics for Linked Data. In: OTM Confederated International Conferences, pp. 423–442, 2015.

[50] T. Ruotsalo, K. Haav, A. Stoyanov, S. Roche, E. Fani, R. Deliai, E. Mäkelä, T. Kauppinen, E. Hyvönen. SMARTMUSEUM: A mobile recommender system for the Web of Data. In: Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 50–67, 2013.

[51] I. Ruthven, M. Lalmas, K. van Rijsbergen. Incorporating user search behavior into relevance feedback. In: Journal of the Association for Information Science and Technology, vol. 54, no. 6, pp. 529–549, 2003.

[52] B. Sarwar, G. Karypis, J. Konstan, J. Riedl. Item-based collaborative filtering recommendation algorithms. In: Proceedings of the 10th International Conference on World Wide Web, 2001.

[53] M. Schmachtenberg, C. Bizer, H. Paulheim. Adoption of the Linked Data best practices in different topical domains. In: International Semantic Web Conference (ISWC), pp. 245–260, 2014.

[54] H. Schütze, C. Manning, P. Raghavan. Introduction to information retrieval. Cambridge University Press, 2008.

[55] SparqlImplementations. URL: https://www.w3.org/wiki/SparqlImplementations, 2016.

[56] M. Stankovic, W. Breitfuss, P. Laublet. Discovering relevant topics using DBPedia. In: Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 219–222, 2011.

[57] K. Swearingen, R. Sinha. Beyond algorithms: An HCI perspective on recommender systems. In: ACM SIGIR 2001 Workshop on Recommender Systems, vol. 13, no. 5-6, 2001.

[58] D. Tunkelang. Faceted search. In: Synthesis Lectures on Information Concepts, Retrieval, and Services, pp. 1–80, 2009.

[59] C. Unger, L. Bühmann, J. Lehmann, A. Ngonga Ngomo, D. Gerber, P. Cimiano. Template-based question answering over RDF data. In: Proceedings of the 21st ACM International Conference on World Wide Web, 2012.

[60] C. Varnhagen, M. Gushta, J. Daniels, T. Peters, N. Parmar, D. Law, T. Johnson. How informed is online informed consent? In: Ethics & Behavior, vol. 15, no. 1, pp. 37–48, 2005.

[61] J. Waitelonis, H. Sack. Towards exploratory video search using Linked Data. In: 11th IEEE International Symposium on Multimedia, pp. 540–545, 2009.

[62] L. Wenige, J. Ruhland. Flexible on-the-fly recommendations from Linked Open Data repositories. In: International Conference on Business Information Systems, pp. 43–54, 2016.

[63] L. Wenige. Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations.

[64] L. Wenige, J. Ruhland. Retrieval by recommendation: using LOD technologies to improve digital library search. In: International Journal on Digital Libraries, 2017.

[65] F. Zarrinkalam, M. Kahani. A multi-criteria hybrid citation recommendation system based on Linked Data. In: 2nd International eConference on Computer and Knowledge Engineering (ICCKE), pp. 283–288, 2012.

[66] C. Ziegler, S. McNee, J. Konstan, G. Lausen. Improving recommendation lists through topic diversification. In: Proceedings of the 14th International Conference on World Wide Web, pp. 22–32, 2005.

[67] L. Zou, R. Huang, H. Wang, J. Yu, W. He, D. Zhao. Natural language question answering over RDF: a graph data driven approach. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 313–324, 2014.

  • Introduction
  • Related Work
  • The SKOS Recommender
    • System Overview
    • On-the-Fly Recommendations
    • Preference Querying
    • Prefiltering
    • Postfiltering & Combinations
  • Evaluation of the SKOS Recommender
    • Experimental Setup
    • Results
  • Future Work & Conclusion
  • Appendix A: The SKOSRecommender Query Syntax
  • References

Fig. 5. Web form for a constraint-based recommendation request (music domain).

separate evaluation screen. Participants evaluated result sets in the same manner as in TC1. For each result set, the screen contained an additional Likert item asking subjects to give their agreement to the statement “The filter has improved the recommendation results of the list”. The sequence of suggestions resulting from the different methods was randomly assigned in each user session to avoid order effects.

After study subjects had completed TC2, they were navigated to test case 3 (TC3). In the TC3 section of the travel experiment, participants could choose between three options for rollup query patterns (similar to Q6, Sect. 3). Users were able to obtain recommendations for travel destinations based on POIs they had visited during the stay in another destination. They could either select a city (option 1), a region (option 2) or a country (option 3) as destination type. Upon entity type selection, the SKOSRec engine generated appropriate suggestions. Fig. 6 shows the web form from the travel experiment that facilitated the formulation of advanced rollup requests in TC3.

When users had selected an entity type and stated their favorite travel destination and three associated POIs, the application generated a SKOSRec query from these parameters. A simple on-the-fly request was sent to the engine as well to retrieve baseline recommendations, with which the suggestions resulting from the advanced request were compared at a later stage of the experiment. The simple query only contained the user's favorite travel destination and the selected entity type option, thus omitting information on the POIs. During the travel experiment, the engine processed both the simple and the advanced query and produced two recommendation lists. As in the previous parts of the experiment, participants assessed the quality of the result sets, which appeared in random order on the screen.

Fig. 6. Travel - User profile generation TC3.

The multimedia experiments also contained a section on rollup retrieval (TC3). Like the advanced travel queries, the multimedia requests aggregated similarity scores of sublevel entities through summation, which the engine joined with a postfilter. In addition to these features, the pattern contained a preference query part. This section identified a set of preferred items through a single user statement. Participants either specified their favorite director/actor (movie domain), music act (music domain) or author (book domain) and received recommendations based on the features of the works created by the artist (similar to Q5, Sect. 3). Participants also received suggestions from a simple on-the-fly query, which computed results according to the artist's characteristics. As in the travel experiment, recommendations from the advanced rollup request were compared with the on-the-fly query without letting participants know which approach they were assessing and through random assignment of result set positions.

After participants of the multimedia experiments had evaluated the results of TC3, they were guided to test case 4 (TC4), which evaluated cross-domain queries. The experiment on travel RS ended with TC3 because no suitable graph patterns could be identified to facilitate these kinds of suggestions for the travel usage scenario. In the multimedia domain, however, the data from DBpedia was sufficient for this retrieval task. The entity type of the study (i.e., movie, music act or book) defined the source domain, based on which the engine generated suggestions for items from another multimedia target domain (similar to Q7, Sect. 3). Fig. 7 depicts the cross-domain web form of the music experiment.

The web form assisted subjects in formulating a request, which then obtained movie suggestions based on the favorite music act of the user. Parameters from users were entered in the placeholders of the query pattern in the backend of the application. We formulated the query templates in such a way that the query engine matched any properties or graph patterns that could potentially connect two items from the specified target and source domains. For instance, such matchings can occur when a user states a favorite movie that was written by a particular author. In case the same author has also written some books, they might be of interest to the user as well. Another example stems from the movie domain: Suppose a consumer has declared a preference in his/her profile for a movie that links to the corresponding soundtrack. It might well be the case that the user likes the soundtrack and the music acts that were involved in creating it just as much as the movie. The practice to distinguish homonymous LOD resources through the property dbo:wikiPageDisambiguates is another interesting linking pattern from DBpedia that was exploited for cross-domain requests in TC4, since it connects similar entities (e.g., the book edition of a certain movie).

Cross-domain SPARQL queries (similar to Q8, Sect. 3) were posed as baseline requests in this test case. While the SPARQL query only matched cross-domain relations for a single LOD resource, the SKOSRec request expanded the search space by considering links from more than one item. As in the previous test cases, TC4 ended with an evaluation screen where users assessed the SPARQL-based as well as the SKOSRec suggestions in random order without knowing which recommendation list belonged to which method. Table 10 gives an overview of the test cases and the methods that were applied in the experiments in each usage scenario.

4.2. Results

Fig. 7. Music - User profile generation TC4 (page 9).

Table 10. Test cases in the web experiments (X = applied, - = not applied)

Test Case  Evaluation Purpose                     Query Type  Travel  Movie  Music  Book
TC1        On-the-fly Recommendations (Baseline)  Q1          X       X      X      X
TC2        Constraint-based Recommendations       Q3          X       X      X      X
TC3        Advanced Queries (Rollup)              Q5, Q6      X       X      X      X
TC4        Advanced Queries (Cross-Domain)        Q7          -       X      X      X

In total, 257 people took part in the study. The gathered samples from the experiments contained answers from both genders with a slight overrepresentation of male participants (53.7%). The majority of study subjects were between 20 and 40 years of age (65.8%). The evaluation of the demographics section also revealed that the majority of participants already perceive the search in their domain of interest as either “manageable” (38.5%) or even as “easy” or “very easy” (44.0%).

Prior to analyzing the performance results of the novel retrieval approaches of the SKOSRec engine, it was checked whether the respective Likert items for Perceived Usefulness and user-based diversity (divU) reached acceptable reliability levels. For this purpose, Cronbach's α values were determined. Table 11 shows the outcome of this analysis for the Perceived Usefulness construct. Given the acceptable values throughout the domains (Cronbach's α > 0.7), it could be treated as an interval-scaled variable in the statistical tests.

Table 11. Cronbach's α values for the concept of Perceived Usefulness

Domain  Cronbach's α
Travel  0.7002
Movie   0.8579
Music   0.8515
Book    0.8949
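For readers who want to reproduce a reliability check of this kind, the following Python sketch computes Cronbach's α from raw Likert responses using the standard formula α = k/(k−1) · (1 − Σ item variances / variance of total scores). The function name and the toy responses are illustrative assumptions; they are not the study data, and the paper does not state which software was used.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of participants' item scores.

    `responses` holds one row per participant; each row contains the scores
    of the k Likert items that belong to the same construct.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Sample variance of each item (column) across participants.
    item_variances = [variance(column) for column in zip(*responses)]
    # Sample variance of the participants' summed scores.
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Toy data: four hypothetical participants, three items each.
toy_responses = [
    [2, 3, 3],
    [4, 4, 5],
    [3, 3, 4],
    [5, 4, 5],
]
alpha = cronbach_alpha(toy_responses)
print(round(alpha, 4))
```

Under the threshold used in the text, a construct would be treated as reliable when the returned value exceeds 0.7.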

Besides usefulness, user-based diversity scores were also measured by multiple Likert items. However, Cronbach's α values for this concept did not reach acceptable levels. In order to avoid that valuable user assessments remained unused, agreements to the statement “The recommendations of list [...] are diverse” were taken into account as an ordinal variable (divU).

Based on these results, subsequent tests could be carried out. The first part of the analysis concerned the performance of the on-the-fly retrieval method (baseline) of the SKOSRec engine in TC1. The outcome of this analysis confirms that the on-the-fly approach represents a viable recommendation strategy. Table 12 shows the mean (M)¹², standard deviation (SD) and number of participants (N) for each performance dimension and metric. It can be seen that users assessed the engine's performance in almost every quality dimension to be above mediocrity. Aside from the positive user evaluations with regard to Perceived Usefulness, related quality aspects such as Accuracy, Novelty and Diversity received good assessments as well. For instance, the mean relevance score (mrs) was higher than 50 score points in each domain. This means that participants were often convinced of the relevance of the results. Given these positive user assessments with regard to item relevance, it seems justified to consider the mean of result list sizes as an indicator of recall. The engine was able to generate the required number of recommendations in almost every test case (see Table 12). The on-the-fly retrieval approach was also tested with regard to the Novelty dimension. Table 12 shows the mean novelty scores (nv) for the baseline recommendation approach. Novelty scores indicate that the majority of items in a result list were deemed to be new in the travel and book RS, whereas in the experiments on movie and music RS, only about one third of the recommendation list was unfamiliar to users. In contrast to that, content-based Diversity scores (divC) were rather high in each domain. Hence, it can be concluded that items in a result set usually have a low rate of matching SKOS annotations. In addition to content-based scores, user assessments of recommendation list Diversity (divU) were also taken into account. The median level of agreement to the Diversity-related Likert item stating that recommendations are diverse lay at 3 (“neutral”) in the multimedia domains and at 4 (“agree”) in the travel domain.

In summary, the baseline approach achieved fairly good performance results in TC1. The findings demonstrate that LOD-enabled recommendations can potentially enhance a user's search experience. They also allow for subsequent tests of advanced retrieval techniques, which can be compared to this standard method.

¹² In case of the user-based diversity score (divU), M denotes the median because of the ordinal scale of the variable.

In the analysis of performance results from TC2 (constraint-based recommendations), particular attention was paid to participants' agreements with the statement “The filter has improved the recommendation results of the list”. Fig. 8 shows the distribution of domain-wise agreement ratios. On average, they were fairly high. Throughout the domains, the vast majority of participants either selected 5 (“strongly agree”), 4 (“agree”) or 3 (“neutral”). Hence, a filter often had a positive impact on results.

The authors gathered responses to the question on improvement for recommendation lists resulting from both the simple and the expressive filtering approaches. Hence, they give clues about the general performance of constraint-based retrieval. However, since this section of the study revealed the experimental condition to participants (i.e., application of a filter), answers to the Likert statement have to be interpreted cautiously. Results may be slightly biased, as participants were able to guess the underlying agenda. However, these limitations do not exist for comparisons between simple and expressive filters. Since the web application randomly assigned the list order, participants did not know which approach they were assessing. This enables us to take a closer look at the differences between the two filtering approaches. It was hypothesized that advanced filters improve recommendation quality, as they may identify more relevant resources, thereby overcoming problems of data quality or data sparsity in LOD repositories. Table 13 lists response rates of the two methods. The rate measures the ratio of non-empty result sets among all result sets. Throughout the domains, recommendation lists resulting from expressive filtering achieved higher response rates.

Table 12. Performance results in TC1 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max. Score]  Domain  M      SD     N
Accuracy (size) [10]             Travel  10.00  0.00   103
                                 Movie   10.00  0.00   50
                                 Music   10.00  0.00   53
                                 Book    9.80   1.40   51
Accuracy (mrs) [1]               Travel  56.49  0.00   103
                                 Movie   62.25  19.15  50
                                 Music   55.70  0.10   53
                                 Book    57.47  15.55  50
Novelty (nv) [1]                 Travel  0.60   0.34   101
                                 Movie   0.36   0.31   50
                                 Music   0.36   0.39   52
                                 Book    0.61   0.29   47
Diversity (divU) [5]             Travel  4      -      102
                                 Movie   3      -      50
                                 Music   3      -      53
                                 Book    3      -      50
Diversity (divC) [1]             Travel  0.80   0.16   103
                                 Movie   0.87   0.12   50
                                 Music   0.86   0.10   50
                                 Book    0.92   0.08   50
Usefulness [5]                   Travel  3.34   0.85   103
                                 Movie   3.55   0.73   50
                                 Music   3.18   0.85   53
                                 Book    3.43   0.81   50

Table 13. Response Rates (TC2)

Domain  Constr.-based (Regular)  Constr.-based (Expanded)  N
Travel  23%                      57%                       103
Movie   86%                      88%                       50
Music   72%                      75%                       53
Book    73%                      76%                       51
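The response rate reported in Table 13 is the share of requests that returned at least one recommendation. A minimal Python sketch of this ratio is given below; the function name and the example result sets are hypothetical and do not stem from the experiment logs.

```python
def response_rate(result_sets):
    """Share of non-empty result sets among all result sets."""
    if not result_sets:
        raise ValueError("at least one result set is required")
    non_empty = sum(1 for results in result_sets if len(results) > 0)
    return non_empty / len(result_sets)


# Hypothetical example: four requests, two of which returned nothing.
example_result_sets = [["dbr:A"], [], ["dbr:B", "dbr:C"], []]
print(response_rate(example_result_sets))
```

Applied to the movie domain, for instance, 43 non-empty lists out of 50 requests would give 0.86, which matches the order of magnitude of the reported rates.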

[Figure 8 plot: “The filter has improved the recommendations of the list”; x-axis: Domain (Digital Library, Travel, Movie, Music, Book); y-axis: Percentage of Agreement (0.00 to 1.00); legend: Degree of Agreement (1 to 5)]

Fig. 8. Participants' agreement with the statement that the filter has improved the result list (domain-wise) (TC2).

In addition to response rates, user assessments and scores in the defined quality dimensions were compared. Table 14 depicts the outcome of this analysis. Significantly higher scores are marked in bold figures. In case statistical tests identified no meaningful variations, the results of the better performing approach are underlined. α-levels were adjusted with the false discovery rate (FDR) in order to decrease the probability of making a type I error due to multiple testing.
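The text does not state which FDR procedure was applied. As an illustration only, the sketch below implements the widely used Benjamini-Hochberg step-up procedure in Python; the choice of this procedure, the function name, the level q = 0.05 and the example p-values are assumptions, not details taken from the study.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected.

    Benjamini-Hochberg step-up procedure: sort the m p-values, find the
    largest rank k with p_(k) <= (k / m) * q, and reject the k smallest.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * q:
            largest_rank = rank
    rejected = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= largest_rank:
            rejected[index] = True
    return rejected


# Hypothetical p-values from four tests.
print(benjamini_hochberg([0.001, 0.04, 0.03, 0.5]))
```

With these example values, only the first test remains significant, although three of the four raw p-values lie below 0.05. This is the kind of correction for multiple testing that the paragraph above refers to.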

According to mean scores, expressive requests generated more comprehensive recommendation lists (size). This finding is in line with the increased response rates shown in Table 13. However, subsequently conducted t-tests confirmed significant differences only for the travel experiment (t(102) = −7.89, p < 0.001). In contrast, precision scores were higher in most of the domains (travel, movie, music) when the SKOSRec engine processed a simple constraint-based query. However, statistical tests did not confirm any systematic differences. The authors applied Wilcoxon tests for mrs scores and the remaining performance metrics because of the small sample sizes resulting from the low response rates of the simple constraint-based approach.

Regarding Novelty, none of the approaches was superior. In the Diversity dimension, expressive filters generated suggestions that were at least as topically diversified as the results from simple constraint-based retrieval. Even though no significant differences were identified between the two approaches, increased mean scores (divU in the travel domain and divC in each domain) indicate a slight superiority of expanded filter requests in this quality dimension. The same applies to usefulness scores (mean values were higher in 3 out of 4 domains when expressive filtering was applied). However, these statements are speculative at best because statistical tests were not significant.

In summary, it can be concluded that expressive filters improve recall, potentially leading to a slight loss in precision scores (see Table 14). It may also be the case that expanded user constraints diversify as well as increase the usefulness of recommendation lists. However, these claims are unverified. In contrast, it is safe to say that expressive query patterns generate results of at least the same quality as simple patterns while increasing the number of relevant suggestions. The few differences may be explained by the high Jaccard scores (JI) (see Table 15) of recommendation lists. Throughout the domains, result sets were much alike. Given these similarities and the increases in list sizes for expressive filters, it is supposed that an expanded constraint produces an enhanced version of the recommendation list resulting from simple filtering. Therefore, expanded filters should be the default setting in a constraint-based retrieval context.
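The Jaccard index used to compare the two recommendation lists is the size of their intersection divided by the size of their union. The following Python sketch shows the computation; the function name, the example lists and the handling of two empty lists (returning None) are assumptions, since the text does not specify how such cases were treated.

```python
def jaccard_index(list_a, list_b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two recommendation lists."""
    a, b = set(list_a), set(list_b)
    union = a | b
    if not union:
        return None  # undefined for two empty lists (assumed convention)
    return len(a & b) / len(union)


# Hypothetical result lists of a simple and an expanded filter request.
simple_list = ["dbr:A", "dbr:B", "dbr:C"]
expanded_list = ["dbr:B", "dbr:C", "dbr:D"]
print(jaccard_index(simple_list, expanded_list))
```

A value close to 1 means that both filters returned nearly the same items. If the expanded list merely adds items to the simple list, the index equals the size of the simple list divided by the size of the expanded one.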

Table 14. Performance results in TC2 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max. Score]  Domain  Simple M  Simple SD  Expressive M  Expressive SD  N
Accuracy (size) [10]             Travel  1.17      2.70       4.52          4.62           103
                                 Movie   6.72      4.08       7.20          3.92           50
                                 Music   5.96      4.64       6.36          4.51           53
                                 Book    4.59      4.50       5.00          4.51           51
Accuracy (mrs) [1]               Travel  66.73     20.51      65.34         22.36          23
                                 Movie   60.95     22.98      60.29         22.79          43
                                 Music   63.77     16.05      62.96         16.44          34
                                 Book    56.17     18.18      56.81         16.81          37
Novelty (nv) [1]                 Travel  0.48      0.41       0.61          0.49           21
                                 Movie   0.44      0.36       0.55          0.80           42
                                 Music   0.55      0.39       0.52          0.37           38
                                 Book    0.77      0.34       0.76          0.33           33
Diversity (divU) [5]             Travel  3         -          3.5           -              24
                                 Movie   4         -          4             -              43
                                 Music   4         -          4             -              38
                                 Book    3         -          3             -              37
Diversity (divC) [1]             Travel  0.68      0.18       0.72          0.21           19
                                 Movie   0.70      0.24       0.73          0.23           41
                                 Music   0.86      0.13       0.87          0.12           34
                                 Book    0.84      0.24       0.95          0.63           29
Usefulness [5]                   Travel  3.33      0.93       3.50          0.77           24
                                 Movie   3.32      0.85       3.31          0.86           43
                                 Music   3.55      0.81       3.64          0.78           38
                                 Book    3.39      0.83       3.62          1.21           37

Table 15. Jaccard Indices (TC2)

Domain  Mean Jaccard Index (JI)  SD    N
Travel  0.57                     0.47  25
Movie   0.71                     0.38  43
Music   0.82                     0.35  37
Book    0.92                     0.26  35

In TC3 (i.e., evaluation of roll-up query patterns), the SKOSRec engine almost always generated non-empty recommendation lists (see Table 16). In the travel and the book experiment, the response rate was higher for aggregation-based queries, whereas in the movie domain, the engine more often provided recommendations for regular requests. However, these differences are only marginal and do not indicate a clear superiority of one method.

Fig. 9 depicts participants' general agreement to Likert statements concerning the Perceived Usefulness of the approaches in TC3. The diagram shows similar results for regular as well as advanced requests. On average, satisfaction scores reached levels which lay slightly above neutral agreement. A subsequent t-test confirmed no significant differences between the two methods. In fact, the approach with the highest average score of Perceived Usefulness was different in each experiment. While mean values were higher for regular requests in the studies on travel and music RS, advanced queries produced higher scores in the movie and book experiments.

Table 16. Response Rates (TC3)

| Domain | Regular (%) | Aggr.-based (%) | N |
|---|---|---|---|
| Travel | 99 | 100 | 103 |
| Movie | 96 | 92 | 50 |
| Music | 100 | 100 | 53 |
| Book | 96 | 100 | 51 |

[Figure: Perceived Usefulness (Regular vs. Roll-up Queries); x-axis: Domain (Travel, Movie, Music, Book); y-axis: Agreement; legend (Method): Regular, Roll-up]

Fig. 9. Participants' overall satisfaction with regular and roll-up recommendation retrieval (TC3)

Content-based Diversity (divC) was the only metric with systematic differences. Here, scores went up for advanced retrieval patterns in the travel (t(98) = −5.50, p < 0.001), the music (t(52) = −4.51, p < 0.001) and the book domain (t(47) = −4.90, p < 0.001) (see Table 17). This finding is not surprising, given that the aggregation-based approach explores the wider semantic network of each item in the profile.
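The degrees of freedom reported for the divC comparison are in line with a paired (within-subjects) t-test, where the degrees of freedom equal the number of participants minus one. The following sketch shows how such a statistic is computed from per-participant score differences. The function name `paired_t` and the sample scores are hypothetical; they only illustrate the calculation and do not reproduce the authors' analysis.

```python
import math
import statistics


def paired_t(scores_a, scores_b):
    """Return (t, df) of a paired t-test on per-participant score differences."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both conditions need one score per participant")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two participants")
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences are constant; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical divC scores of five participants for both retrieval methods.
regular = [0.70, 0.80, 0.75, 0.60, 0.85]
rollup = [0.90, 0.85, 0.95, 0.80, 0.90]

t, df = paired_t(regular, rollup)
print(f"t({df}) = {t:.2f}")
```

A negative t value, as in the results above, means that the second condition (here the roll-up scores) was higher on average than the first.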

Hence, it can be assumed that both approaches produce recommendation lists of similar quality and can provide benefits to potential consumers. Thus, it cannot be clearly stated which approach is superior because neither of them outperformed the other in each domain. While the regular retrieval strategy was better in terms of usefulness in the travel and the music domains, the aggregation-based approach achieved higher scores in the domains of movie and book recommendations.

This conclusion is especially notable given the low mean Jaccard scores for recommendation lists resulting from regular and aggregation-based retrieval (see Table 18). The mean score never exceeded 0.1 throughout the domains, which indicates a low concordance between result sets. It is remarkable that participants perceived the recommendations as equally good given the high dissimilarity of the lists. Thus, advanced queries have an added value as they provide additional interesting results. They can help to explore LOD repositories in different ways than a regular retrieval approach. Therefore, it is worthwhile to offer aggregation-based query patterns as an alternative retrieval strategy when the regular approach has not produced any helpful recommendations.
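The roll-up idea behind the aggregation-based patterns can be illustrated with a short sketch: similarity scores of sublevel entities (e.g., movies) are summarized per parent resource (e.g., director) with SUM, MAX or AVG, the aggregation keywords listed in Appendix A. Everything else here is an assumption made for illustration, including the function name `roll_up`, the dictionary-based inputs and the example scores; the actual engine operates on SPARQL results.

```python
from collections import defaultdict

# Aggregation functions named after the keywords mentioned in Appendix A.
AGGREGATORS = {
    "SUM": sum,
    "MAX": max,
    "AVG": lambda scores: sum(scores) / len(scores),
}


def roll_up(entity_scores, parent_of, mode="SUM", limit=10):
    """Aggregate sublevel similarity scores per parent and return the top `limit`."""
    if mode not in AGGREGATORS:
        raise ValueError(f"unknown aggregation mode: {mode}")
    grouped = defaultdict(list)
    for entity, score in entity_scores.items():
        parent = parent_of.get(entity)
        if parent is not None:  # entities without a known parent are skipped
            grouped[parent].append(score)
    totals = {parent: AGGREGATORS[mode](scores) for parent, scores in grouped.items()}
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


# Hypothetical similarity scores of movies and their directors.
movie_scores = {"movie_1": 0.9, "movie_2": 0.5, "movie_3": 0.6}
directed_by = {"movie_1": "director_a", "movie_2": "director_a", "movie_3": "director_b"}

for mode in ("SUM", "MAX", "AVG"):
    print(mode, roll_up(movie_scores, directed_by, mode=mode))
```

Because the ranking is computed over parent resources rather than over the items themselves, the resulting list can differ strongly from a regular item-level list, which is consistent with the low Jaccard indices reported for TC3.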

In addition to TC3, user assessments for TC4 (cross-domain recommendations) were also evaluated. However, since the authors conducted experiments for TC4 only in the multimedia domains, the samples were considerably smaller. Nevertheless, it is assumed that the number of completed user sessions is only just enough to conduct further statistical analyses. Table 19 shows the response rates for the two retrieval methods. Cross-domain patterns had a higher success rate of generating non-empty result sets throughout the domains. Participants almost always received a recommendation for this query type, whereas they often did not receive a single suggestion when a regular SPARQL query was executed.

Table 17. Performance results in TC3 (M = Mean, SD = Standard Deviation, N = No. of participants)

| Dimension (Metric) [Max Score] | Domain | Regular M | Regular SD | Aggr.-based M | Aggr.-based SD | N |
|---|---|---|---|---|---|---|
| Accuracy (size) [10] | Travel | 9.90 | 0.99 | 9.42 | 2.07 | 103 |
| | Movie | 2.88 | 0.59 | 2.76 | 0.82 | 50 |
| | Music | 10.00 | 0.00 | 10.00 | 0.00 | 53 |
| | Book | 2.88 | 0.59 | 3.00 | 0.00 | 51 |
| Accuracy (mrs) [1] | Travel | 48.21 | 24.98 | 44.93 | 25.59 | 102 |
| | Movie | 54.58 | 22.88 | 56.69 | 25.42 | 44 |
| | Music | 53.95 | 18.31 | 58.24 | 16.19 | 52 |
| | Book | 51.53 | 23.90 | 49.25 | 22.30 | 49 |
| Novelty (nv) [1] | Travel | 0.43 | 0.39 | 0.38 | 0.40 | 98 |
| | Movie | 0.35 | 0.38 | 0.28 | 0.41 | 41 |
| | Music | 0.49 | 0.36 | 0.46 | 0.39 | 50 |
| | Book | 0.61 | 0.42 | 0.62 | 0.44 | 47 |
| Diversity (divU) [5] | Travel | 3.5 | - | 3 | - | 100 |
| | Movie | 4 | - | 3 | - | 43 |
| | Music | 4 | - | 3 | - | 50 |
| | Book | 3 | - | 3 | - | 48 |
| Diversity (divC) [1] | Travel | 0.79 | 0.17 | 0.89 | 0.13 | 99 |
| | Movie | 0.75 | 0.20 | 0.66 | 0.27 | 44 |
| | Music | 0.83 | 0.17 | 0.93 | 0.07 | 53 |
| | Book | 0.74 | 0.27 | 0.93 | 0.06 | 48 |
| Usefulness [5] | Travel | 3.36 | 0.77 | 3.22 | 0.74 | 101 |
| | Movie | 3.20 | 0.92 | 3.26 | 0.94 | 44 |
| | Music | 3.45 | 0.84 | 3.44 | 0.85 | 51 |
| | Book | 2.95 | 0.87 | 3.03 | 0.96 | 48 |

Table 18. Jaccard Indices (TC3)

| Domain | Mean Jaccard Index (JI) | SD | N |
|---|---|---|---|
| Travel | 0.07 | 0.11 | 101 |
| Movie | 0.03 | 0.07 | 44 |
| Music | 0.02 | 0.05 | 53 |
| Book | 0.02 | 0.06 | 50 |

Table 19. Response rates (TC4)

| Domain | Regular (SPARQL) (%) | SKOSRec (%) | N |
|---|---|---|---|
| Movie | 32 | 96 | 50 |
| Music | 62 | 98 | 53 |
| Book | 53 | 100 | 51 |

We applied non-parametric tests for the performance metrics mrs, nv, divU, divC and Perceived Usefulness due to the low response rates of the SPARQL-based approach, which led to a correspondingly low number of comparable data points. Overall, the FDR-adjusted pairwise Wilcoxon tests detected no significant differences except for user-based Diversity scores (divU) in the movie (Z = −2.69, p < 0.05) and the book domain (Z = −2.56, p < 0.05). The results for the content-based Diversity (divC) metric seem to point in the same direction because of the high mean values for cross-domain queries in each domain. But the significance of these results was not verified. The mrs scores indicate a better performance of regular SPARQL queries, but the within-subjects test did not confirm this assumption statistically (see Table 20).
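To illustrate what an FDR adjustment of several pairwise test results involves, the sketch below implements the Benjamini-Hochberg step-up procedure. This is an assumption: the text only states that the tests were FDR-adjusted and does not name the procedure. The function name and the raw p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values of four pairwise comparisons.
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
print([round(p, 4) for p in adj])
```

With several metrics tested per domain, an adjustment of this kind raises the individual p-values, which is one reason why only a few comparisons remain significant when samples are small.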

However, the most important finding of the analysis was that SKOSRec cross-domain queries generated significantly more suggestions throughout the domains (movie: t(49) = −17.92, p < 0.001; music: t(52) = −25.73, p < 0.001; book: t(50) = −12.75, p < 0.001) (see Table 20). Fig. 10 illustrates this graphically.

Whenever the SPARQL request provided recommendations, which was the case in less than half of the user sessions (see Table 19), the SKOSRec suggestions were assessed to be of almost equal quality as the SPARQL recommendations (see Table 20, Usefulness). Hence, in the remaining test cases (more than half), in which SPARQL querying did not produce any results, the SKOSRec approach may still have provided useful suggestions.

Therefore, it is reasonable to assume that the capability of the SKOSRec engine to flexibly switch between processing steps of similarity calculation and pattern matching can facilitate an improved exploration of LOD repositories. Table 21 also shows that the items in the two sets often did not match, as is indicated by low Jaccard indices. This finding further demonstrates that the execution of an additional step of similarity calculation can help to retrieve other relevant items.

Table 20. Performance results in TC4 (M = Mean, SD = Standard Deviation, N = No. of participants)

| Dimension (Metric) [Max Score] | Domain | Regular (SPARQL) M | Regular (SPARQL) SD | Cross-Domain M | Cross-Domain SD | N |
|---|---|---|---|---|---|---|
| Accuracy (size) [10] | Movie | 0.82 | 2.09 | 8.66 | 2.73 | 50 |
| | Music | 0.89 | 1.59 | 9.30 | 2.03 | 53 |
| | Book | 2.90 | 3.98 | 10.00 | 0.00 | 51 |
| Accuracy (mrs) [1] | Movie | 67.41 | 27.62 | 53.65 | 14.71 | 16 |
| | Music | 62.36 | 31.76 | 48.32 | 22.49 | 20 |
| | Book | 66.42 | 21.28 | 55.59 | 16.84 | 27 |
| Novelty (nv) [1] | Movie | 0.59 | 0.49 | 0.41 | 0.34 | 16 |
| | Music | 0.72 | 0.44 | 0.59 | 0.40 | 19 |
| | Book | 0.54 | 0.41 | 0.63 | 0.30 | 27 |
| Diversity (divU) [5] | Movie | 3 | - | 4 | - | 16 |
| | Music | 3 | - | 3 | - | 20 |
| | Book | 3 | - | 4 | - | 27 |
| Diversity (divC) [1] | Movie | (0.64) | (0.35) | (0.99) | (0.02) | (6) |
| | Music | (0.90) | (0.13) | (0.98) | (0.03) | (10) |
| | Book | (0.84) | (0.12) | (0.87) | (0.1) | (8) |
| Usefulness [5] | Movie | 3.57 | 0.83 | 3.35 | 1.57 | 15 |
| | Music | 3.53 | 1.17 | 3.11 | 0.98 | 20 |
| | Book | 3.41 | 1.04 | 3.41 | 0.94 | 27 |

[Figure: Recommendation List Size; x-axis: Domain (Movie, Music, Book); y-axis: No. of items (0.0 to 10.0); legend (Method): Regular, Cross-Domain]

Fig. 10. Mean recommendation list sizes for SPARQL and SKOSRec cross-domain requests (TC4)

Table 21. Jaccard Indices (TC4)

| Domain | Mean Jaccard Index (JI) | SD | N |
|---|---|---|---|
| Movie | 0.03 | 0.07 | 44 |
| Music | 0.01 | 0.02 | 21 |
| Book | 0.16 | 0.27 | 33 |

5. Future Work & Conclusion

This paper has presented the query-based SKOSRec engine that facilitates new types of recommendation requests for LOD repositories. The evaluations have shown that the SKOSRec system can often generate relevant and useful suggestions when certain query templates are utilized. In many cases, the application of expressive or advanced query patterns (i.e., in the context of constraint-based retrieval or cross-domain requests) helped to significantly improve recall values while still providing the same level of quality in the remaining performance dimensions. Additionally, advanced recommendation requests can be used to generate diversified recommendation lists in case users are not satisfied with the results of regular recommendation requests (i.e., roll-up or cross-domain retrieval vs. regular requests). While the novel recommendation approaches of the SKOSRec engine were not superior in each domain and test case, it is safe to say that they considerably extend common retrieval methods and at least provide an alternative search strategy when conventional methods fail to provide useful results. Additionally, the increase in diversity and recall is in line with findings from previous research on LOD-enabled RS [45].

It is a strength of the developed approaches that they facilitate combinations of graph-based and similarity-based retrieval at different stages of the recommendation workflow. This feature extends general search capabilities for semantic networks. Future research is required to investigate whether, in addition to SKOS, further RDF vocabularies can be used for ad-hoc item-to-item similarity computation to increase the range of potential usage scenarios, since not every LOD dataset contains SKOS annotations [53]. The application of additional properties could also boost the recommendation quality of some query patterns (e.g., the query types used in TC3). In this context, it also needs to be determined whether the representation of the user profile in the SKOSRec query language can be substituted with free-text expressions to enable search-like functionalities. LOD researchers have already developed engines that can process natural language queries over RDF data [59], [34], [67]. It will have to be investigated how the SKOSRec engine profitably fits into this landscape of existing retrieval tools.

Another open research question concerns the aspect of interfaces to other applications. In its standard configuration, the system is not an end-user retrieval engine but a backend application with which an administrator who has domain knowledge (e.g., of RDF vocabularies and data models of a particular LOD repository and usage scenario) creates the respective query patterns. Although the websites of the online experiments represent possible implementations of suitable end-user applications for the SKOSRec engine, there are still many other possibilities and extensions to design such a user interface (UI). For instance, the selection of LOD items for the profile was made possible by previously storing the data in an index that could then be accessed and searched before issuing a recommendation query. However, this runs against the idea of ad-hoc retrieval. In this context, future research will have to determine whether it is feasible to generate an on-the-fly mapping from items to the respective LOD resources. In the case of a commercial application, the engine needs to match items with the corresponding entries in the product catalog.

Even if it is not yet clear which UI components can make the best possible use of the SKOSRec engine's features, the results from the user experiments can give at least a few suggestions. The following retrieval options should be the default setting in an interface that contains a LOD-enabled RS component to assist users in finding interesting items. Since the regular approach of on-the-fly retrieval achieved good results throughout the domains, the display of simple recommendations generated from previously stated user preferences represents a suitable starting point for retrieval. On-the-fly queries could be refined by providing filter options. Since the evaluations have shown that expressive constraints often lead to increased recall values, the application of such a filter might be the best option for assisted retrieval. As in the preparations for the experiments, appropriate graph-based query patterns for the usage scenario in question would also have to be determined by a domain expert before setting up the application. The same applies to advanced retrieval patterns, which should be available to the end user to refine the query when both the simple and the constraint-based approach have failed to provide relevant items. This suggestion is made because roll-up queries were competitive with regular recommendations in the web-based experiments. Therefore, they represent a viable alternative retrieval strategy. For some domains (e.g., for multimedia retrieval), it is also possible to use an optional selection field to enable the formulation of cross-domain requests. Evaluations for this query type suggest that users might receive interesting results (i.e., diversified and comprehensive recommendation lists) from such a request.

In summary, the contributions of the paper are as follows:

– Development and implementation of novel retrieval strategies in the SKOSRec system, which facilitate personalized requests and leverage the potential of knowledge graphs in order to address the research gap in state-of-the-art recommendation methods with regard to customized queries.

– Realization of web-based user experiments in four usage scenarios with the following outcomes:

  • Successful demonstration that knowledge graph data is applicable in the context of live recommendation scenarios.

  • Creation of suitable query patterns that can be applied to represent individual information needs in different application contexts. The set of query patterns can be extended with the help of the SKOSRec query language to meet the requirements of additional usage scenarios.

  • Proof that the novel retrieval strategies can be successfully applied across various application contexts and recommendation tasks and that they outperform conventional content-based retrieval approaches, especially in the performance dimensions of diversity and recall.

The prototypical implementation and the evaluation of the SKOSRec engine in different usage scenarios have demonstrated that the freely available knowledge sources on the LOD cloud can be successfully applied to advance state-of-the-art methods in IR and recommender systems. The proposed personalized and customized retrieval strategies can leverage the potential of knowledge graphs by better matching information needs of users with the help of semantic search options.

Appendix A. The SKOSRecommender Query Syntax

SKOSRecQuery: A SKOSRec query can comprise up to six elements, of which the parts SimProjection and ItemPart are obligatory. Hence, users need to state at least their preferences and how many recommendations they would like to receive. Advanced retrieval patterns can be formulated as well. For instance, similar resource retrieval can be combined with prefiltering (RecWhereClause) and/or postfiltering (SelectPart) of LOD resources. As in SPARQL, the query can start with a prologue that abbreviates IRIs with namespace declarations [21].

SelectPart: The SelectPart enables subquerying with recommendation results. The surrounding query is a SELECT query with a slightly simplified WHERE clause (i.e., RecWhereClause), with which postfilter conditions can be formulated. In this section, users have the option to formulate aggregation-based retrieval requests (Aggregation).

Aggregation: This query part handles aggregation-based postfiltering. Users specify the resource (IRIref) for which similarity scores of sublevel entities are summarized. Additionally, it is stated how the data should be aggregated (i.e., SUM, MAX or AVG) as well as which variable (Var) represents the aggregation-based recommendations in the postfilter section. The compiler makes sure that this variable is actually contained in the RecWhereClause of the SelectPart.

SimProjection: This part of a SKOSRec query refers to the list of generated suggestions. It defines the recommendation variable to which all similar LOD resources are mapped. In case pre- and postfilter conditions are set in the other sections, the compiler checks whether the variable (Var) appears in these sections as well. Otherwise, the required join operations cannot be executed. It is also specified how many recommendations should be displayed (Integer) and whether the request will be issued against a default repository or as a cross-repository request (ServiceIntegration).

ServiceIntegration: This section indicates that a user intends to receive recommendations from a different SPARQL endpoint than the default endpoint that is stated in the standard configuration of the SKOSRec engine. It specifies the target SPARQL endpoint (IRIref) from which a user wants to receive recommendations. Upon extraction of SKOS annotations from the default source endpoint, a subsequent request is sent to the specified target endpoint.

ItemPart: This section of a SKOSRec query represents a single preference in the user profile. However, recommendation queries can contain several preference statements. A preference is either expressed as a LOD resource (IRIref) or as a variable which is bound to a preference query (VarPart). In the latter case, the resources are obtained by matching the graph pattern in the VarPart with the triple statements in the repository. The ItemPart statement can be additionally supplemented with a concept-to-concept similarity threshold (Sim) that triggers concept expansion with the approach of flexible similarity detection, which is extensively described in one of our previous works [64].

VarPart: The construct initiates preference querying. Individual likings are expressed as a graph pattern in a RecWhereClause. The variable (Var) is a placeholder for the LOD resources that will be used to set up the user profile. The compiler has to make sure that the specified variable occurs in the RecWhereClause as well. Otherwise, the extraction of LOD resources cannot be carried out correctly.

Sim: The Sim part of a SKOSRec query denotes the threshold of concept-to-concept scores for flexible similarity detection. Even though knowledge-based similarity values usually range between 0 and 1, the specification allows a broader range of digits in the Decimal datatype to enable the application of other similarity metrics in case they are needed.

Relation: This section specifies how concept-to-concept similarity scores should be determined. Users can state whether scores should be “larger”, “larger than” or “equal to” the threshold value.

RecWhereClause: This clause can occur in three different sections of a SKOSRec query, i.e., either in the prefilter (ItemPart), the postfilter (SelectPart) or the preference filter section (VarPart). The RecWhereClause closely resembles the “WhereClause” of the SPARQL 1.1 specification with all its subsequent parts [21]. Hence, it enables different combinations of graph pattern matching. However, the RecWhereClause of the SKOSRec language allows fewer pattern matching expressions than the “WhereClause” of the regular SPARQL syntax specification [21]. For instance, it is neither permitted to formulate subqueries nor to direct filter requests to more than one repository (see RecGraphPatternNotTriples for further explanations). This modification has been made to simplify similarity calculation. After parsing this section, the compiler makes sure that recommendation, preference or postfilter variables occur at least once in the RecWhereClause to guarantee that subsequent steps of the recommendation workflow can be applied on existing LOD resources. In case the RecWhereClause comprises a RecMinusGraphPattern, it also has to be checked that the variable is not only contained in the MINUS part of the group.

RecGroupGraphPattern: A RecGroupGraphPattern can contain one to many basic graph patterns, which can be combined in different ways according to RecGroupGraphPatternSub.

RecGroupGraphPatternSub: This section defines the graph patterns that are applied to identify suitable LOD resources. User filters can be expressed either as basic graph patterns (TriplesBlock) or as combinations of them (RecGraphPatternNotTriples). The sequence of different kinds of patterns can be flexibly specified.

RecGraphPatternNotTriples: This query part closely resembles the similarly named “GraphPatternNotTriples” of the SPARQL 1.1 specification [21]. The only difference to SPARQL is that the RecGraphPatternNotTriples section is a little more restricted in terms of admissible graph patterns than the “GraphPatternNotTriples” section of the regular SPARQL specification. For instance, it does not allow retrieving LOD resources from graphs other than the default graph that is specified in the configuration of the SKOSRec engine, since filter patterns have to be applied to datasets that contain SKOS annotations. Hence, these datasets need to be specified before runtime execution to ensure that similarity calculation can be carried out correctly. If a user were able to formulate filter conditions that referred to different RDF graphs and endpoints, this would not be possible.

RecGroupOrUnionGraphPattern: This query section marks an alternative graph pattern in the style of the “GroupOrUnionGraphPattern” of the SPARQL syntax specification [21]. However, it allows neither SPARQL-like subqueries nor federated queries for filtered resources because this would complicate similarity calculation.

RecOptionalGraphPattern: In this part of a SKOSRecQuery, users are enabled to specify additional graph patterns which might extend the query solution but do not necessarily have to match the data.

RecMinusGraphPattern: This section handles the exclusion of LOD resources for cases when users want to omit certain triple statements from the solution.
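The variable check described for the RecWhereClause (the recommendation, preference or postfilter variable must occur in the clause, and not only inside a MINUS part) can be sketched as follows. The data model is a deliberately simplified assumption: triple patterns are plain string tuples, and the names `WhereClause`, `variables_in` and `check_variable` are hypothetical and do not correspond to the actual SKOSRec compiler.

```python
from dataclasses import dataclass, field


@dataclass
class WhereClause:
    """Simplified stand-in for a RecWhereClause (assumption of this sketch)."""
    patterns: list = field(default_factory=list)        # regular triple patterns
    minus_patterns: list = field(default_factory=list)  # triples inside MINUS groups


def variables_in(triples):
    """Collect all terms that look like SPARQL variables (leading '?')."""
    return {term for triple in triples for term in triple if term.startswith("?")}


def check_variable(clause, var):
    """Raise ValueError unless `var` occurs outside the MINUS part of the clause."""
    if var in variables_in(clause.patterns):
        return
    if var in variables_in(clause.minus_patterns):
        raise ValueError(f"{var} occurs only in the MINUS part")
    raise ValueError(f"{var} does not occur in the where clause")


# Hypothetical filter: recommended movies with a director, excluding one category.
ok = WhereClause(
    patterns=[("?rec", "dbo:director", "?director")],
    minus_patterns=[("?rec", "dct:subject", "dbc:Horror_films")],
)
check_variable(ok, "?rec")  # passes silently

bad = WhereClause(
    patterns=[("?other", "dbo:director", "?director")],
    minus_patterns=[("?rec", "dct:subject", "dbc:Horror_films")],
)
try:
    check_variable(bad, "?rec")
except ValueError as err:
    print(err)  # ?rec occurs only in the MINUS part
```

Rejecting the second query before execution matters because a variable that is bound only inside MINUS never contributes bindings to the solution, so later steps of the workflow would have no LOD resources to work on.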

References

[1] G. Adomavicius, A. Tuzhilin, Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. In: IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.

[2] G. Adomavicius, A. Tuzhilin, R. Zheng, REQUEST: A query language for customizing recommendations. In: Information Systems Research, vol. 22, no. 1, pp. 99–117, 2011.

[3] C. Aggarwal, Recommender systems. Springer, 2016.

[4] M. Arenas, B. Cuenca Grau, E. Kharlamov, S. Marciuska, D. Zheleznyakov, Faceted search over ontology-enhanced RDF data. In: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 939–948, 2014.

[5] V. Ayala, M. Przyjaciel-Zablocki, T. Hornung, A. Schätzle, G. Lausen, Extending SPARQL for recommendations. In: Semantic Web Information Management, 2004.

[6] J. Beel, C. Breitinger, S. Langer, A. Lommatzsch, B. Gipp, Towards reproducibility in recommender systems research. In: User Modeling and User-adapted Interaction, vol. 26, no. 1, pp. 69–101, 2016.

[7] C. Boutilier, R. Brafman, C. Domshlak, H. Hoos, D. Poole, CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements. In: Journal of Artificial Intelligence Research (JAIR), vol. 21, pp. 135–191, 2004.

[8] P. Castells, S. Vargas, J. Wang, Novelty and diversity metrics for recommender systems: choice, discovery and relevance, 2011.

[9] G. Cheng, W. Ge, Y. Qu, Falcons: searching and browsing entities on the Semantic Web. In: Proceedings of the 17th ACM International Conference on World Wide Web, pp. 1101–1102, 2008.

[10] J. Davies, R. Weeks, QuizRDF: Search technology for the Semantic Web. In: Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004.

[11] DBpedia, URL: https://wiki.dbpedia.org, 2017.

[12] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, Exploiting the web of data in model-based recommender systems. In: Proceedings of the 6th ACM Conference on Recommender Systems (RecSys), pp. 253–256, 2012.

[13] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, M. Zanker, Linked Open Data to support content-based recommender systems. In: Proceedings of the 8th International Conference on Semantic Systems, 2012.

[14] D. Dillman, D. Robert, D. Bowker, Principles for constructing web surveys. In: Joint Meetings of the American Statistical Association, 1998.

[15] DCMI metadata terms, URL: http://dublincore.org/documents/dcmi-terms, 2004.

[16] I. Fernandez-Tobias, I. Cantador, M. Kaminskas, F. Ricci, A generic semantic-based framework for cross-domain recommendation. In: Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pp. 25–32, 2011.

[17] A. Freitas, J. Oliveira, S. O'Riain, E. Curry, J. Da Silva, C. Pereira, Querying Linked Data using semantic relatedness: a vocabulary independent approach. In: Natural Language Processing and Information Systems, pp. 40–51, 2011.

[18] T. Grainger, T. Potter, Y. Seeley, Solr in Action. Cherry Hill: Manning, 2014.

[19] A. Gunawardana, G. Shani, Evaluating recommender systems. In: Recommender Systems Handbook, Springer, pp. 265–308, 2015.

[20] R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Bürgle, H. Düwiger, U. Scheel, Faceted Wikipedia search. In: International Conference on Business Information Systems, pp. 1–11, 2010.

[21] S. Harris, A. Seaborne, E. Prud'hommeaux, SPARQL 1.1 query language. In: W3C recommendation, 2013.

[22] B. Heitmann, C. Hayes, Using Linked Data to build open, collaborative recommender systems. In: AAAI Spring Symposium: Linked Data Meets Artificial Intelligence, pp. 76–81, 2010.

[23] J. Herlocker, J. Konstan, L. Terveen, J. Riedl, Evaluating collaborative filtering recommender systems. In: ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.

[24] Y. Hu, Z. Wang, W. Wu, J. Guo, M. Zhang, Recommendation for movies and stars using YAGO and IMDB. In: 12th International Asia-Pacific Web Conference (APWEB), pp. 123–129, 2010.

[25] D. Huynh, D. Karger, Parallax and companion: Set-based browsing for the data web. In: Proceedings of the 18th ACM International Conference on World Wide Web, 2009.

[26] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich, Recommender systems: an introduction. Cambridge University Press, 2010.

[27] Y. Kabutoya, R. Sumi, T. Iwata, T. Uchiyama, T. Uchiyama. In: IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 625–630, 2012.

[28] M. Kaminskas, I. Fernandez-Tobias, F. Ricci, I. Cantador, Knowledge-based music retrieval for places of interest. In: Proceedings of the 2nd International ACM Workshop on Music Information Retrieval with User-centered and Multimodal Strategies, pp. 19–24, 2012.

[29] H. Khrouf, R. Troncy, Hybrid event recommendation using Linked Data and user diversity. In: Proceedings of the 7th ACM Conference on Recommender Systems, pp. 185–192, 2010.

[30] P. Kirk, Experimental design. John Wiley & Sons, 1982.

[31] B. Knijnenburg, M. Willemsen, Z. Gantner, H. Soncu, C. Newell, Explaining the user experience of recommender systems. In: User Modeling and User-Adapted Interaction, vol. 22, no. 4-5, pp. 441–504, 2012.

[32] G. Koutrika, B. Bercovitz, H. Garcia-Molina, FlexRecs: expressing and combining flexible recommendations. In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 2009.


[33] M. Lee, W. Kim, Semantic association search and rank method based on spreading activation for the Semantic Web. In: IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1523–1527, 2009.

[34] J. Lehmann, L. Bühmann, AutoSPARQL: Let users query your knowledge base. In: 8th Extended Semantic Web Conference (ESWC), 2011.

[35] N. Marie, F. Gandon, M. Ribiere, F. Rodio, Discovery hub: on-the-fly Linked Data exploratory search. In: Proceedings of the 9th International Conference on Semantic Systems, pp. 17–24, 2013.

[36] S. McNee, J. Riedl, J. Konstan, Being accurate is not enough: how accuracy metrics have hurt recommender systems. In: CHI'06 Extended Abstracts on Human Factors in Computing Systems, 2006.

[37] R. Meymandpour, J. Davis, Recommendations using Linked Data. In: Proceedings of the 5th Ph.D. Workshop on Information and Knowledge, 2012.

[38] A. Miles, S. Bechhofer, SKOS Simple Knowledge Organization System Reference. In: W3C Recommendation, 2009.

[39] B. Mobasher, X. Jin, Y. Zhou, Semantically enhanced collaborative filtering on the web. In: Web Mining: From Web to Semantic Web, pp. 57–76, 2004.

[40] A. Musetti, A. Nuzzolese, F. Draicchio, V. Presutti, E. Blomqvist, A. Gangemi, P. Ciancarini, Aemoo: Exploratory search based on knowledge patterns over the Semantic Web. In: Semantic Web Challenge, 2012.

[41] T. Neumann, G. Weikum, The RDF-3X engine for scalable management of RDF data. In: The VLDB Journal, vol. 19, no. 1, pp. 91–113, 2010.

[42] J. Pérez, M. Arenas, C. Gutierrez, Semantics and Complexity of SPARQL. In: 5th International Semantic Web Conference, pp. 30–43, 2006.

[43] S. Policarpio, S. Brunk, G. Tummarello, Implementation of a SPARQL integrated recommendation engine for Linked Data with hybrid capabilities. In: AImWD, 2012.

[44] P. Pu, L. Chen, R. Hu, A user-centric evaluation framework for recommender systems. In: Proceedings of the 5th ACM Conference on Recommender Systems, 2011.

[45] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, E. Di Sciascio, Sound and music recommendation with knowledge graphs. In: ACM Transactions on Intelligent Systems and Technology (TIST), vol. 8, no. 2, article 21, 2017.

[46] E. Oren, R. Delbru, S. Decker, Extending faceted navigation for RDF data. In: 5th International Semantic Web Conference, pp. 559–572, 2006.

[47] G. Pirro, Explaining and suggesting relatedness in knowledge graphs. In: International Semantic Web Conference, pp. 622–639, 2015.

[48] G. Rizzo, R. Troncy, NERD: A framework for evaluating named entity recognition tools in the Web of data. In: 10th International Semantic Web Conference (ISWC'11), Demo Session, Bonn, Germany, 2011.

[49] J. Rosati, T. Di Noia, T. Lukasiewicz, R. De Leone, A. Maurino, Preference queries with Ceteris Paribus semantics for Linked Data. In: OTM Confederated International Conferences, pp. 423–442, 2015.

[50] T. Ruotsalo, K. Haav, A. Stoyanov, S. Roche, E. Fani, R. Deliai, E. Mäkelä, T. Kauppinen, E. Hyvönen, SMARTMUSEUM: A mobile recommender system for the Web of Data. In: Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 50–67, 2013.

[51] I Ruthven M Lalmas K van Rijsbergen Incorporating usersearch behavior into relevance feedback In Journal of the As-sociation for Information Science and Technology vol 54 no6 pp 529ndash549 2003

[52] B Sarwar G Karypis J Konstan J Riedl J Item-based col-laborative filtering recommendation algorithms In Proceedingsof the 10th International Conference on World Wide Web 2001

[53] M Schmachtenberg C Bizer H Paulheim Adoption of theLinked Data best practices in different topical domains In In-ternational Semantic Web Conference (ISWC) pp 245ndash2602014

[54] H Schuumltze C Manning P Raghavan Introduction to informa-tion retrieval Cambridge University Press 2008

[55] SparqlImplementations. URL: https://www.w3.org/wiki/SparqlImplementations, 2016.

[56] M. Stankovic, W. Breitfuss, P. Laublet. Discovering relevant topics using DBPedia. In: Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 219–222, 2011.

[57] K. Swearingen, R. Sinha. Beyond algorithms: An HCI perspective on recommender systems. In: ACM SIGIR 2001 Workshop on Recommender Systems, vol. 13, no. 5-6, 2001.

[58] D. Tunkelang. Faceted search. In: Synthesis Lectures on Information Concepts, Retrieval, and Services, pp. 1–80, 2009.

[59] C. Unger, L. Bühmann, J. Lehmann, A. Ngonga Ngomo, D. Gerber, P. Cimiano. Template-based question answering over RDF data. In: Proceedings of the 21st ACM International Conference on World Wide Web, 2012.

[60] C. Varnhagen, M. Gushta, J. Daniels, T. Peters, N. Parmar, D. Law, T. Johnson. How informed is online informed consent? In: Ethics & Behavior, vol. 15, no. 1, pp. 37–48, 2005.

[61] J. Waitelonis, H. Sack. Towards exploratory video search using Linked Data. In: 11th IEEE International Symposium on Multimedia, pp. 540–545, 2009.

[62] L. Wenige, J. Ruhland. Flexible on-the-fly recommendations from Linked Open Data repositories. In: International Conference on Business Information Systems, pp. 43–54, 2016.

[63] L. Wenige. Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations.

[64] L. Wenige, J. Ruhland. Retrieval by recommendation: using LOD technologies to improve digital library search. In: International Journal on Digital Libraries, 2017.

[65] F. Zarrinkalam, M. Kahani. A multi-criteria hybrid citation recommendation system based on Linked Data. In: 2nd International eConference on Computer and Knowledge Engineering (ICCKE), pp. 283–288, 2012.

[66] C. Ziegler, S. McNee, J. Konstan, G. Lausen. Improving recommendation lists through topic diversification. In: Proceedings of the 14th International Conference on World Wide Web, pp. 22–32, 2005.

[67] L. Zou, R. Huang, H. Wang, J. Yu, W. He, D. Zhao. Natural language question answering over RDF: a graph data driven approach. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 313–324, 2014.


L Wenige et al Similarity-based Graph Queries 19

Fig. 6. Travel - User profile generation (TC3)

(music domain) or author (book domain) and received recommendations based on the features of the works created by the artist (similar to Q5, Sect. 3). The SKOSRec engine generated recommendations based on the creative works of the favored artist. Participants also received suggestions from a simple on-the-fly query, which computed results according to the artist's characteristics. As in the travel experiment, recommendations from the advanced roll-up request were compared with the on-the-fly query without letting participants know which approach they were assessing and through random assignment of result set positions.

After participants of the multimedia experiments had evaluated the results of TC3, they were guided to test case 4 (TC4), which evaluated cross-domain queries. The experiment on travel RS ended with TC3 because no suitable graph patterns could be identified to facilitate these kinds of suggestions for the travel usage scenario. In the multimedia domain, however, the data from DBpedia was sufficient for this retrieval task. The entity type of the study (i.e., movie, music act or book) defined the source domain, based on which the engine generated suggestions for items from another multimedia target domain (similar to Q7, Sect. 3). Fig. 7 depicts the cross-domain web form of the music experiment.

The web form assisted subjects in formulating a request, which then obtained movie suggestions based on the favorite music act of the user. Parameters from users were entered in the placeholders of the query pattern in the backend of the application. We formulated the query templates in such a way that the query engine matched any properties or graph patterns that could potentially connect two items from the specified target and source domains. For instance, such matchings can occur when a user states his/her favorite movie that is written by a particular author. In case the same author has also written some books, they might be of interest to the user as well. Another example stems from the movie domain. Suppose in his/her profile a consumer has declared a preference for a movie that links to the corresponding soundtrack. It might well be the case that the user likes the soundtrack and the music acts that were involved in creating it just as much as s/he likes the movie. The practice to distinguish homonymous LOD resources through the property dbo:wikiPageDisambiguates is another interesting linking pattern from DBpedia that was exploited for cross-domain requests in TC4, since it connects similar entities (i.e., the book edition of a certain movie).

Cross-domain SPARQL queries (similar to Q8, Sect. 3) were posed as baseline requests in this test case. While the SPARQL query only matched cross-domain relations for a single LOD resource, the SKOSRec request expanded the search space by considering links from more than one item. As in the previous test cases, TC4 ended with an evaluation screen where users assessed the SPARQL-based as well as the SKOSRec suggestions in random order without knowing which recommendation list belonged to which method. Table 10 gives an overview of the test cases and the methods that were applied in the experiments in each usage scenario.

Fig. 7. Music - User profile generation, TC4 (page 9)

Table 10. Test cases in the web experiments

| Test Case | Evaluation Purpose | Query Type | Travel | Movie | Music | Book |
|---|---|---|---|---|---|---|
| TC1 | On-the-fly Recommendations (Baseline) | Q1 | X | X | X | X |
| TC2 | Constraint-based Recommendations | Q3 | X | X | X | X |
| TC3 | Advanced Queries (Roll-up) | Q5, Q6 | X | X | X | X |
| TC4 | Advanced Queries (Cross-Domain) | Q7 | x | X | X | X |

4.2. Results

In total, 257 people took part in the study. The gathered samples from the experiments contained answers from both genders with a slight overrepresentation of male participants (53.7%). The prevailing number of study subjects was between 20 and 40 years of age (65.8%). The evaluation of the demographics section also revealed that the majority of participants already perceive the search in their domain of interest as either "manageable" (38.5%) or even as "easy" or "very easy" (44.0%).

Prior to analyzing the performance results of the novel retrieval approaches of the SKOSRec engine, it was checked whether the respective Likert items for Perceived Usefulness and user-based diversity (divU) reached acceptable reliability levels. For this purpose, Cronbach's α values were determined. Table 11 shows the outcome of this analysis for the Perceived Usefulness construct. Given the acceptable values throughout the domains (Cronbach's α > 0.7), it could be treated as an interval-scaled variable in the statistical tests.

Table 11. Cronbach's α values for the concept of Perceived Usefulness

| Domain | Cronbach's α |
|---|---|
| Travel | 0.7002 |
| Movie | 0.8579 |
| Music | 0.8515 |
| Book | 0.8949 |
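For illustration, the reliability check can be sketched as follows. The snippet computes Cronbach's α with the standard formula α = k/(k−1) · (1 − Σ var(item) / var(total score)). The function name `cronbach_alpha` and the toy Likert responses are assumptions made for this example and are not data from the study.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of Likert items.

    `items` holds one list of scores per item; position i in every
    list belongs to the same respondent.
    """
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    n = len(items[0])
    if any(len(scores) != n for scores in items):
        raise ValueError("all items need the same number of respondents")
    # Total score of each respondent across all items.
    totals = [sum(scores[i] for scores in items) for i in range(n)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical responses of four participants to two Likert items.
toy_items = [[1, 2, 3, 4], [2, 1, 4, 3]]
alpha = cronbach_alpha(toy_items)
print(round(alpha, 2))
```

A construct would be treated as reliable in the sense used above when the resulting value exceeds 0.7.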

Besides usefulness, user-based diversity scores were also measured by multiple Likert items. However, Cronbach's α values for this concept did not reach acceptable levels. In order to avoid that valuable user assessments remained unused, agreements to the statement "The recommendations of list [...] are diverse" were taken into account as an ordinal variable (divU).

Based on these results, subsequent tests could be carried out. The first part of the analysis concerned the performance of the on-the-fly retrieval method (baseline) of the SKOSRec engine in TC1. The outcome of this analysis confirms that the on-the-fly approach represents a viable recommendation strategy. Table 12 shows the mean (M) (see footnote 12), standard deviation (SD) and number of participants (N) for each performance dimension and metric. It can be seen that users assessed the engine's performance in almost each quality dimension to be above mediocrity. Aside from the positive user evaluations with regard to Perceived Usefulness, related quality aspects such as Accuracy, Novelty and Diversity received good assessments as well. For instance, each domain mean relevance score (mrs) was higher than 50 score points. It means that participants were often convinced of the relevance of the results. Given these positive user assessments with regard to item relevance, it seems to be justified to consider the mean of result list sizes as an indicator of recall. The engine was able to generate the required number of recommendations in almost each test case (see Table 12). The on-the-fly retrieval approach was also tested with regard to the Novelty dimension. Table 12 shows the mean novelty scores (nv) for the baseline recommendation approach. Novelty scores indicate that the majority of items in a result list were deemed to be new in the travel and book RS, whereas in the experiments on movie and music RS only one third of the recommendation list was unfamiliar to users. In contrast to that, content-based Diversity scores (divC) were rather high in each domain. Hence, it can be concluded that items in a result set usually have a low rate of matching SKOS annotations. In addition to content-based scores, user assessments of recommendation list Diversity (divU) were also taken into account. The median level of agreement to the Diversity-related Likert item stating that recommendations are diverse lay at 3 ("neutral") in the multimedia domains and at 4 ("agree") in the travel domain.

In summary, the baseline approach achieved fairly good performance results in TC1. The findings demonstrate that LOD-enabled recommendations can potentially enhance a user's search experience. They also allow for subsequent tests of advanced retrieval techniques, which can be compared to this standard method.

Footnote 12: In case of the user-based diversity score (divU), M denotes the median because of the ordinal scale of the variable.

In the analysis of performance results from TC2 (constraint-based recommendations), particular attention was paid to participants' agreements with the statement "The filter has improved the recommendation results of the list". Fig. 8 shows the distribution of domain-wise agreement ratios. On average, they were fairly high. Throughout the domains, the vast majority of participants either selected 5 ("strongly agree"), 4 ("agree") or 3 ("neutral"). Hence, a filter often had a positive impact on results.

Table 12. Performance results in TC1 (M = Mean, SD = Standard Deviation, N = No. of participants)

| Dimension (Metric) [Max. Score] | Domain | M | SD | N |
|---|---|---|---|---|
| Accuracy (size) [10] | Travel | 10.00 | 0.00 | 103 |
| | Movie | 10.00 | 0.00 | 50 |
| | Music | 10.00 | 0.00 | 53 |
| | Book | 9.80 | 1.40 | 51 |
| Accuracy (mrs) [1] | Travel | 56.49 | 0.00 | 103 |
| | Movie | 62.25 | 19.15 | 50 |
| | Music | 55.70 | 0.10 | 53 |
| | Book | 57.47 | 15.55 | 50 |
| Novelty (nv) [1] | Travel | 0.60 | 0.34 | 101 |
| | Movie | 0.36 | 0.31 | 50 |
| | Music | 0.36 | 0.39 | 52 |
| | Book | 0.61 | 0.29 | 47 |
| Diversity (divU) [5] | Travel | 4 | - | 102 |
| | Movie | 3 | - | 50 |
| | Music | 3 | - | 53 |
| | Book | 3 | - | 50 |
| Diversity (divC) [1] | Travel | 0.80 | 0.16 | 103 |
| | Movie | 0.87 | 0.12 | 50 |
| | Music | 0.86 | 0.10 | 50 |
| | Book | 0.92 | 0.08 | 50 |
| Usefulness [5] | Travel | 3.34 | 0.85 | 103 |
| | Movie | 3.55 | 0.73 | 50 |
| | Music | 3.18 | 0.85 | 53 |
| | Book | 3.43 | 0.81 | 50 |

The authors gathered responses to the question on improvement for recommendation lists resulting from both the simple and the expressive filtering approaches. Hence, they give clues about the general performance of constraint-based retrieval. However, since this section of the study revealed the experimental condition to participants (i.e., application of a filter), answers to the Likert statement have to be interpreted cautiously. Results may be slightly biased, as participants were able to guess the underlying agenda. However, these limitations do not exist for comparisons between simple and expressive filters. Since the web application randomly assigned the list order, participants did not know which approach they were assessing. This enables us to take a closer look at the differences between the two filtering approaches. It was hypothesized that advanced filters improve recommendation quality, as they may identify more relevant resources, thereby overcoming problems of data quality or data sparsity in LOD repositories. Table 13 lists response rates of the two methods. The rate measures the ratio of non-zero result sets among all result sets. Throughout the domains, recommendation lists resulting from expressive filtering achieved higher response rates.

Table 13. Response Rates (TC2)

| Domain | Constr.-based (Regular) | Constr.-based (Expanded) | N |
|---|---|---|---|
| Travel | 23% | 57% | 103 |
| Movie | 86% | 88% | 50 |
| Music | 72% | 75% | 53 |
| Book | 73% | 76% | 51 |
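The response rate defined above (the ratio of non-zero result sets among all result sets) can be expressed in a few lines. This is a minimal sketch; the function name `response_rate` and the example sessions are hypothetical and do not reproduce the experimental data.

```python
def response_rate(result_sets):
    """Share of result sets that contain at least one recommendation."""
    if not result_sets:
        raise ValueError("no result sets given")
    non_empty = sum(1 for results in result_sets if len(results) > 0)
    return non_empty / len(result_sets)


# Hypothetical result lists of four user sessions; two of them are empty.
sessions = [["item_a"], [], ["item_b", "item_c"], []]
print(f"{response_rate(sessions):.0%}")
```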


[Figure: "The filter has improved the recommendations of the list"; x-axis: Domain (Digital Library, Travel, Movie, Music, Book); y-axis: Percentage of Agreement; legend: Degree of Agreement (1 to 5)]

Fig. 8. Participants' agreement with the statement that the filter has improved the result list (domain-wise) (TC2)

In addition to response rates, user assessments and scores in the defined quality dimensions were compared. Table 14 depicts the outcome of this analysis. Significantly higher scores are marked in bold figures. In case statistical tests identified no meaningful variations, the results of the better performing approach are underlined. α-levels were adjusted with the false discovery rate (FDR) in order to decrease the probability of making a type I error due to multiple testing.
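The FDR adjustment mentioned above can be illustrated with a short sketch. It assumes the Benjamini-Hochberg step-up procedure, which is a common way of controlling the FDR; the text does not state which procedure was actually applied. The function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down and enforce monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw p-values of four pairwise comparisons.
raw = [0.01, 0.04, 0.03, 0.005]
for p, q in zip(raw, benjamini_hochberg(raw)):
    print(f"raw p = {p:.3f}, adjusted p = {q:.3f}")
```

Comparing the adjusted p-values with the unadjusted α level is equivalent to comparing the raw p-values with rank-dependent adjusted α levels.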

According to mean scores, expressive requests generated more comprehensive recommendation lists (size). This finding is in line with the increased response rates shown in Table 13. However, subsequently conducted t-tests confirmed significant differences only for the travel experiment (t(102) = −7.89, p < 0.001). In contrast, precision scores were higher in most of the domains (travel, movie, music) when the SKOSRec engine processed a simple constraint-based query. However, statistical tests did not confirm any systematic differences. The authors applied Wilcoxon tests for mrs scores and the remaining performance metrics because of the small sample sizes resulting from the low response rates of the simple constraint-based approach.

Regarding Novelty, none of the approaches was superior. In the Diversity dimension, expressive filters generated suggestions that were at least as topically diversified as the results from simple constraint-based retrieval. Even though no significant differences were identified between the two approaches, increased mean scores (divU in the travel domain and divC in each domain) indicate a slight superiority of expanded filter requests in this quality dimension. The same applies to usefulness scores (mean values were higher in 3 out of 4 domains when expressive filtering was applied). However, these statements are speculative at best because statistical tests were not significant.

In summary, it can be concluded that expressive filters improve recall, potentially leading to a slight loss in precision scores (see Table 14). It may also be the case that expanded user constraints diversify as well as increase the usefulness of recommendation lists. However, these claims are unverified. In contrast, it is safe to say that expressive query patterns generate results of at least the same quality as simple patterns while increasing the number of relevant suggestions. The few differences may be explained by the high Jaccard scores (JI) (see Table 15) of recommendation lists. Throughout the domains, result sets were much alike. Given these similarities and the increases in list sizes for expressive filters, it is supposed that an expanded constraint produces an enhanced version of the recommendation list resulting from simple filtering. Therefore, expanded filters should be the default setting in a constraint-based retrieval context.
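The Jaccard scores (JI) used above to compare two recommendation lists are the size of the intersection divided by the size of the union of the two item sets. The following sketch shows the computation; the function name `jaccard_index`, the handling of two empty lists and the example item identifiers are assumptions made for illustration.

```python
def jaccard_index(list_a, list_b):
    """Jaccard index of two recommendation lists, treated as sets."""
    set_a, set_b = set(list_a), set(list_b)
    if not set_a and not set_b:
        # Assumption: the index is undefined when both lists are empty.
        return None
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical result lists of a simple and an expressive filter request.
simple = ["item_a", "item_b", "item_c"]
expressive = ["item_b", "item_c", "item_d"]
print(jaccard_index(simple, expressive))
```

A value close to 1 means that the two lists contain nearly the same items, whereas a value close to 0 means that they hardly overlap.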

Table 14. Performance results in TC2 (M = Mean, SD = Standard Deviation, N = No. of participants)

| Dimension (Metric) [Max. Score] | Domain | Simple M | Simple SD | Expressive M | Expressive SD | N |
|---|---|---|---|---|---|---|
| Accuracy (size) [10] | Travel | 1.17 | 2.70 | 4.52 | 4.62 | 103 |
| | Movie | 6.72 | 4.08 | 7.20 | 3.92 | 50 |
| | Music | 5.96 | 4.64 | 6.36 | 4.51 | 53 |
| | Book | 4.59 | 4.50 | 5.00 | 4.51 | 51 |
| Accuracy (mrs) [1] | Travel | 66.73 | 20.51 | 65.34 | 22.36 | 23 |
| | Movie | 60.95 | 22.98 | 60.29 | 22.79 | 43 |
| | Music | 63.77 | 16.05 | 62.96 | 16.44 | 34 |
| | Book | 56.17 | 18.18 | 56.81 | 16.81 | 37 |
| Novelty (nv) [1] | Travel | 0.48 | 0.41 | 0.61 | 0.49 | 21 |
| | Movie | 0.44 | 0.36 | 0.55 | 0.80 | 42 |
| | Music | 0.55 | 0.39 | 0.52 | 0.37 | 38 |
| | Book | 0.77 | 0.34 | 0.76 | 0.33 | 33 |
| Diversity (divU) [5] | Travel | 3 | - | 3.5 | - | 24 |
| | Movie | 4 | - | 4 | - | 43 |
| | Music | 4 | - | 4 | - | 38 |
| | Book | 3 | - | 3 | - | 37 |
| Diversity (divC) [1] | Travel | 0.68 | 0.18 | 0.72 | 0.21 | 19 |
| | Movie | 0.70 | 0.24 | 0.73 | 0.23 | 41 |
| | Music | 0.86 | 0.13 | 0.87 | 0.12 | 34 |
| | Book | 0.84 | 0.24 | 0.95 | 0.63 | 29 |
| Usefulness [5] | Travel | 3.33 | 0.93 | 3.50 | 0.77 | 24 |
| | Movie | 3.32 | 0.85 | 3.31 | 0.86 | 43 |
| | Music | 3.55 | 0.81 | 3.64 | 0.78 | 38 |
| | Book | 3.39 | 0.83 | 3.62 | 1.21 | 37 |

Table 15. Jaccard Indices (TC2)

| Domain | Mean Jaccard Index (JI) | SD | N |
|---|---|---|---|
| Travel | 0.57 | 0.47 | 25 |
| Movie | 0.71 | 0.38 | 43 |
| Music | 0.82 | 0.35 | 37 |
| Book | 0.92 | 0.26 | 35 |

In TC3 (i.e., evaluation of roll-up query patterns), the SKOSRec engine almost always generated non-empty recommendation lists (see Table 16). In the travel and the book experiment, the response rate was higher for aggregation-based queries, whereas in the movie domain the engine more often provided recommendations for regular requests. However, these differences are only marginal and do not indicate a clear superiority of one method.

Table 16. Response Rates (TC3)

| Domain | Regular | Aggr.-based | N |
|---|---|---|---|
| Travel | 99% | 100% | 103 |
| Movie | 96% | 92% | 50 |
| Music | 100% | 100% | 53 |
| Book | 96% | 100% | 51 |

Fig. 9 depicts participants' general agreement to Likert statements concerning the Perceived Usefulness of the approaches in TC3. The diagram shows similar results for regular as well as advanced requests. On average, satisfaction scores reached levels which lay slightly above neutral agreement. A subsequent t-test confirmed no significant differences between the two methods. In fact, the approach with the highest average score of Perceived Usefulness was different in each experiment. While mean values were higher for regular requests in the studies on travel and music RS, advanced queries produced higher scores in the movie and book experiments.

[Figure: "Perceived Usefulness (Regular vs. Roll-up Queries)"; x-axis: Domain (Travel, Movie, Music, Book); y-axis: Agreement; legend: Method (Regular, Roll-up)]

Fig. 9. Participants' overall satisfaction with regular and roll-up recommendation retrieval (TC3)

Content-based Diversity (divC) was the only metric with systematic differences. Here, scores went up for advanced retrieval patterns in the travel (t(98) = −5.50, p < 0.001), the music (t(52) = −4.51, p < 0.001) and the book domain (t(47) = −4.90, p < 0.001) (see Table 17). This finding is not surprising, given that the aggregation-based approach explores the wider semantic network of each item in the profile.

Hence, it can be assumed that both approaches produce recommendation lists of similar quality and can provide benefits to potential consumers. Thus, it cannot be clearly stated which approach is superior because neither of them outperformed the other in each domain. While the regular retrieval strategy was better in terms of usefulness in the travel and the music domains, the aggregation-based approach achieved higher scores in the domains of movie and book recommendations.

This conclusion is especially notable given the low mean Jaccard scores for recommendation lists resulting from regular and aggregation-based retrieval (see Table 18). The mean score never exceeded 0.1 throughout the domains. It indicates a low concordance between result sets. It is remarkable that participants perceived the recommendations as equally good given the high dissimilarity of the lists. Thus, advanced queries have an added value, as they provide additional interesting results. They can help to explore LOD repositories in different ways than a regular retrieval approach. Therefore, it is worthwhile to offer aggregation-based query patterns as an alternative retrieval strategy when the regular approach has not produced any helpful recommendations.

In addition to TC3, user assessments for TC4 (cross-domain recommendations) were also evaluated. However, since the authors conducted experiments for TC4 only in the multimedia domains, the samples were considerably smaller. Nevertheless, it is assumed that the amount of completed user sessions is only just enough to conduct further statistical analyses. Table 19 shows the response rates for the two retrieval methods. Cross-domain patterns had a higher success rate of generating non-zero result sets throughout the domains. Participants almost always received a recommendation for this query type, whereas in case a regular SPARQL query was executed, they often did not receive a single suggestion at all.

Table 17. Performance results in TC3 (M = Mean, SD = Standard Deviation, N = No. of participants)

| Dimension (Metric) [Max. Score] | Domain | Regular M | Regular SD | Aggr.-based M | Aggr.-based SD | N |
|---|---|---|---|---|---|---|
| Accuracy (size) [10] | Travel | 9.90 | 0.99 | 9.42 | 2.07 | 103 |
| | Movie | 2.88 | 0.59 | 2.76 | 0.82 | 50 |
| | Music | 10.00 | 0.00 | 10.00 | 0.00 | 53 |
| | Book | 2.88 | 0.59 | 3.00 | 0.00 | 51 |
| Accuracy (mrs) [1] | Travel | 48.21 | 24.98 | 44.93 | 25.59 | 102 |
| | Movie | 54.58 | 22.88 | 56.69 | 25.42 | 44 |
| | Music | 53.95 | 18.31 | 58.24 | 16.19 | 52 |
| | Book | 51.53 | 23.90 | 49.25 | 22.30 | 49 |
| Novelty (nv) [1] | Travel | 0.43 | 0.39 | 0.38 | 0.40 | 98 |
| | Movie | 0.35 | 0.38 | 0.28 | 0.41 | 41 |
| | Music | 0.49 | 0.36 | 0.46 | 0.39 | 50 |
| | Book | 0.61 | 0.42 | 0.62 | 0.44 | 47 |
| Diversity (divU) [5] | Travel | 3.5 | - | 3 | - | 100 |
| | Movie | 4 | - | 3 | - | 43 |
| | Music | 4 | - | 3 | - | 50 |
| | Book | 3 | - | 3 | - | 48 |
| Diversity (divC) [1] | Travel | 0.79 | 0.17 | 0.89 | 0.13 | 99 |
| | Movie | 0.75 | 0.20 | 0.66 | 0.27 | 44 |
| | Music | 0.83 | 0.17 | 0.93 | 0.07 | 53 |
| | Book | 0.74 | 0.27 | 0.93 | 0.06 | 48 |
| Usefulness [5] | Travel | 3.36 | 0.77 | 3.22 | 0.74 | 101 |
| | Movie | 3.20 | 0.92 | 3.26 | 0.94 | 44 |
| | Music | 3.45 | 0.84 | 3.44 | 0.85 | 51 |
| | Book | 2.95 | 0.87 | 3.03 | 0.96 | 48 |

Table 18. Jaccard Indices (TC3)

| Domain | Mean Jaccard Index (JI) | SD | N |
|---|---|---|---|
| Travel | 0.07 | 0.11 | 101 |
| Movie | 0.03 | 0.07 | 44 |
| Music | 0.02 | 0.05 | 53 |
| Book | 0.02 | 0.06 | 50 |

Table 19. Response rates (TC4)

| Domain | Regular (SPARQL) | SKOSRec | N |
|---|---|---|---|
| Movie | 32% | 96% | 50 |
| Music | 62% | 98% | 53 |
| Book | 53% | 100% | 51 |

We applied non-parametric tests for the performance metrics mrs, nv, divU, divC and Perceived Usefulness due to the low response rates of the SPARQL-based approach, which led to a low number of comparable data points accordingly. Overall, the FDR-adjusted pairwise Wilcoxon tests detected no significant differences, except for user-based Diversity scores (divU) in the movie (Z = −2.69, p < 0.05) and the book domain (Z = −2.56, p < 0.05). The results for the content-based Diversity (divC) metric seem to point in the same direction because of the high mean values for cross-domain queries in each domain. But the significance of these results was not verified. The mrs scores indicate a better performance of regular SPARQL queries, but the within-subjects test did not confirm this assumption statistically (see Table 20).

However, the most important finding of the analysis was that SKOSRec cross-domain queries generated significantly more suggestions throughout the domains (movie: t(49) = −17.92, p < 0.001; music: t(52) = −25.73, p < 0.001; book: t(50) = −12.75, p < 0.001) (see Table 20). Fig. 10 illustrates this graphically.

Whenever the SPARQL request provided recommendations, which was the case in less than half of the user sessions (see Table 19), the SKOSRec suggestions were assessed to be of almost equal quality as the SPARQL recommendations (see Table 20, Usefulness). Hence, when SPARQL querying did not produce any results, which happened in more than the other half of the test cases, the SKOSRec approach may still have provided useful suggestions.

Table 20. Performance results in TC4 (M = Mean, SD = Standard Deviation, N = No. of participants)

| Dimension (Metric) [Max. Score] | Domain | Regular (SPARQL) M | Regular (SPARQL) SD | Cross-Domain M | Cross-Domain SD | N |
|---|---|---|---|---|---|---|
| Accuracy (size) [10] | Movie | 0.82 | 2.09 | 8.66 | 2.73 | 50 |
| | Music | 0.89 | 1.59 | 9.30 | 2.03 | 53 |
| | Book | 2.90 | 3.98 | 10.00 | 0.00 | 51 |
| Accuracy (mrs) [1] | Movie | 67.41 | 27.62 | 53.65 | 14.71 | 16 |
| | Music | 62.36 | 31.76 | 48.32 | 22.49 | 20 |
| | Book | 66.42 | 21.28 | 55.59 | 16.84 | 27 |
| Novelty (nv) [1] | Movie | 0.59 | 0.49 | 0.41 | 0.34 | 16 |
| | Music | 0.72 | 0.44 | 0.59 | 0.40 | 19 |
| | Book | 0.54 | 0.41 | 0.63 | 0.30 | 27 |
| Diversity (divU) [5] | Movie | 3 | - | 4 | - | 16 |
| | Music | 3 | - | 3 | - | 20 |
| | Book | 3 | - | 4 | - | 27 |
| Diversity (divC) [1] | Movie | (0.64) | (0.35) | (0.99) | (0.02) | (6) |
| | Music | (0.90) | (0.13) | (0.98) | (0.03) | (10) |
| | Book | (0.84) | (0.12) | (0.87) | (0.1) | (8) |
| Usefulness [5] | Movie | 3.57 | 0.83 | 3.35 | 1.57 | 15 |
| | Music | 3.53 | 1.17 | 3.11 | 0.98 | 20 |
| | Book | 3.41 | 1.04 | 3.41 | 0.94 | 27 |

[Figure: "Recommendation List Size"; x-axis: Domain (Movie, Music, Book); y-axis: No. of items; legend: Method (Regular, Cross-Domain)]

Fig. 10. Mean recommendation list sizes for SPARQL and SKOSRec cross-domain requests (TC4)

Table 21. Jaccard Indices (TC4)

| Domain | Mean Jaccard Index (JI) | SD | N |
|---|---|---|---|
| Movie | 0.03 | 0.07 | 44 |
| Music | 0.01 | 0.02 | 21 |
| Book | 0.16 | 0.27 | 33 |

Therefore, it is reasonable to assume that the capability of the SKOSRec engine to flexibly switch between processing steps of similarity calculation and pattern matching can facilitate an improved exploration of LOD repositories. Table 21 also shows that the items in the two sets often did not match, as is indicated by low Jaccard indices. This finding further demonstrates that the execution of an additional step of similarity calculation can help to retrieve other relevant items.

5. Future Work & Conclusion

This paper has presented the query-based SKOSRec engine that facilitates new types of recommendation requests for LOD repositories. The evaluations have shown that the SKOSRec system can often generate relevant and useful suggestions when certain query templates are utilized. In many cases, the application of expressive or advanced query patterns (i.e., in the context of constraint-based retrieval or cross-domain requests) helped to significantly improve recall values while still providing the same level of quality in the remaining performance dimensions. Additionally, advanced recommendation requests can be used to generate diversified recommendation lists in case users are not satisfied with the results of regular recommendation requests (i.e., roll-up or cross-domain retrieval vs. regular requests). While not being superior in each domain and test case, it is at least safe to say that the novel recommendation approaches of the SKOSRec engine considerably extend common retrieval methods and provide an alternative search strategy when conventional methods fail to provide useful results. Additionally, the increase in diversity and recall is in line with findings from previous research on LOD-enabled RS [45].

It is a strength of the developed approaches that they facilitate combinations of graph-based and similarity-based retrieval at different stages of the recommendation workflow. This feature extends general search capabilities for semantic networks. It requires future research to investigate whether, in addition to SKOS, further RDF vocabularies can be used for ad-hoc item-to-item similarity computation to increase the range of potential usage scenarios, since not every LOD dataset contains SKOS annotations [53]. The application of additional properties could also boost the recommendation quality of some query patterns (e.g., the query types used in TC3). In this context, it also needs to be determined whether the representation of the user profile in the SKOSRec query language can be substituted with free-text expressions to enable search-like functionalities. LOD researchers have already developed engines that can process natural language queries over RDF data [59], [34], [67]. It will have to be investigated how the SKOSRec engine profitably fits into this landscape of existing retrieval tools.

Another open research question concerns the aspect of interfaces to other applications. In its standard configuration, the system is not an end-user retrieval engine but a backend application with which an administrator who has domain knowledge (e.g., of RDF vocabularies and data models of a particular LOD repository and usage scenario) creates the respective query patterns. Although the websites of the online experiments represent possible implementations of suitable end-user applications for the SKOSRec engine, there are still many other possibilities and extensions to design such a user interface (UI). For instance, the selection of LOD items for the profile was made possible by previously storing the data in an index that could then be accessed and searched before issuing a recommendation query. However, this runs against the idea of ad-hoc retrieval. In this context, future research will have to determine whether it is feasible to generate an on-the-fly mapping from items to the respective LOD resources. In the case of a commercial application, the engine needs to match items with the corresponding entries in the product catalog. Even if it is not yet clear which UI components can make the best possible use of the SKOSRec engine's features, the results from the user experiments can give at least a few suggestions. The following retrieval options should be the default setting in an interface that contains a LOD-enabled RS component to assist users in finding interesting items. Since the regular approach of on-the-fly retrieval achieved good results throughout the domains, the display of simple recommendations generated from previously stated user preferences represents a suitable starting point for retrieval. On-the-fly queries could be refined by providing filter options. Since the evaluations have shown that expressive constraints often lead to increased recall values, the application of such a filter might be the best option for assisted retrieval. As in the preparations for the experiments, appropriate graph-based query patterns for the usage scenario in question would also have to be determined by a domain expert before setting up the application. The same applies to advanced retrieval patterns, which should be available to the end user to refine the query when both the simple as well as the constraint-based approach have failed to provide relevant items. This suggestion is made because roll-up queries were competitive with regular recommendations in the web-based experiments. Therefore, they represent a viable alternative retrieval strategy. For some domains (e.g., for multimedia retrieval), it is also possible to use an optional selection field to enable the formulation of cross-domain requests. Evaluations for this query type suggest that users might receive interesting results (i.e., diversified and comprehensive recommendation lists) from such a request.

In summary, the contributions of the paper are as follows:

– Development and implementation of novel retrieval strategies in the SKOSRec system, which facilitate personalized requests and leverage the potential of knowledge graphs in order to address the research gap in state-of-the-art recommendation methods with regard to customized queries.

– Realization of web-based user experiments in four usage scenarios with the following outcomes:

  • Successful demonstration that knowledge graph data is applicable in the context of live recommendation scenarios.

  • Creation of suitable query patterns that can be applied to represent individual information needs in different application contexts. The set of query patterns can be extended with the help of the SKOSRec query language to meet the requirements of additional usage scenarios.

  • Proof that the novel retrieval strategies can be successfully applied across various application contexts and recommendation tasks and that they outperform conventional content-based retrieval approaches, especially in the performance dimensions of diversity and recall.

The prototypical implementation and the evaluation of the SKOSRec engine in different usage scenarios have demonstrated that the freely available knowledge sources on the LOD cloud can be successfully applied to advance state-of-the-art methods in IR and recommender systems. The proposed personalized and customized retrieval strategies can leverage the potential of knowledge graphs by better matching information needs of users with the help of semantic search options.

Appendix A. The SKOSRecommender Query Syntax

SKOSRecQuery: A SKOSRec query can comprise up to six elements, of which the parts SimProjection and ItemPart are obligatory. Hence, users need to state at least their preferences and how many recommendations they would like to receive. Advanced retrieval patterns can be formulated as well. For instance, similar resource retrieval can be combined with prefiltering (RecWhereClause) and/or postfiltering (SelectPart) of LOD resources. As in SPARQL, the query can start with a prologue that abbreviates IRIs with namespace declarations [21].

SelectPart: The SelectPart enables subquerying with recommendation results. The surrounding query is a SELECT query with a slightly simplified WHERE clause (i.e., RecWhereClause) with which postfilter conditions can be formulated. In this section, users have the option to formulate aggregation-based retrieval requests (Aggregation).

Aggregation: This query part handles aggregation-based postfiltering. Users specify the resource (IRIref) for which similarity scores of sublevel entities are summarized. Additionally, it is stated how the data should be aggregated (i.e., SUM, MAX or AVG) as well as which variable (Var) represents the aggregation-based recommendations in the postfilter section. The compiler makes sure that this variable is actually contained in the RecWhereClause of the SelectPart.
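The roll-up described for the Aggregation part can be illustrated with a small sketch. It is not taken from the SKOSRec implementation: the function name `aggregate_scores`, the dictionary-based inputs and the example identifiers are assumptions made for illustration only. The sketch groups similarity scores of sublevel entities under their parent resource and summarizes them with SUM, MAX or AVG.

```python
from collections import defaultdict
from statistics import mean

# Supported aggregation modes, mirroring the SUM / MAX / AVG keywords.
AGGREGATORS = {"SUM": sum, "MAX": max, "AVG": mean}


def aggregate_scores(scored_items, parent_of, mode="SUM"):
    """Roll up similarity scores of sublevel entities to their parent resource.

    scored_items: dict mapping a sublevel entity to its similarity score
    parent_of:    dict mapping a sublevel entity to its parent resource
    Both structures are hypothetical stand-ins for query results.
    """
    if mode not in AGGREGATORS:
        raise ValueError(f"unsupported aggregation mode: {mode}")
    grouped = defaultdict(list)
    for item, score in scored_items.items():
        parent = parent_of.get(item)
        if parent is not None:  # entities without a parent are skipped
            grouped[parent].append(score)
    totals = {p: AGGREGATORS[mode](s) for p, s in grouped.items()}
    # Highest aggregated score first, as in a ranked recommendation list.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    scores = {"movie1": 0.5, "movie2": 0.25, "movie3": 0.75}
    directors = {"movie1": "directorA", "movie2": "directorA", "movie3": "directorB"}
    print(aggregate_scores(scores, directors, mode="AVG"))
```

The choice of mode changes the ranking: SUM favors parents with many similar sublevel entities, whereas MAX and AVG reward individual or typical similarity.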

SimProjection: This part of a SKOSRec query refers to the list of generated suggestions. It defines the recommendation variable to which all similar LOD resources are mapped. In case pre- and postfilter conditions are set in the other sections, the compiler checks whether the variable (Var) appears in these sections as well. Otherwise, the required join operations cannot be executed. It is specified how many recommendations should be displayed (Integer) and whether the request will be issued against a default repository or as a cross-repository request (ServiceIntegration).

ServiceIntegration: This section indicates that a user intends to receive recommendations from a different SPARQL endpoint than the default endpoint that is stated in the standard configuration of the SKOSRec engine. It specifies the target SPARQL endpoint (IRIref) from which a user wants to receive recommendations. Upon extraction of SKOS annotations from the default source endpoint, a subsequent request is sent to the specified target endpoint.

ItemPart: This section of a SKOSRec query represents a single preference in the user profile. However, recommendation queries can contain several preference statements. A preference is either expressed as a LOD resource (IRIref) or as a variable which is bound to a preference query (VarPart). In the latter case, the resources are obtained by matching the graph pattern in the VarPart with the triple statements in the repository. The ItemPart statement can be additionally supplemented with a concept-to-concept similarity threshold (Sim) that triggers concept expansion with the approach of flexible similarity detection, which is extensively described in one of our previous works [64].

VarPart: The construct initiates preference querying. Individual likings are expressed as a graph pattern in a RecWhereClause. The variable (Var) is a placeholder for the LOD resources that will be used to set up the user profile. The compiler has to make sure that the specified variable occurs in the RecWhereClause as well. Otherwise, the extraction of LOD resources cannot be carried out correctly.

Sim: The Sim part of a SKOSRec query denotes the threshold of concept-to-concept scores for flexible similarity detection. Even though knowledge-based similarity values usually range between 0 and 1, the specification allows a broader range of digits in the Decimal datatype to enable application of other similarity metrics in case they are needed.

Relation: This section specifies how concept-to-concept similarity scores should be determined. Users can state whether scores should be “larger”, “larger than” or “equal to” the threshold value.

RecWhereClause: This clause can occur in three different sections of a SKOSRec query, i.e., either in the prefilter (ItemPart), the postfilter (SelectPart) or the preference filter section (VarPart). The RecWhereClause closely resembles the “WhereClause” of the SPARQL 1.1 specification with all its subsequent parts [21]. Hence, it enables different combinations of graph pattern matching. However, the RecWhereClause of the SKOSRec language allows fewer pattern matching expressions than the “WhereClause” of the regular SPARQL syntax specification [21]. For instance, it is neither permitted to formulate subqueries nor to direct filter requests to more than one repository (see RecGraphPatternNotTriples for further explanations). This modification has been made to simplify similarity calculation. After parsing this section, the compiler makes sure that recommendation, preference or postfilter variables occur at least once in the RecWhereClause to guarantee that subsequent steps of the recommendation workflow can be applied on existing LOD resources. In case the RecWhereClause comprises a RecMinusGraphPattern, it also has to be checked that the variable is not only contained in the MINUS part of the group.
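The variable check described for the RecWhereClause can be sketched as follows. This is only an illustration under simplifying assumptions: the parsed clause is represented as a list of `(pattern_kind, variables)` pairs, and the function name `check_rec_variable` is hypothetical. Neither reflects the actual SKOSRec compiler.

```python
# Hypothetical, simplified stand-in for a parsed RecWhereClause:
# a list of (pattern_kind, variables) pairs, e.g. ("triples", {"?rec"}).
# This is NOT the data structure used by the SKOSRec compiler.


def check_rec_variable(var, patterns):
    """Return True if `var` occurs in at least one non-MINUS pattern.

    Raises ValueError if the variable is missing altogether or if it
    is only contained in MINUS patterns.
    """
    in_positive = any(var in vs for kind, vs in patterns if kind != "minus")
    in_minus = any(var in vs for kind, vs in patterns if kind == "minus")
    if in_positive:
        return True
    if in_minus:
        raise ValueError(f"{var} is only contained in the MINUS part")
    raise ValueError(f"{var} does not occur in the where clause")


if __name__ == "__main__":
    clause = [
        ("triples", {"?rec", "?genre"}),
        ("minus", {"?rec", "?actor"}),
    ]
    print(check_rec_variable("?rec", clause))  # variable is usable
```

A variable that is bound only inside a MINUS group never contributes bindings to the solution, which is why the sketch rejects it in the same way as a missing variable.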

RecGroupGraphPattern: A RecGroupGraphPattern can contain one to many basic graph patterns, which can be combined in different ways according to RecGroupGraphPatternSub.

RecGroupGraphPatternSub: This section defines the graph patterns that are applied to identify suitable LOD resources. User filters can be either expressed as basic graph patterns (TriplesBlock) or as combinations of them (RecGraphPatternNotTriples). The sequence of different kinds of patterns can be flexibly specified.

RecGraphPatternNotTriples: This query part closely resembles the similarly named “GraphPatternNotTriples” of the SPARQL 1.1 specification [21]. The only difference to SPARQL is that the RecGraphPatternNotTriples section is a little more restricted in terms of admissible graph patterns than the “GraphPatternNotTriples” section of the regular SPARQL specification. For instance, it does not allow retrieving LOD resources from graphs other than the default graph that is specified in the configuration of the SKOSRec engine, since filter patterns have to be applied to datasets that contain SKOS annotations. Hence, they need to be specified before runtime execution to ensure that similarity calculation can be carried out correctly. If a user were able to formulate filter conditions that referred to different RDF graphs and endpoints, this would not be possible.

RecGroupOrUnionGraphPattern: This query section marks an alternative graph pattern in the style of the “GroupOrUnionGraphPattern” of the SPARQL syntax specification [21]. However, it allows neither SPARQL-like subqueries nor federated queries for filtered resources, because this would complicate similarity calculation.

RecOptionalGraphPattern: In this part of a SKOSRecQuery, users are enabled to specify additional graph patterns which might extend the query solution but do not necessarily have to match the data.

RecMinusGraphPattern: This section handles exclusion of LOD resources for cases when users want to omit certain triple statements from the solution.

References

[1] G. Adomavicius, A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. In: IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.

[2] G. Adomavicius, A. Tuzhilin, R. Zheng. REQUEST: A query language for customizing recommendations. In: Information Systems Research, vol. 22, no. 1, pp. 99–117, 2011.

[3] C. Aggarwal. Recommender systems. Springer, 2016.

[4] M. Arenas, B. Cuenca Grau, E. Kharlamov, S. Marciuska, D. Zheleznyakov. Faceted search over ontology-enhanced RDF data. In: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 939–948, 2014.

[5] V. Ayala, M. Przyjaciel-Zablocki, T. Hornung, A. Schätzle, G. Lausen. Extending SPARQL for recommendations. In: Semantic Web Information Management, 2004.

[6] J. Beel, C. Breitinger, S. Langer, A. Lommatzsch, B. Gipp. Towards reproducibility in recommender systems research. In: User Modeling and User-Adapted Interaction, vol. 26, no. 1, pp. 69–101, 2016.

[7] C. Boutilier, R. Brafman, C. Domshlak, H. Hoos, D. Poole. CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements. In: Journal of Artificial Intelligence Research (JAIR), vol. 21, pp. 135–191, 2004.

[8] P. Castells, S. Vargas, J. Wang. Novelty and diversity metrics for recommender systems: choice, discovery and relevance. 2011.

[9] G. Cheng, W. Ge, Y. Qu. Falcons: searching and browsing entities on the Semantic Web. In: Proceedings of the 17th ACM International Conference on World Wide Web, pp. 1101–1102, 2008.

[10] J. Davies, R. Weeks. QuizRDF: Search technology for the Semantic Web. In: Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004.

[11] DBpedia. URL: https://wiki.dbpedia.org, 2017.

[12] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito. Exploiting the web of data in model-based recommender systems. In: Proceedings of the 6th ACM Conference on Recommender Systems (RecSys), pp. 253–256, 2012.

[13] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, M. Zanker. Linked Open Data to support content-based recommender systems. In: Proceedings of the 8th International Conference on Semantic Systems, 2012.

[14] D. Dillman, D. Robert, D. Bowker. Principles for constructing web surveys. In: Joint Meetings of the American Statistical Association, 1998.

[15] DCMI metadata terms. URL: http://dublincore.org/documents/dcmi-terms, 2004.

[16] I. Fernandez-Tobias, I. Cantador, M. Kaminskas, F. Ricci. A generic semantic-based framework for cross-domain recommendation. In: Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pp. 25–32, 2011.

[17] A. Freitas, J. Oliveira, S. O’Riain, E. Curry, J. Da Silva, C. Pereira. Querying Linked Data using semantic relatedness: a vocabulary independent approach. In: Natural Language Processing and Information Systems, pp. 40–51, 2011.

[18] T. Grainger, T. Potter, Y. Seeley. Solr in Action. Cherry Hill: Manning, 2014.

[19] A. Gunawardana, G. Shani. Evaluating recommender systems. In: Recommender Systems Handbook, Springer, pp. 265–308, 2015.

[20] R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Bürgle, H. Düwiger, U. Scheel. Faceted Wikipedia search. In: International Conference on Business Information Systems, pp. 1–11, 2010.

[21] S. Harris, A. Seaborne, E. Prud’hommeaux. SPARQL 1.1 query language. In: W3C recommendation, 2013.

[22] B. Heitmann, C. Hayes. Using Linked Data to build open, collaborative recommender systems. In: AAAI Spring Symposium: Linked Data Meets Artificial Intelligence, pp. 76–81, 2010.

[23] J. Herlocker, J. Konstan, L. Terveen, J. Riedl. Evaluating collaborative filtering recommender systems. In: ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.

[24] Y. Hu, Z. Wang, W. Wu, J. Guo, M. Zhang. Recommendation for movies and stars using YAGO and IMDB. In: 12th International Asia-Pacific Web Conference (APWEB), pp. 123–129, 2010.

[25] D. Huynh, D. Karger. Parallax and companion: Set-based browsing for the data web. In: Proceedings of the 18th ACM International Conference on World Wide Web, 2009.

[26] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich. Recommender systems: an introduction. Cambridge University Press, 2010.

[27] Y. Kabutoya, R. Sumi, T. Iwata, T. Uchiyama, T. Uchiyama. In: IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 625–630, 2012.

[28] M. Kaminskas, I. Fernandez-Tobias, F. Ricci, I. Cantador. Knowledge-based music retrieval for places of interest. In: Proceedings of the 2nd International ACM Workshop on Music Information Retrieval with User-centered and Multimodal Strategies, pp. 19–24, 2012.

[29] H. Khrouf, R. Troncy. Hybrid event recommendation using Linked Data and user diversity. In: Proceedings of the 7th ACM Conference on Recommender Systems, pp. 185–192, 2010.

[30] P. Kirk. Experimental design. John Wiley & Sons, 1982.

[31] B. Knijnenburg, M. Willemsen, Z. Gantner, H. Soncu, C. Newell. Explaining the user experience of recommender systems. In: User Modeling and User-Adapted Interaction, vol. 22, no. 4-5, pp. 441–504, 2012.

[32] G. Koutrika, B. Bercovitz, H. Garcia-Molina. FlexRecs: expressing and combining flexible recommendations. In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 2009.

[33] M. Lee, W. Kim. Semantic association search and rank method based on spreading activation for the Semantic Web. In: IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1523–1527, 2009.

[34] J. Lehmann, L. Bühmann. Autosparql: Let users query your knowledge base. In: 8th Extended Semantic Web Conference (ESWC), 2011.

[35] N. Marie, F. Gandon, M. Ribiere, F. Rodio. Discovery hub: on-the-fly Linked Data exploratory search. In: Proceedings of the 9th International Conference on Semantic Systems, pp. 17–24, 2013.

[36] S. McNee, J. Riedl, J. Konstan. Being accurate is not enough: how accuracy metrics have hurt recommender systems. In: CHI’06 Extended Abstracts on Human Factors in Computing Systems, 2006.

[37] R. Meymandpour, J. Davis. Recommendations using Linked Data. In: Proceedings of the 5th PhD Workshop on Information and Knowledge, 2012.

[38] A. Miles, S. Bechhofer. SKOS Simple Knowledge Organization System Reference. In: W3C Recommendation, 2009.

[39] B. Mobasher, X. Jin, Y. Zhou. Semantically enhanced collaborative filtering on the web. In: Web Mining: From Web to Semantic Web, pp. 57–76, 2004.

[40] A. Musetti, A. Nuzzolese, F. Draicchio, V. Presutti, E. Blomqvist, A. Gangemi, P. Ciancarini. Aemoo: Exploratory search based on knowledge patterns over the Semantic Web. In: Semantic Web Challenge, 2012.

[41] T. Neumann, G. Weikum. The RDF-3X engine for scalable management of RDF data. In: The VLDB Journal, vol. 19, no. 1, pp. 91–113, 2010.

[42] J. Pérez, M. Arenas, C. Gutierrez. Semantics and Complexity of SPARQL. In: 5th International Semantic Web Conference, pp. 30–43, 2006.

[43] S. Policarpio, S. Brunk, G. Tummarello. Implementation of a SPARQL integrated recommendation engine for Linked Data with hybrid capabilities. In: AImWD, 2012.

[44] P. Pu, L. Chen, R. Hu. A user-centric evaluation framework for recommender systems. In: Proceedings of the 5th ACM Conference on Recommender Systems, 2011.

[45] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, E. Di Sciascio. Sound and music recommendation with knowledge graphs. In: ACM Transactions on Intelligent Systems and Technology (TIST), vol. 8, no. 2, article 21, 2017.

[46] E. Oren, R. Delbru, S. Decker. Extending faceted navigation for RDF data. In: 5th International Semantic Web Conference, pp. 559–572, 2006.

[47] G. Pirro. Explaining and suggesting relatedness in knowledge graphs. In: International Semantic Web Conference, pp. 622–639, 2015.

[48] G. Rizzo, R. Troncy. NERD: A framework for evaluating named entity recognition tools in the Web of data. In: 10th International Semantic Web Conference (ISWC’11), Demo Session, Bonn, Germany, 2011.

[49] J. Rosati, T. Di Noia, T. Lukasiewicz, R. De Leone, A. Maurino. Preference queries with Ceteris Paribus semantics for Linked Data. In: OTM Confederated International Conferences, pp. 423–442, 2015.

[50] T. Ruotsalo, K. Haav, A. Stoyanov, S. Roche, E. Fani, R. Deliai, E. Mäkelä, T. Kauppinen, E. Hyvönen. SMARTMUSEUM: A mobile recommender system for the Web of Data. In: Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 50–67, 2013.

[51] I. Ruthven, M. Lalmas, K. van Rijsbergen. Incorporating user search behavior into relevance feedback. In: Journal of the Association for Information Science and Technology, vol. 54, no. 6, pp. 529–549, 2003.

[52] B. Sarwar, G. Karypis, J. Konstan, J. Riedl. Item-based collaborative filtering recommendation algorithms. In: Proceedings of the 10th International Conference on World Wide Web, 2001.

[53] M. Schmachtenberg, C. Bizer, H. Paulheim. Adoption of the Linked Data best practices in different topical domains. In: International Semantic Web Conference (ISWC), pp. 245–260, 2014.

[54] H. Schütze, C. Manning, P. Raghavan. Introduction to information retrieval. Cambridge University Press, 2008.

[55] SparqlImplementations. URL: https://www.w3.org/wiki/SparqlImplementations, 2016.

[56] M. Stankovic, W. Breitfuss, P. Laublet. Discovering relevant topics using DBPedia. In: Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 219–222, 2011.

[57] K. Swearingen, R. Sinha. Beyond algorithms: An HCI perspective on recommender systems. In: ACM SIGIR 2001 Workshop on Recommender Systems, vol. 13, no. 5-6, 2001.

[58] D. Tunkelang. Faceted search. In: Synthesis Lectures on Information Concepts, Retrieval, and Services, pp. 1–80, 2009.

[59] C. Unger, L. Bühmann, J. Lehmann, A. Ngonga Ngomo, D. Gerber, P. Cimiano. Template-based question answering over RDF data. In: Proceedings of the 21st ACM International Conference on World Wide Web, 2012.

[60] C. Varnhagen, M. Gushta, J. Daniels, T. Peters, N. Parmar, D. Law, T. Johnson. How informed is online informed consent? In: Ethics & Behavior, vol. 15, no. 1, pp. 37–48, 2005.

[61] J. Waitelonis, H. Sack. Towards exploratory video search using Linked Data. In: 11th IEEE International Symposium on Multimedia, pp. 540–545, 2009.

[62] L. Wenige, J. Ruhland. Flexible on-the-fly recommendations from Linked Open Data repositories. In: International Conference on Business Information Systems, pp. 43–54, 2016.

[63] L. Wenige. Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations.

[64] L. Wenige, J. Ruhland. Retrieval by recommendation: using LOD technologies to improve digital library search. In: International Journal on Digital Libraries, 2017.

[65] F. Zarrinkalam, M. Kahani. A multi-criteria hybrid citation recommendation system based on Linked Data. In: 2nd International EConference on Computer and Knowledge Engineering (ICCKE), pp. 283–288, 2012.

[66] C. Ziegler, S. McNee, J. Konstan, G. Lausen. Improving recommendation lists through topic diversification. In: Proceedings of the 14th International Conference on World Wide Web, pp. 22–32, 2005.

[67] L. Zou, R. Huang, H. Wang, J. Yu, W. He, D. Zhao. Natural language question answering over RDF: a graph data driven approach. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 313–324, 2014.


Fig. 7. Music - User profile generation, TC4 (page 9)

Table 10. Test Cases in the web experiments

Test Case  Evaluation Purpose                      Query Type  Travel  Movie  Music  Book
TC1        On-the-fly Recommendations (Baseline)   Q1          X       X      X      X
TC2        Constraint-based Recommendations        Q3          X       X      X      X
TC3        Advanced Queries (Rollup)               Q5, Q6      X       X      X      X
TC4        Advanced Queries (Cross-Domain)         Q7          x       X      X      X

from both genders with a slight overrepresentation of male participants (53.7%). The prevailing number of study subjects was between 20 and 40 years of age (65.8%). The evaluation of the demographics section also revealed that the majority of participants already perceives the search in their domain of interest as either “manageable” (38.5%) or even as “easy” or “very easy” (44.0%).

Prior to analyzing the performance results of the novel retrieval approaches of the SKOSRec engine, it was checked whether the respective Likert items for Perceived Usefulness and user-based diversity (divU) reached acceptable reliability levels. For this purpose, Cronbach’s α values were determined. Table 11 shows the outcome of this analysis for the Perceived Usefulness construct. Given the acceptable values throughout the domains (Cronbach’s α > 0.7), it could be treated as an interval-scaled variable in the statistical tests.

Table 11. Cronbach’s α values for the concept of Perceived Usefulness

Domain  Cronbach’s α
Travel  0.7002
Movie   0.8579
Music   0.8515
Book    0.8949
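For readers who want to reproduce a reliability check of this kind, the following sketch computes Cronbach’s α from raw Likert responses using the standard formula α = k/(k−1) · (1 − Σ item variances / variance of total scores). The function name and the input layout (one row per participant, one column per item) are assumptions; the study’s raw data is not reproduced here.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of rows (one row per participant).

    Each row holds the participant's answers to the k Likert items of
    one construct. Sample variances (n - 1 denominator) are used.
    """
    if len(responses) < 2 or len(responses[0]) < 2:
        raise ValueError("need at least two participants and two items")
    k = len(responses[0])
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    total_var = variance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    # Illustrative answers of four participants to three items (made up).
    answers = [[4, 5, 4], [2, 3, 2], [5, 5, 4], [3, 3, 3]]
    print(round(cronbach_alpha(answers), 4))
```

Values above roughly 0.7 are conventionally read as acceptable internal consistency, which is the threshold applied in the text.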

Besides usefulness, user-based diversity scores were also measured by multiple Likert items. However, Cronbach’s α values for this concept did not reach acceptable levels. In order to avoid leaving valuable user assessments unused, agreements with the statement “The recommendations of list [] are diverse” were taken into account as an ordinal variable (divU). Based on these results, subsequent tests could be carried out.

The first part of the analysis concerned the performance of the on-the-fly retrieval method (baseline) of the SKOSRec engine in TC1. The outcome of this analysis confirms that the on-the-fly approach represents a viable recommendation strategy. Table 12 shows the mean (M, see footnote 12), standard deviation (SD) and number of participants (N) for each performance dimension and metric. It can be seen that users assessed the engine’s performance in almost every quality dimension to be above mediocrity. Aside from the positive user evaluations with regard to Perceived Usefulness, related quality aspects such as Accuracy, Novelty and Diversity received good assessments as well. For instance, each domain’s mean relevance score (mrs) was higher than 50 score points. This means that participants were often convinced of the relevance of the results. Given these positive user assessments with regard to item relevance, it seems justified to consider the mean of result list sizes as an indicator of recall. The engine was able to generate the required number of recommendations in almost every test case (see Table 12).

The on-the-fly retrieval approach was also tested with regard to the Novelty dimension. Table 12 shows the mean novelty scores (nv) for the baseline recommendation approach. Novelty scores indicate that the majority of items in a result list were deemed to be new in the travel and book RS, whereas in the experiments on movie and music RS only one third of the recommendation list was unfamiliar to users. In contrast to that, content-based Diversity scores (divC) were rather high in each domain. Hence, it can be concluded that items in a result set usually have a low rate of matching SKOS annotations. In addition to content-based scores, user assessments of recommendation list Diversity (divU) were also taken into account. The median level of agreement with the Diversity-related Likert item stating that recommendations are diverse lay at 3 (“neutral”) in the multimedia domains and at 4 (“agree”) in the travel domain.

In summary, the baseline approach achieved fairly good performance results in TC1. The findings demonstrate that LOD-enabled recommendations can potentially enhance a user’s search experience. They also allow for subsequent tests of advanced retrieval techniques, which can be compared to this standard method.

12 In case of the user-based diversity score (divU), M denotes the median because of the ordinal scale of the variable.

In the analysis of performance results from TC2 (constraint-based recommendations), particular attention was paid to participants’ agreements with the statement “The filter has improved the recommendation results of the list”. Fig. 8 shows the distribution of domain-wise agreement ratios. On average, they were fairly high. Throughout the domains, the vast majority of participants either selected 5 (“strongly agree”), 4 (“agree”) or 3 (“neutral”). Hence, a filter often had a positive impact on results.

Table 12. Performance results in TC1 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]  Domain  M      SD     N
Accuracy (size) [10]            Travel  10.00  0.00   103
                                Movie   10.00  0.00   50
                                Music   10.00  0.00   53
                                Book    9.80   1.40   51
Accuracy (mrs) [1]              Travel  56.49  0.00   103
                                Movie   62.25  19.15  50
                                Music   55.70  0.10   53
                                Book    57.47  15.55  50
Novelty (nv) [1]                Travel  0.60   0.34   101
                                Movie   0.36   0.31   50
                                Music   0.36   0.39   52
                                Book    0.61   0.29   47
Diversity (divU) [5]            Travel  4      -      102
                                Movie   3      -      50
                                Music   3      -      53
                                Book    3      -      50
Diversity (divC) [1]            Travel  0.80   0.16   103
                                Movie   0.87   0.12   50
                                Music   0.86   0.10   50
                                Book    0.92   0.08   50
Usefulness [5]                  Travel  3.34   0.85   103
                                Movie   3.55   0.73   50
                                Music   3.18   0.85   53
                                Book    3.43   0.81   50

The authors gathered responses to the question on improvement for recommendation lists resulting from both the simple and the expressive filtering approaches. Hence, they give clues about the general performance of constraint-based retrieval. However, since this section of the study revealed the experimental condition to participants (i.e., application of a filter), answers to the Likert statement have to be interpreted cautiously. Results may be slightly biased, as participants were able to guess the underlying agenda. However, these limitations do not exist for comparisons between simple and expressive filters. Since the web application randomly assigned the list order, participants did not know which approach they were assessing. This enables us to take a closer look at the differences between the two filtering approaches. It was hypothesized that advanced filters improve recommendation quality, as they may identify more relevant resources, thereby overcoming problems of data quality or data sparsity in LOD repositories. Table 13 lists response rates of the two methods. The rate measures the ratio of non-zero result sets among all result sets. Throughout the domains, recommendation lists resulting from expressive filtering achieved higher response rates.

Table 13. Response Rates (TC2)

Domain  Constr.-based (Regular)  Constr.-based (Expanded)  N
Travel  23%                      57%                       103
Movie   86%                      88%                       50
Music   72%                      75%                       53
Book    73%                      76%                       51
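The response rate used in Table 13 is defined in the text as the ratio of non-zero result sets among all result sets. A minimal sketch of that definition follows; the function name and the list-of-lists input are assumptions chosen for illustration.

```python
def response_rate(result_lists):
    """Share of recommendation lists that contain at least one item.

    result_lists: one list of recommended items per participant session.
    """
    if not result_lists:
        raise ValueError("at least one result list is required")
    non_empty = sum(1 for items in result_lists if len(items) > 0)
    return non_empty / len(result_lists)


if __name__ == "__main__":
    # Four hypothetical sessions, two of which returned no suggestions.
    sessions = [["Paris"], [], ["Rome", "Vienna"], []]
    print(f"{response_rate(sessions):.0%}")
```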

[Figure 8: chart titled “The filter has improved the recommendations of the list”; x-axis: Domain (Digital Library, Travel, Movie, Music, Book); y-axis: Percentage of Agreement (0.00 to 1.00); legend: Degree of Agreement (1 to 5).]

Fig. 8. Participants’ agreement with the statement that the filter has improved the result list (domain-wise) (TC2)

In addition to response rates, user assessments and scores in the defined quality dimensions were compared. Table 14 depicts the outcome of this analysis. Significantly higher scores are marked in bold. In case statistical tests identified no meaningful variations, the results of the better performing approach are underlined. α-levels were adjusted with the false discovery rate (FDR) in order to decrease the probability of making a type I error due to multiple testing.
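The text does not state which FDR procedure was used. As an illustration only, the sketch below implements the widely used Benjamini-Hochberg step-up procedure in its adjusted-p-value form; treating this as the applied method is an assumption, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    An adjusted value below the chosen alpha (e.g. 0.05) means that the
    corresponding test stays significant after FDR control.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down and enforce monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.5]  # illustrative p-values, not study data
    print([round(p, 4) for p in benjamini_hochberg(raw)])
```

Comparing adjusted p-values against a fixed α is equivalent to comparing the raw p-values against rank-dependent α thresholds, which is how the adjustment is phrased in the text.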

According to mean scores, expressive requests generated more comprehensive recommendation lists (size). This finding is in line with the increased response rates shown in Table 13. However, subsequently conducted t-tests confirmed significant differences only for the travel experiment (t(102) = −7.89, p < 0.001). In contrast, precision scores were higher in most of the domains (travel, movie, music) when the SKOSRec engine processed a simple constraint-based query. However, statistical tests did not confirm any systematic differences. The authors applied Wilcoxon tests for mrs scores and the remaining performance metrics because of the small sample sizes resulting from the low response rates of the simple constraint-based approach.

Regarding Novelty, none of the approaches was superior. In the Diversity dimension, expressive filters generated suggestions that were at least as topically diversified as the results from simple constraint-based retrieval. Even though no significant differences were identified between the two approaches, increased mean scores (divU in the travel domain and divC in each domain) indicate a slight superiority of expanded filter requests in this quality dimension. The same applies to usefulness scores (mean values were higher in 3 out of 4 domains when expressive filtering was applied). However, these statements are speculative at best, because statistical tests were not significant.

In summary, it can be concluded that expressive filters improve recall, potentially leading to a slight loss in precision scores (see Table 14). It may also be the case that expanded user constraints diversify as well as increase the usefulness of recommendation lists. However, these claims are unverified. In contrast, it is safe to say that expressive query patterns generate results of at least the same quality as simple patterns while increasing the number of relevant suggestions. The few differences may be explained by the high Jaccard scores (JI) (see Table 15) of recommendation lists. Throughout the domains, result sets were much alike. Given these similarities and the increases in list sizes for expressive filters, it is supposed that an expanded constraint produces an enhanced version of the recommendation list resulting from simple filtering. Therefore, expanded filters should be the default setting in a constraint-based retrieval context.
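The Jaccard index (JI) used to compare two recommendation lists is the size of their intersection divided by the size of their union. The following sketch shows the computation; the function name is hypothetical, and returning 1.0 for two empty lists is an assumed convention, since the text does not say how that case was handled.

```python
def jaccard_index(list_a, list_b):
    """Jaccard index of two recommendation lists, ignoring rank order."""
    set_a, set_b = set(list_a), set(list_b)
    if not set_a and not set_b:
        # Assumed convention: two empty lists are treated as identical.
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


if __name__ == "__main__":
    simple = ["item1", "item2", "item3"]
    expressive = ["item2", "item3", "item4"]
    print(jaccard_index(simple, expressive))  # 2 shared items out of 4 distinct ones
```

A value close to 1 means that both filters return nearly the same items, which matches the interpretation that an expanded constraint mostly extends the list produced by simple filtering.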

Table 14. Performance results in TC2 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]  Domain  Simple M  Simple SD  Expressive M  Expressive SD  N
Accuracy (size) [10]            Travel  1.17      2.70       4.52          4.62           103
                                Movie   6.72      4.08       7.20          3.92           50
                                Music   5.96      4.64       6.36          4.51           53
                                Book    4.59      4.50       5.00          4.51           51
Accuracy (mrs) [1]              Travel  66.73     20.51      65.34         22.36          23
                                Movie   60.95     22.98      60.29         22.79          43
                                Music   63.77     16.05      62.96         16.44          34
                                Book    56.17     18.18      56.81         16.81          37
Novelty (nv) [1]                Travel  0.48      0.41       0.61          0.49           21
                                Movie   0.44      0.36       0.55          0.80           42
                                Music   0.55      0.39       0.52          0.37           38
                                Book    0.77      0.34       0.76          0.33           33
Diversity (divU) [5]            Travel  3         -          3.5           -              24
                                Movie   4         -          4             -              43
                                Music   4         -          4             -              38
                                Book    3         -          3             -              37
Diversity (divC) [1]            Travel  0.68      0.18       0.72          0.21           19
                                Movie   0.70      0.24       0.73          0.23           41
                                Music   0.86      0.13       0.87          0.12           34
                                Book    0.84      0.24       0.95          0.63           29
Usefulness [5]                  Travel  3.33      0.93       3.50          0.77           24
                                Movie   3.32      0.85       3.31          0.86           43
                                Music   3.55      0.81       3.64          0.78           38
                                Book    3.39      0.83       3.62          1.21           37

Table 15. Jaccard Indices (TC2)

Domain  Mean Jaccard Index (JI)  SD    N
Travel  0.57                     0.47  25
Movie   0.71                     0.38  43
Music   0.82                     0.35  37
Book    0.92                     0.26  35

In TC3 (i.e., evaluation of roll-up query patterns), the SKOSRec engine almost always generated non-empty recommendation lists (see Table 16). In the travel and the book experiment, the response rate was higher for aggregation-based queries, whereas in the movie domain the engine more often provided recommendations for regular requests. However, these differences are only marginal and do not indicate a clear superiority of one method.

Fig. 9 depicts participants’ general agreement with Likert statements concerning the Perceived Usefulness of the approaches in TC3. The diagram shows similar results for regular as well as advanced requests. On average, satisfaction scores reached levels which lay slightly above neutral agreement. A subsequent t-test confirmed no significant differences between the two methods. In fact, the approach with the highest average score of Perceived Usefulness was different in each experiment. While mean values were higher for regular requests in the studies on travel and music RS, advanced queries produced higher scores in the movie and book experiments.

Table 16. Response Rates (TC3)

Domain  Regular  Aggr.-based  N
Travel  99%      100%         103
Movie   96%      92%          50
Music   100%     100%         53
Book    96%      100%         51

Content-based Diversity (divC) was the only metric with systematic differences. Here, scores went up for advanced retrieval patterns in the travel (t(98) = −5.50, p < 0.001), the music (t(52) = −4.51, p < 0.001) and the book domain (t(47) = −4.90, p < 0.001) (see Table 17). This finding is not surprising given that the aggregation-based approach explores the wider semantic network of each item in the profile.

[Figure 9: chart titled “Perceived Usefulness (Regular vs. Roll-up Queries)”; x-axis: Domain (Travel, Movie, Music, Book); y-axis: Agreement (0 to 3); legend: Method (Regular, Roll-up).]

Fig. 9. Participants’ overall satisfaction with regular and roll-up recommendation retrieval (TC3)
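The t statistics reported for divC have degrees of freedom close to the number of participants, which is consistent with a within-subject comparison. Assuming a paired t-test (the text does not name the variant), the statistic can be sketched as follows. The function name and the sample values are hypothetical, and the p-value lookup is omitted because it would require a t-distribution table or an external library.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(regular, advanced):
    """Return (t, degrees_of_freedom) for paired samples.

    The difference is computed as regular minus advanced, so a negative
    t value means that the advanced approach scored higher on average.
    """
    if len(regular) != len(advanced) or len(regular) < 2:
        raise ValueError("need two equally long samples with at least two pairs")
    diffs = [r - a for r, a in zip(regular, advanced)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance")
    t = mean(diffs) / (sd / sqrt(len(diffs)))
    return t, len(diffs) - 1


if __name__ == "__main__":
    # Made-up diversity scores of four participants for both approaches.
    regular_scores = [0.60, 0.70, 0.65, 0.80]
    advanced_scores = [0.75, 0.85, 0.70, 0.90]
    t_value, df = paired_t_statistic(regular_scores, advanced_scores)
    print(f"t({df}) = {t_value:.2f}")
```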

Hence, it can be assumed that both approaches produce recommendation lists of similar quality and can provide benefits to potential consumers. It cannot be clearly stated which approach is superior, because neither of them outperformed the other in each domain. While the regular retrieval strategy was better in terms of usefulness in the travel and the music domains, the aggregation-based approach achieved higher scores in the domains of movie and book recommendations.

This conclusion is especially notable given the low mean Jaccard scores for recommendation lists resulting from regular and aggregation-based retrieval (see Table 18). The mean score never exceeded 0.1 in any domain, which indicates a low concordance between result sets. It is remarkable that participants perceived the recommendations as equally good given the high dissimilarity of the lists. Thus, advanced queries have an added value, as they provide additional interesting results. They can help to explore LOD repositories in different ways than a regular retrieval approach. Therefore, it is worthwhile to offer aggregation-based query patterns as an alternative retrieval strategy when the regular approach has not produced any helpful recommendations.
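The concordance measure used above can be illustrated with a minimal sketch. It assumes that the Jaccard index is computed per user session on the two recommendation lists treated as sets and then averaged per domain; the function names and the item identifiers are hypothetical and not taken from the SKOSRec implementation.

```python
def jaccard_index(list_a, list_b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two recommendation lists."""
    set_a, set_b = set(list_a), set(list_b)
    union = set_a | set_b
    if not union:
        # Both lists are empty: returning 0.0 is an assumed convention.
        return 0.0
    return len(set_a & set_b) / len(union)


def mean_jaccard(sessions):
    """Mean Jaccard index over (regular, aggregation-based) list pairs."""
    scores = [jaccard_index(a, b) for a, b in sessions]
    return sum(scores) / len(scores)


# Hypothetical item identifiers, for illustration only.
regular = ["dbr:Film_A", "dbr:Film_B", "dbr:Film_C"]
rollup = ["dbr:Film_C", "dbr:Film_D", "dbr:Film_E"]
print(jaccard_index(regular, rollup))  # one shared item out of five distinct items
```

A value close to 0, as in Table 18, means that the two retrieval strategies return almost disjoint lists for the same profile.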

In addition to TC3, user assessments for TC4 (cross-domain recommendations) were also evaluated. However, since the authors conducted experiments for TC4 only in the multimedia domains, the samples were considerably smaller. Nevertheless, it is assumed that the number of completed user sessions is just sufficient to conduct further statistical analyses. Table 19 shows the response rates for the two retrieval methods. Cross-domain patterns had a higher success rate of generating non-zero result sets throughout the domains. Participants almost always received a recommendation for this query type, whereas they often did not receive a single suggestion when a regular SPARQL query was executed.

We applied non-parametric tests for the performance metrics mrs, nv, divU, divC and Perceived Usefulness due to the low response rates of the SPARQL-based approach, which led to a correspondingly low number of comparable data points. Overall, the FDR-adjusted pairwise Wilcoxon tests detected no significant differences except for user-based Diversity scores (divU) in the movie (Z = −2.69, p < 0.05) and the book domain (Z = −2.56, p < 0.05). The results for the content-based Diversity (divC) metric seem to point in the same direction because of the high mean values for cross-domain queries in each domain, but the significance of these results was not verified. The mrs scores indicate a better performance of regular


Table 17. Performance results in TC3 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max. Score]   Domain   Regular M   Regular SD   Aggr-based M   Aggr-based SD   N
Accuracy (size) [10]              Travel   9.90        0.99         9.42           2.07            103
                                  Movie    2.88        0.59         2.76           0.82            50
                                  Music    10.00       0.00         10.00          0.00            53
                                  Book     2.88        0.59         3.00           0.00            51
Accuracy (mrs) [1]                Travel   48.21       24.98        44.93          25.59           102
                                  Movie    54.58       22.88        56.69          25.42           44
                                  Music    53.95       18.31        58.24          16.19           52
                                  Book     51.53       23.90        49.25          22.30           49
Novelty (nv) [1]                  Travel   0.43        0.39         0.38           0.40            98
                                  Movie    0.35        0.38         0.28           0.41            41
                                  Music    0.49        0.36         0.46           0.39            50
                                  Book     0.61        0.42         0.62           0.44            47
Diversity (divU) [5]              Travel   3.5         -            3              -               100
                                  Movie    4           -            3              -               43
                                  Music    4           -            3              -               50
                                  Book     3           -            3              -               48
Diversity (divC) [1]              Travel   0.79        0.17         0.89           0.13            99
                                  Movie    0.75        0.20         0.66           0.27            44
                                  Music    0.83        0.17         0.93           0.07            53
                                  Book     0.74        0.27         0.93           0.06            48
Usefulness [5]                    Travel   3.36        0.77         3.22           0.74            101
                                  Movie    3.20        0.92         3.26           0.94            44
                                  Music    3.45        0.84         3.44           0.85            51
                                  Book     2.95        0.87         3.03           0.96            48

Table 18. Jaccard Indices (TC3)

Domain   Mean Jaccard Index (JI)   SD     N
Travel   0.07                      0.11   101
Movie    0.03                      0.07   44
Music    0.02                      0.05   53
Book     0.02                      0.06   50

Table 19. Response rates (TC4)

Domain   Regular (SPARQL)   SKOSRec   N
Movie    32                 96        50
Music    62                 98        53
Book     53                 100       51

SPARQL queries, but the within-subjects test did not confirm this assumption statistically (see Table 20).
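The FDR adjustment reported for the pairwise Wilcoxon tests in TC4 can be sketched as follows. The text does not name the adjustment procedure, so this sketch assumes the Benjamini-Hochberg method, which is a common choice for controlling the false discovery rate; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down and enforce monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values of four pairwise comparisons.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

A comparison would then be reported as significant only if its adjusted p-value stays below the chosen level (0.05 in the tests above).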

However, the most important finding of the analysis was that SKOSRec cross-domain queries generated significantly more suggestions throughout the domains (movie: t(49) = −17.92, p < 0.001; music: t(52) = −25.73, p < 0.001; book: t(50) = −12.75, p < 0.001) (see Table 20). Fig. 10 illustrates this graphically.
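The within-subjects comparison of list sizes can be illustrated with a minimal sketch of a paired-samples t statistic. It assumes that each participant contributes one list size per method, which matches the reported degrees of freedom (N − 1); the function name and the sample values are hypothetical.

```python
import math


def paired_t_statistic(first, second):
    """Return (t, degrees of freedom) for a paired-samples t-test."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long samples with at least two pairs")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    # Sample variance of the differences (n - 1 in the denominator).
    variance = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    if variance == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_diff / math.sqrt(variance / n)
    return t, n - 1


# Hypothetical list sizes per participant: regular SPARQL vs. cross-domain.
sparql_sizes = [1, 0, 3, 2]
cross_domain_sizes = [9, 10, 10, 8]
t, df = paired_t_statistic(sparql_sizes, cross_domain_sizes)
print(f"t({df}) = {t:.2f}")
```

The sign of t is negative when the second method yields larger values, which is the direction reported for the cross-domain queries.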

Whenever the SPARQL request provided recommendations, which was the case in less than half of the user sessions (see Table 19), the SKOSRec suggestions were assessed to be of almost equal quality as the SPARQL recommendations (see Table 20, Usefulness). Hence, in the more than half of the test cases in which SPARQL querying did not produce any results, the SKOSRec approach may still have provided useful suggestions.

Therefore, it is reasonable to assume that the capability of the SKOSRec engine to flexibly switch between processing steps of similarity calculation and pattern matching can facilitate an improved exploration of LOD repositories. Table 21 also shows that the items in the two sets often did not match, as is indicated by low Jaccard indices. This finding further demonstrates that the execution of an additional step


Table 20. Performance results in TC4 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max. Score]   Domain   Regular (SPARQL) M   Regular (SPARQL) SD   Cross-Domain M   Cross-Domain SD   N
Accuracy (size) [10]              Movie    0.82                 2.09                  8.66             2.73              50
                                  Music    0.89                 1.59                  9.30             2.03              53
                                  Book     2.90                 3.98                  10.00            0.00              51
Accuracy (mrs) [1]                Movie    67.41                27.62                 53.65            14.71             16
                                  Music    62.36                31.76                 48.32            22.49             20
                                  Book     66.42                21.28                 55.59            16.84             27
Novelty (nv) [1]                  Movie    0.59                 0.49                  0.41             0.34              16
                                  Music    0.72                 0.44                  0.59             0.40              19
                                  Book     0.54                 0.41                  0.63             0.30              27
Diversity (divU) [5]              Movie    3                    -                     4                -                 16
                                  Music    3                    -                     3                -                 20
                                  Book     3                    -                     4                -                 27
Diversity (divC) [1]              Movie    (0.64)               (0.35)                (0.99)           (0.02)            (6)
                                  Music    (0.90)               (0.13)                (0.98)           (0.03)            (10)
                                  Book     (0.84)               (0.12)                (0.87)           (0.1)             (8)
Usefulness [5]                    Movie    3.57                 0.83                  3.35             1.57              15
                                  Music    3.53                 1.17                  3.11             0.98              20
                                  Book     3.41                 1.04                  3.41             0.94              27

Fig. 10. Mean recommendation list sizes for SPARQL and SKOSRec cross-domain requests (TC4). Bar chart titled "Recommendation List Size"; x-axis: Domain (Movie, Music, Book); y-axis: No. of items; methods: Regular, Cross-Domain.


Table 21. Jaccard Indices (TC4)

Domain   Mean Jaccard Index (JI)   SD     N
Movie    0.03                      0.07   44
Music    0.01                      0.02   21
Book     0.16                      0.27   33

of similarity calculation can help to retrieve other relevant items.

5. Future Work & Conclusion

This paper has presented the query-based SKOSRec engine that facilitates new types of recommendation requests for LOD repositories. The evaluations have shown that the SKOSRec system can often generate relevant and useful suggestions when certain query templates are utilized. In many cases, the application of expressive or advanced query patterns (i.e., in the context of constraint-based retrieval or cross-domain requests) helped to significantly improve recall values while still providing the same level of quality in the remaining performance dimensions. Additionally, advanced recommendation requests can be used to generate diversified recommendation lists in case users are not satisfied with the results of regular recommendation requests (i.e., roll-up or cross-domain retrieval vs. regular requests). While not being superior in each domain and test case, it is at least safe to say that the novel recommendation approaches of the SKOSRec engine considerably extend common retrieval methods and provide an alternative search strategy when conventional methods fail to provide useful results. Additionally, the increase in diversity and recall is in line with findings from previous research on LOD-enabled RS [45].

It is a strength of the developed approaches that they facilitate combinations of graph-based and similarity-based retrieval at different stages of the recommendation workflow. This feature extends general search capabilities for semantic networks. Future research should investigate whether, in addition to SKOS, further RDF vocabularies can be used for ad-hoc item-to-item similarity computation to increase the range of potential usage scenarios, since not every LOD dataset contains SKOS annotations [53]. The application of additional properties could also boost the recommendation quality of some query patterns (e.g., the query types used in TC3). In this context, it also needs to be determined whether the representation of the user profile in the SKOSRec query language can be substituted with free-text expressions to enable search-like functionalities. LOD researchers have already developed engines that can process natural language queries over RDF data [59], [34], [67]. It will have to be investigated how the SKOSRec engine fits into this landscape of existing retrieval tools.

Another open research question concerns interfaces to other applications. In its standard configuration, the system is not an end-user retrieval engine but a backend application with which an administrator who has domain knowledge (e.g., of RDF vocabularies and data models of a particular LOD repository and usage scenario) creates the respective query patterns. Although the websites of the online experiments represent possible implementations of suitable end-user applications for the SKOSRec engine, there are still many other possibilities and extensions to design such a user interface (UI). For instance, the selection of LOD items for the profile was made possible by previously storing the data in an index that could then be accessed and searched before issuing a recommendation query. However, this runs against the idea of ad-hoc retrieval. In this context, future research will have to determine whether it is feasible to generate an on-the-fly mapping from items to the respective LOD resources. In the case of a commercial application, the engine needs to match items with the corresponding entries in the product catalog. Even if it is not yet clear which UI components can make the best possible use of the SKOSRec engine's features, the results from the user experiments can give at least a few suggestions.

The following retrieval options should be the default setting in an interface that contains a LOD-enabled RS component to assist users in finding interesting items. Since the regular approach of on-the-fly retrieval achieved good results throughout the domains, the display of simple recommendations generated from previously stated user preferences represents a suitable starting point for retrieval. On-the-fly queries could be refined by providing filter options. Since the evaluations have shown that expressive constraints often lead to increased recall values, the application of such a filter might be the best option for assisted retrieval. As in the preparations for the experiments, appropriate graph-based query patterns for the usage scenario in question would also have to be determined by a domain expert before setting up the application. The same applies to advanced retrieval patterns, which should be available to the end user to refine the query when both the simple and the constraint-based approach have failed to provide relevant items. This suggestion is made because roll-up queries were competitive with regular recommendations in the web-based experiments. Therefore, they represent a viable alternative retrieval strategy. For some domains (e.g., for multimedia retrieval), it is also possible to use an optional selection field to enable the formulation of cross-domain requests. Evaluations for this query type suggest that users might receive interesting results (i.e., diversified and comprehensive recommendation lists) from such a request.

In summary, the contributions of the paper are as follows:

– Development and implementation of novel retrieval strategies in the SKOSRec system, which facilitate personalized requests and leverage the potential of knowledge graphs in order to address the research gap in state-of-the-art recommendation methods with regard to customized queries.

– Realization of web-based user experiments in four usage scenarios with the following outcomes:

  • Successful demonstration that knowledge graph data is applicable in the context of live recommendation scenarios.

  • Creation of suitable query patterns that can be applied to represent individual information needs in different application contexts. The set of query patterns can be extended with the help of the SKOSRec query language to meet the requirements of additional usage scenarios.

  • Proof that the novel retrieval strategies can be successfully applied across various application contexts and recommendation tasks and that they outperform conventional content-based retrieval approaches, especially in the performance dimensions of diversity and recall.

The prototypical implementation and the evaluation of the SKOSRec engine in different usage scenarios have demonstrated that the freely available knowledge sources on the LOD cloud can be successfully applied to advance state-of-the-art methods in IR and recommender systems. The proposed personalized and customized retrieval strategies can leverage the potential of knowledge graphs by better matching information needs of users with the help of semantic search options.

Appendix A. The SKOSRecommender Query Syntax

SKOSRecQuery: A SKOSRec query can comprise up to six elements, of which the parts SimProjection and ItemPart are obligatory. Hence, users need to state at least their preferences and how many recommendations they would like to receive. Advanced retrieval patterns can be formulated as well. For instance, similar resource retrieval can be combined with prefiltering (RecWhereClause) and/or postfiltering (SelectPart) of LOD resources. As in SPARQL, the query can start with a prologue that abbreviates IRIs with namespace declarations [21].

SelectPart: The SelectPart enables subquerying with recommendation results. The surrounding query is a SELECT query with a slightly simplified WHERE clause (i.e., RecWhereClause) with which postfilter conditions can be formulated. In this section, users have the option to formulate aggregation-based retrieval requests (Aggregation).

Aggregation: This query part handles aggregation-based postfiltering. Users specify the resource (IRIref) for which similarity scores of sublevel entities are summarized. Additionally, it is stated how the data should be aggregated (i.e., SUM, MAX or AVG) as well as which variable (Var) represents the aggregation-based recommendations in the postfilter section. The compiler makes sure that this variable is actually contained in the RecWhereClause of the SelectPart.

SimProjection: This part of a SKOSRec query refers to the list of generated suggestions. It defines the recommendation variable to which all similar LOD resources are mapped. In case pre- and postfilter conditions are set in the other sections, the compiler checks whether the variable (Var) appears in these sections as well. Otherwise, the required join operations cannot be executed. It is also specified how many recommendations should be displayed (Integer) and whether the request will be issued against a default repository or as a cross-repository request (ServiceIntegration).

ServiceIntegration: This section indicates that a user intends to receive recommendations from a different SPARQL endpoint than the default endpoint that is stated in the standard configuration of the SKOSRec engine. It specifies the target SPARQL endpoint (IRIref) from which a user wants to receive recommendations. Upon extraction of SKOS annotations from the default source endpoint, a subsequent request is sent to the specified target endpoint.

ItemPart: This section of a SKOSRec query represents a single preference in the user profile. However, recommendation queries can contain several preference statements. A preference is either expressed as a LOD resource (IRIref) or as a variable which is bound to a preference query (VarPart). In the latter case, the resources are obtained by matching the graph pattern in the VarPart with the triple statements in the repository. The ItemPart statement can additionally be supplemented with a concept-to-concept similarity threshold (Sim) that triggers concept expansion with the approach of flexible similarity detection, which is extensively described in one of our previous works [64].

VarPart: This construct initiates preference querying. Individual likings are expressed as a graph pattern in a RecWhereClause. The variable (Var) is a placeholder for the LOD resources that will be used to set up the user profile. The compiler has to make sure that the specified variable occurs in the RecWhereClause as well. Otherwise, the extraction of LOD resources cannot be carried out correctly.

Sim: The Sim part of a SKOSRec query denotes the threshold of concept-to-concept scores for flexible similarity detection. Even though knowledge-based similarity values usually range between 0 and 1, the specification allows a broader range of digits in the Decimal datatype to enable the application of other similarity metrics in case they are needed.

Relation: This section specifies how concept-to-concept similarity scores should be compared with the threshold. Users can state whether scores should be "larger", "larger than" or "equal to" the threshold value.

RecWhereClause: This clause can occur in three different sections of a SKOSRec query, i.e., either in the prefilter (ItemPart), the postfilter (SelectPart) or the preference filter section (VarPart). The RecWhereClause closely resembles the "WhereClause" of the SPARQL 1.1 specification with all its subsequent parts [21]. Hence, it enables different combinations of graph pattern matching. However, the RecWhereClause of the SKOSRec language allows fewer pattern matching expressions than the "WhereClause" of the regular SPARQL syntax specification [21]. For instance, it is neither permitted to formulate subqueries nor to direct filter requests to more than one repository (see RecGraphPatternNotTriples for further explanations). This modification has been made to simplify similarity calculation. After parsing this section, the compiler makes sure that recommendation, preference or postfilter variables occur at least once in the RecWhereClause to guarantee that subsequent steps of the recommendation workflow can be applied to existing LOD resources. In case the RecWhereClause comprises a RecMinusGraphPattern, it also has to be checked that the variable is not only contained in the MINUS part of the group.

RecGroupGraphPattern: A RecGroupGraphPattern can contain one to many basic graph patterns, which can be combined in different ways according to RecGroupGraphPatternSub.

RecGroupGraphPatternSub: This section defines the graph patterns that are applied to identify suitable LOD resources. User filters can be expressed either as basic graph patterns (TriplesBlock) or as combinations of them (RecGraphPatternNotTriples). The sequence of different kinds of patterns can be flexibly specified.

RecGraphPatternNotTriples: This query part closely resembles the similarly named "GraphPatternNotTriples" of the SPARQL 1.1 specification [21]. The only difference to SPARQL is that the RecGraphPatternNotTriples section is a little more restricted in terms of admissible graph patterns than the "GraphPatternNotTriples" section of the regular SPARQL specification. For instance, it does not allow retrieving LOD resources from graphs other than the default graph that is specified in the configuration of the SKOSRec engine, since filter patterns have to be applied to datasets that contain SKOS annotations. Hence, these datasets need to be specified before runtime execution to ensure that similarity calculation can be carried out correctly. If a user were able to formulate filter conditions that referred to different RDF graphs and endpoints, this would not be possible.

RecGroupOrUnionGraphPattern: This query section marks an alternative graph pattern in the style of the "GroupOrUnionGraphPattern" of the SPARQL syntax specification [21]. However, it allows neither SPARQL-like subqueries nor federated queries for filtered resources, because this would complicate similarity calculation.

RecOptionalGraphPattern: In this part of a SKOSRecQuery, users can specify additional graph patterns which might extend the query solution but do not necessarily have to match the data.

RecMinusGraphPattern: This section handles the exclusion of LOD resources for cases in which users want to omit certain triple statements from the solution.
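The Aggregation element described above can be illustrated with a minimal sketch of a roll-up step. It is not the SKOSRec implementation: it merely assumes that similarity scores of sublevel items (e.g., movies) are grouped by a parent resource (e.g., a director) and summarized with SUM, MAX or AVG. The names `roll_up` and `parent_of` as well as the example identifiers are hypothetical.

```python
from collections import defaultdict

# The three aggregation modes named in the query syntax.
AGGREGATORS = {
    "SUM": sum,
    "MAX": max,
    "AVG": lambda scores: sum(scores) / len(scores),
}


def roll_up(scores, parent_of, mode="SUM"):
    """Aggregate similarity scores of sublevel items per parent entity."""
    aggregate = AGGREGATORS[mode]
    grouped = defaultdict(list)
    for item, score in scores.items():
        parent = parent_of.get(item)
        if parent is not None:  # items without a known parent are skipped
            grouped[parent].append(score)
    ranked = [(parent, aggregate(values)) for parent, values in grouped.items()]
    # Highest aggregated score first.
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Hypothetical similarity scores of movies and their directors.
scores = {"film1": 0.5, "film2": 0.25, "film3": 0.625}
parent_of = {"film1": "directorA", "film2": "directorA", "film3": "directorB"}
print(roll_up(scores, parent_of, "SUM"))
print(roll_up(scores, parent_of, "MAX"))
```

The choice of mode changes the ranking: SUM favors parents with many moderately similar sublevel items, whereas MAX and AVG favor parents with few but highly similar ones.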

References

[1] G. Adomavicius, A. Tuzhilin, Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions, In: IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.

[2] G. Adomavicius, A. Tuzhilin, R. Zheng, REQUEST: A query language for customizing recommendations, In: Information Systems Research, vol. 22, no. 1, pp. 99–117, 2011.

[3] C. Aggarwal, Recommender systems, Springer, 2016.

[4] M. Arenas, B. Cuenca Grau, E. Kharlamov, S. Marciuska, D. Zheleznyakov, Faceted search over ontology-enhanced RDF data, In: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 939–948, 2014.

[5] V. Ayala, M. Przyjaciel-Zablocki, T. Hornung, A. Schätzle, G. Lausen, Extending SPARQL for recommendations, In: Semantic Web Information Management, 2004.

[6] J. Beel, C. Breitinger, S. Langer, A. Lommatzsch, B. Gipp, Towards reproducibility in recommender systems research, In: User Modeling and User-adapted Interaction, vol. 26, no. 1, pp. 69–101, 2016.

[7] C. Boutilier, R. Brafman, C. Domshlak, H. Hoos, D. Poole, CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements, In: Journal of Artificial Intelligence Research (JAIR), vol. 21, pp. 135–191, 2004.

[8] P. Castells, S. Vargas, J. Wang, Novelty and diversity metrics for recommender systems: choice, discovery and relevance, 2011.

[9] G. Cheng, W. Ge, Y. Qu, Falcons: searching and browsing entities on the Semantic Web, In: Proceedings of the 17th ACM International Conference on World Wide Web, pp. 1101–1102, 2008.

[10] J. Davies, R. Weeks, QuizRDF: Search technology for the Semantic Web, In: Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004.

[11] DBpedia, URL: https://wiki.dbpedia.org, 2017.

[12] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, Exploiting the web of data in model-based recommender systems, In: Proceedings of the 6th ACM Conference on Recommender Systems (RecSys), pp. 253–256, 2012.

[13] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, M. Zanker, Linked Open Data to support content-based recommender systems, In: Proceedings of the 8th International Conference on Semantic Systems, 2012.

[14] D. Dillman, D. Robert, D. Bowker, Principles for constructing web surveys, In: Joint Meetings of the American Statistical Association, 1998.

[15] DCMI metadata terms, URL: http://dublincore.org/documents/dcmi-terms, 2004.

[16] I. Fernandez-Tobias, I. Cantador, M. Kaminskas, F. Ricci, A generic semantic-based framework for cross-domain recommendation, In: Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pp. 25–32, 2011.

[17] A. Freitas, J. Oliveira, S. O'Riain, E. Curry, J. Da Silva, C. Pereira, Querying Linked Data using semantic relatedness: a vocabulary independent approach, In: Natural Language Processing and Information Systems, pp. 40–51, 2011.

[18] T. Grainger, T. Potter, Y. Seeley, Solr in Action, Cherry Hill: Manning, 2014.

[19] A. Gunawardana, G. Shani, Evaluating recommender systems, In: Recommender Systems Handbook, Springer, pp. 265–308, 2015.

[20] R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Bürgle, H. Düwiger, U. Scheel, Faceted Wikipedia search, In: International Conference on Business Information Systems, pp. 1–11, 2010.

[21] S. Harris, A. Seaborne, E. Prud'hommeaux, SPARQL 1.1 query language, In: W3C Recommendation, 2013.

[22] B. Heitmann, C. Hayes, Using Linked Data to build open, collaborative recommender systems, In: AAAI Spring Symposium: Linked Data Meets Artificial Intelligence, pp. 76–81, 2010.

[23] J. Herlocker, J. Konstan, L. Terveen, J. Riedl, Evaluating collaborative filtering recommender systems, In: ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.

[24] Y. Hu, Z. Wang, W. Wu, J. Guo, M. Zhang, Recommendation for movies and stars using YAGO and IMDB, In: 12th International Asia-Pacific Web Conference (APWEB), pp. 123–129, 2010.

[25] D. Huynh, D. Karger, Parallax and companion: Set-based browsing for the data web, In: Proceedings of the 18th ACM International Conference on World Wide Web, 2009.

[26] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich, Recommender systems: an introduction, Cambridge University Press, 2010.

[27] Y. Kabutoya, R. Sumi, T. Iwata, T. Uchiyama, T. Uchiyama, In: IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 625–630, 2012.

[28] M. Kaminskas, I. Fernandez-Tobias, F. Ricci, I. Cantador, Knowledge-based music retrieval for places of interest, In: Proceedings of the 2nd International ACM Workshop on Music Information Retrieval with User-centered and Multimodal Strategies, pp. 19–24, 2012.

[29] H. Khrouf, R. Troncy, Hybrid event recommendation using Linked Data and user diversity, In: Proceedings of the 7th ACM Conference on Recommender Systems, pp. 185–192, 2010.

[30] P. Kirk, Experimental design, John Wiley & Sons, 1982.

[31] B. Knijnenburg, M. Willemsen, Z. Gantner, H. Soncu, C. Newell, Explaining the user experience of recommender systems, In: User Modeling and User-Adapted Interaction, vol. 22, no. 4-5, pp. 441–504, 2012.

[32] G. Koutrika, B. Bercovitz, H. Garcia-Molina, FlexRecs: expressing and combining flexible recommendations, In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 2009.


[33] M. Lee, W. Kim, Semantic association search and rank method based on spreading activation for the Semantic Web, In: IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1523–1527, 2009.

[34] J. Lehmann, L. Bühmann, Autosparql: Let users query your knowledge base, In: 8th Extended Semantic Web Conference (ESWC), 2011.

[35] N. Marie, F. Gandon, M. Ribiere, F. Rodio, Discovery hub: on-the-fly Linked Data exploratory search, In: Proceedings of the 9th International Conference on Semantic Systems, pp. 17–24, 2013.

[36] S. McNee, J. Riedl, J. Konstan, Being accurate is not enough: how accuracy metrics have hurt recommender systems, In: CHI'06 Extended Abstracts on Human Factors in Computing Systems, 2006.

[37] R. Meymandpour, J. Davis, Recommendations using Linked Data, In: Proceedings of the 5th PhD Workshop on Information and Knowledge, 2012.

[38] A. Miles, S. Bechhofer, SKOS Simple Knowledge Organization System Reference, In: W3C Recommendation, 2009.

[39] B. Mobasher, X. Jin, Y. Zhou, Semantically enhanced collaborative filtering on the web, In: Web Mining: From Web to Semantic Web, pp. 57–76, 2004.

[40] A. Musetti, A. Nuzzolese, F. Draicchio, V. Presutti, E. Blomqvist, A. Gangemi, P. Ciancarini, Aemoo: Exploratory search based on knowledge patterns over the Semantic Web, In: Semantic Web Challenge, 2012.

[41] T. Neumann, G. Weikum, The RDF-3X engine for scalable management of RDF data, In: The VLDB Journal, vol. 19, no. 1, pp. 91–113, 2010.

[42] J. Pérez, M. Arenas, C. Gutierrez, Semantics and Complexity of SPARQL, In: 5th International Semantic Web Conference, pp. 30–43, 2006.

[43] S. Policarpio, S. Brunk, G. Tummarello, Implementation of a SPARQL integrated recommendation engine for Linked Data with hybrid capabilities, In: AImWD, 2012.

[44] P. Pu, L. Chen, R. Hu, A user-centric evaluation framework for recommender systems, In: Proceedings of the 5th ACM Conference on Recommender Systems, 2011.

[45] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, E. Di Sciascio, Sound and music recommendation with knowledge graphs, In: ACM Transactions on Intelligent Systems and Technology (TIST), 8(2), 21, 2017.

[46] E. Oren, R. Delbru, S. Decker, Extending faceted navigation for RDF data, In: 5th International Semantic Web Conference, pp. 559–572, 2006.

[47] G. Pirro, Explaining and suggesting relatedness in knowledge graphs, In: International Semantic Web Conference, pp. 622–639, 2015.

[48] G. Rizzo, R. Troncy, NERD: A framework for evaluating named entity recognition tools in the Web of data, In: 10th International Semantic Web Conference (ISWC'11), Demo Session, Bonn, Germany, 2011.

[49] J. Rosati, T. Di Noia, T. Lukasiewicz, R. De Leone, A. Maurino, Preference queries with Ceteris Paribus semantics for Linked Data, In: OTM Confederated International Conferences, pp. 423–442, 2015.

[50] T. Ruotsalo, K. Haav, A. Stoyanov, S. Roche, E. Fani, R. Deliai, E. Mäkelä, T. Kauppinen, E. Hyvönen, SMARTMUSEUM: A mobile recommender system for the Web of Data, In: Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 50–67, 2013.

[51] I. Ruthven, M. Lalmas, K. van Rijsbergen, Incorporating user search behavior into relevance feedback, In: Journal of the Association for Information Science and Technology, vol. 54, no. 6, pp. 529–549, 2003.

[52] B. Sarwar, G. Karypis, J. Konstan, J. Riedl, Item-based collaborative filtering recommendation algorithms, In: Proceedings of the 10th International Conference on World Wide Web, 2001.

[53] M. Schmachtenberg, C. Bizer, H. Paulheim, Adoption of the Linked Data best practices in different topical domains, In: International Semantic Web Conference (ISWC), pp. 245–260, 2014.

[54] H. Schütze, C. Manning, P. Raghavan, Introduction to information retrieval, Cambridge University Press, 2008.

[55] SparqlImplementations, URL: https://www.w3.org/wiki/SparqlImplementations, 2016.

[56] M. Stankovic, W. Breitfuss, P. Laublet, Discovering relevant topics using DBPedia, In: Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 219–222, 2011.

[57] K. Swearingen, R. Sinha, Beyond algorithms: An HCI perspective on recommender systems, In: ACM SIGIR 2001 Workshop on Recommender Systems, vol. 13, no. 5-6, 2001.

[58] D. Tunkelang, Faceted search, In: Synthesis Lectures on Information Concepts, Retrieval, and Services, pp. 1–80, 2009.

[59] C. Unger, L. Bühmann, J. Lehmann, A. Ngonga Ngomo, D. Gerber, P. Cimiano, Template-based question answering over RDF data, In: Proceedings of the 21st ACM International Conference on World Wide Web, 2012.

[60] C. Varnhagen, M. Gushta, J. Daniels, T. Peters, N. Parmar, D. Law, T. Johnson, How informed is online informed consent?, In: Ethics & Behavior, vol. 15, no. 1, pp. 37–48, 2005.

[61] J. Waitelonis, H. Sack, Towards exploratory video search using Linked Data, In: 11th IEEE International Symposium on Multimedia, pp. 540–545, 2009.

[62] L. Wenige, J. Ruhland, Flexible on-the-fly recommendations from Linked Open Data repositories, In: International Conference on Business Information Systems, pp. 43–54, 2016.

[63] L. Wenige, Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations.

[64] L. Wenige, J. Ruhland, Retrieval by recommendation: using LOD technologies to improve digital library search, In: International Journal on Digital Libraries, 2017.

[65] F. Zarrinkalam, M. Kahani, A multi-criteria hybrid citation recommendation system based on Linked Data, In: 2nd International EConference on Computer and Knowledge Engineering (ICCKE), pp. 283–288, 2012.

[66] C. Ziegler, S. McNee, J. Konstan, G. Lausen, Improving recommendation lists through topic diversification, In: Proceedings of the 14th International Conference on World Wide Web, pp. 22–32, 2005.

[67] L. Zou, R. Huang, H. Wang, J. Yu, W. He, D. Zhao, Natural language question answering over RDF: a graph data driven approach, In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 313–324, 2014.



12). The on-the-fly retrieval approach was also tested with regard to the Novelty dimension. Table 12 shows the mean novelty scores (nv) for the baseline recommendation approach. Novelty scores indicate that the majority of items in a result list were deemed to be new in the travel and book RS, whereas in the experiments on movie and music RS only one third of the recommendation list was unfamiliar to users. In contrast, content-based Diversity scores (divC) were rather high in each domain. Hence, it can be concluded that items in a result set usually have a low rate of matching SKOS annotations. In addition to content-based scores, user assessments of recommendation list Diversity (divU) were also taken into account. The median level of agreement to the Diversity-related Likert item stating that recommendations are diverse lay at 3 ("neutral") in the multimedia domains and at 4 ("agree") in the travel domain.

In summary, the baseline approach achieved fairly good performance results in TC1. The findings demonstrate that LOD-enabled recommendations can potentially enhance a user's search experience. They also allow for subsequent tests of advanced retrieval techniques, which can be compared to this standard method.

In the analysis of performance results from TC2 (constraint-based recommendations), particular attention was paid to participants’ agreement with the statement “The filter has improved the recommendation results of the list”. Fig. 8 shows the distribution of domain-wise agreement ratios. On average, they were fairly high. Throughout the domains, the vast majority of participants selected 5 (“strongly agree”), 4 (“agree”) or 3 (“neutral”). Hence, a filter often had a positive impact on results.

The authors gathered responses to the question on improvement for recommendation lists resulting from both the simple and the expressive filtering approaches. Hence, they give clues about the general performance of constraint-based retrieval. However, since this section of the study revealed the experimental condition to participants (i.e., application of a filter), answers to the Likert statement have to be interpreted cautiously. Results may be slightly biased, as participants were able to guess the underlying agenda. These limitations do not apply, however, to comparisons between simple and expressive filters. Since the web application randomly assigned the list order, participants did not know which approach they were assessing. This enables us to take a closer look at the differences between the two filtering approaches. It was

Table 12. Performance results in TC1 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]   Domain   M       SD      N
Accuracy (size) [10]             Travel   10.00   0.00    103
                                 Movie    10.00   0.00    50
                                 Music    10.00   0.00    53
                                 Book     9.80    1.40    51
Accuracy (mrs) [1]               Travel   56.49   0.00    103
                                 Movie    62.25   19.15   50
                                 Music    55.70   0.10    53
                                 Book     57.47   15.55   50
Novelty (nv) [1]                 Travel   0.60    0.34    101
                                 Movie    0.36    0.31    50
                                 Music    0.36    0.39    52
                                 Book     0.61    0.29    47
Diversity (divU) [5]             Travel   4       -       102
                                 Movie    3       -       50
                                 Music    3       -       53
                                 Book     3       -       50
Diversity (divC) [1]             Travel   0.80    0.16    103
                                 Movie    0.87    0.12    50
                                 Music    0.86    0.10    50
                                 Book     0.92    0.08    50
Usefulness [5]                   Travel   3.34    0.85    103
                                 Movie    3.55    0.73    50
                                 Music    3.18    0.85    53
                                 Book     3.43    0.81    50

hypothesized that advanced filters improve recommendation quality, as they may identify more relevant resources, thereby overcoming problems of data quality or data sparsity in LOD repositories. Table 13 lists the response rates of the two methods. The rate measures the ratio of non-zero result sets among all result sets. Throughout the domains, recommendation lists resulting from expressive filtering achieved higher response rates.

Table 13. Response rates in % (TC2)

Domain   Constr.-based (Regular)   Constr.-based (Expanded)   N
Travel   23                        57                         103
Movie    86                        88                         50
Music    72                        75                         53
Book     73                        76                         51
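The response rate defined above (share of non-zero result sets among all result sets) can be illustrated with a minimal sketch. The function name `response_rate` and the session data are hypothetical and only serve as an illustration; they are not taken from the SKOSRec implementation.

```python
def response_rate(result_sets):
    """Return the share of non-empty result sets among all result sets."""
    if not result_sets:
        raise ValueError("at least one result set is required")
    non_empty = sum(1 for items in result_sets if len(items) > 0)
    return non_empty / len(result_sets)


# Hypothetical user sessions: each inner list is one recommendation list.
regular = [["r1", "r2"], [], [], ["r3"]]
expanded = [["r1", "r2", "r4"], ["r5"], [], ["r3"]]

print(f"regular:  {response_rate(regular):.0%}")
print(f"expanded: {response_rate(expanded):.0%}")
```

In this toy data the expanded filter returns a non-empty list in one additional session, which is the kind of difference the response rates in Table 13 capture.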

[Plot: “The filter has improved the recommendations of the list”. x-axis: Domain (Digital Library, Travel, Movie, Music, Book); y-axis: Percentage of Agreement (0.00–1.00); legend: Degree of Agreement (1–5).]

Fig. 8. Participants’ agreement with the statement that the filter has improved the result list (domain-wise) (TC2)

In addition to response rates, user assessments and scores in the defined quality dimensions were compared. Table 14 depicts the outcome of this analysis. Significantly higher scores are marked in bold. Where statistical tests identified no meaningful variations, the results of the better performing approach are underlined. α-levels were adjusted with the false discovery rate (FDR) in order to decrease the probability of making a type I error due to multiple testing.
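The text does not name the concrete FDR procedure. As an illustration only, the following sketch assumes the widely used Benjamini-Hochberg step-up method and expresses the correction as adjusted p-values, which is equivalent to adjusting the α-levels. The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down and enforce monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values from four pairwise comparisons.
raw = [0.01, 0.04, 0.03, 0.005]
alpha = 0.05
for p, q in zip(raw, benjamini_hochberg(raw)):
    print(f"p = {p:.3f}, adjusted = {q:.3f}, significant = {q <= alpha}")
```

A comparison counts as significant under this sketch when its adjusted p-value does not exceed the chosen α.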

According to mean scores, expressive requests generated more comprehensive recommendation lists (size). This finding is in line with the increased response rates shown in Table 13. However, subsequently conducted t-tests confirmed significant differences only for the travel experiment (t(102) = −7.89, p < 0.001). In contrast, precision scores were higher in most of the domains (travel, movie, music) when the SKOSRec engine processed a simple constraint-based query. However, statistical tests did not confirm any systematic differences. The authors applied Wilcoxon tests for mrs scores and the remaining performance metrics because of the small sample sizes resulting from the low response rates of the simple constraint-based approach.

Regarding Novelty, none of the approaches was superior. In the Diversity dimension, expressive filters generated suggestions that were at least as topically diversified as the results from simple constraint-based retrieval. Even though no significant differences were identified between the two approaches, increased mean scores (divU in the travel domain and divC in each domain) indicate a slight superiority of expanded filter requests in this quality dimension. The same applies to usefulness scores (mean values were higher in 3 out of 4 domains when expressive filtering was applied). However, these statements are speculative at best because statistical tests were not significant.

In summary, it can be concluded that expressive filters improve recall, potentially leading to a slight loss in precision scores (see Table 14). It may also be the case that expanded user constraints diversify as well as increase the usefulness of recommendation lists. However, these claims are unverified. In contrast, it is safe to say that expressive query patterns generate results of at least the same quality as simple patterns while increasing the number of relevant suggestions. The few differences may be explained by the high Jaccard scores (JI) (see Table 15) of recommendation lists. Throughout the domains, result sets were much alike. Given these similarities and the increases in list sizes for expressive filters, it is supposed that an expanded constraint produces an enhanced version of the recommendation list resulting from simple filtering. Therefore, expanded filters should be the default setting in a constraint-based retrieval context.
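The Jaccard index used to compare two recommendation lists is the size of their intersection divided by the size of their union. A minimal sketch follows; the function name, the item labels and the handling of two empty lists (returning 0.0) are assumptions made for illustration and are not specified in the paper.

```python
def jaccard_index(list_a, list_b):
    """Return |A ∩ B| / |A ∪ B| for two recommendation lists."""
    set_a, set_b = set(list_a), set(list_b)
    union = set_a | set_b
    if not union:
        # Assumption: two empty lists are treated as having no overlap.
        return 0.0
    return len(set_a & set_b) / len(union)


# Hypothetical lists: the expanded filter returns the simple list plus one item.
simple = ["Rome", "Paris", "Vienna"]
expanded = ["Rome", "Paris", "Vienna", "Prague"]

print(jaccard_index(simple, expanded))
```

A list that merely extends another one, as in this example, still yields a high index, which matches the interpretation that an expanded constraint produces an enhanced version of the simple list.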

In TC3 (i.e., evaluation of roll-up query patterns), the SKOSRec engine almost always generated non-empty recommendation lists (see Table 16). In the

Table 14. Performance results in TC2 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]   Domain   Simple            Expressive        N
                                          M       SD        M       SD
Accuracy (size) [10]             Travel   1.17    2.70      4.52    4.62      103
                                 Movie    6.72    4.08      7.20    3.92      50
                                 Music    5.96    4.64      6.36    4.51      53
                                 Book     4.59    4.50      5.00    4.51      51
Accuracy (mrs) [1]               Travel   66.73   20.51     65.34   22.36     23
                                 Movie    60.95   22.98     60.29   22.79     43
                                 Music    63.77   16.05     62.96   16.44     34
                                 Book     56.17   18.18     56.81   16.81     37
Novelty (nv) [1]                 Travel   0.48    0.41      0.61    0.49      21
                                 Movie    0.44    0.36      0.55    0.80      42
                                 Music    0.55    0.39      0.52    0.37      38
                                 Book     0.77    0.34      0.76    0.33      33
Diversity (divU) [5]             Travel   3       -         3.5     -         24
                                 Movie    4       -         4       -         43
                                 Music    4       -         4       -         38
                                 Book     3       -         3       -         37
Diversity (divC) [1]             Travel   0.68    0.18      0.72    0.21      19
                                 Movie    0.70    0.24      0.73    0.23      41
                                 Music    0.86    0.13      0.87    0.12      34
                                 Book     0.84    0.24      0.95    0.63      29
Usefulness [5]                   Travel   3.33    0.93      3.50    0.77      24
                                 Movie    3.32    0.85      3.31    0.86      43
                                 Music    3.55    0.81      3.64    0.78      38
                                 Book     3.39    0.83      3.62    1.21      37

Table 15. Jaccard Indices (TC2)

Domain   Mean Jaccard Index (JI)   SD     N
Travel   0.57                      0.47   25
Movie    0.71                      0.38   43
Music    0.82                      0.35   37
Book     0.92                      0.26   35

travel and the book experiment, the response rate was higher for aggregation-based queries, whereas in the movie domain, the engine more often provided recommendations for regular requests. However, these differences are only marginal and do not indicate a clear superiority of one method.

Fig. 9 depicts participants’ general agreement with Likert statements concerning the Perceived Usefulness of the approaches in TC3. The diagram shows similar results for regular as well as advanced requests. On average, satisfaction scores reached levels slightly above neutral agreement. A subsequent t-test

Table 16. Response rates in % (TC3)

Domain   Regular   Aggr.-based   N
Travel   99        100           103
Movie    96        92            50
Music    100       100           53
Book     96        100           51

confirmed no significant differences between the two methods. In fact, the approach with the highest average score of Perceived Usefulness was different in each experiment. While mean values were higher for regular requests in the studies on travel and music RS, advanced queries produced higher scores in the movie and book experiments.

Content-based Diversity (divC) was the only metric with systematic differences. Here, scores went up for advanced retrieval patterns in the travel (t(98) = −5.50, p < 0.001), the music (t(52) = −4.51, p < 0.001) and the book domain (t(47) = −4.90, p <

[Plot: “Perceived Usefulness (Regular vs. Roll-up Queries)”. x-axis: Domain (Travel, Movie, Music, Book); y-axis: Agreement (0–3); legend: Method (Regular, Roll-up).]

Fig. 9. Participants’ overall satisfaction with regular and roll-up recommendation retrieval (TC3)

0.001) (see Table 17). This finding is not surprising, given that the aggregation-based approach explores the wider semantic network of each item in the profile.

Hence, it can be assumed that both approaches produce recommendation lists of similar quality and can provide benefits to potential consumers. Thus, it cannot be clearly stated which approach is superior, because neither of them outperformed the other in each domain. While the regular retrieval strategy was better in terms of usefulness in the travel and the music domains, the aggregation-based approach achieved higher scores in the domains of movie and book recommendations.

This conclusion is especially impressive given the low mean Jaccard scores for recommendation lists resulting from regular and aggregation-based retrieval (see Table 18). The mean score never exceeded 0.1 throughout the domains, which indicates a low concordance between result sets. It is remarkable that participants perceived the recommendations as equally good given the high dissimilarity of the lists. Thus, advanced queries have an added value, as they provide additional interesting results. They can help to explore LOD repositories in different ways than a regular retrieval approach. Therefore, it is worthwhile to offer aggregation-based query patterns as an alternative retrieval strategy when the regular approach has not produced any helpful recommendations.

In addition to TC3, user assessments for TC4 (cross-domain recommendations) were also evaluated. However, since the authors conducted experiments for TC4 only in the multimedia domains, the samples were considerably smaller. Nevertheless, it is assumed that the number of completed user sessions is just sufficient to conduct further statistical analyses. Table 19 shows the response rates for the two retrieval methods. Cross-domain patterns had a higher success rate of generating non-zero result sets throughout the domains. Participants almost always received a recommendation for this query type, whereas they often did not receive a single suggestion when a regular SPARQL query was executed.

We applied non-parametric tests for the performance metrics mrs, nv, divU, divC and Perceived Usefulness due to the low response rates of the SPARQL-based approach, which led to a correspondingly low number of comparable data points. Overall, the FDR-adjusted pairwise Wilcoxon tests detected no significant differences, except for user-based Diversity scores (divU) in the movie (Z = −2.69, p < 0.05) and the book domain (Z = −2.56, p < 0.05). The results for the content-based Diversity (divC) metric seem to point in the same direction because of the high mean values for cross-domain queries in each domain. But the significance of these results was not verified. The mrs scores indicate a better performance of regular

Table 17. Performance results in TC3 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]   Domain   Regular           Aggr.-based       N
                                          M       SD        M       SD
Accuracy (size) [10]             Travel   9.90    0.99      9.42    2.07      103
                                 Movie    2.88    0.59      2.76    0.82      50
                                 Music    10.00   0.00      10.00   0.00      53
                                 Book     2.88    0.59      3.00    0.00      51
Accuracy (mrs) [1]               Travel   48.21   24.98     44.93   25.59     102
                                 Movie    54.58   22.88     56.69   25.42     44
                                 Music    53.95   18.31     58.24   16.19     52
                                 Book     51.53   23.90     49.25   22.30     49
Novelty (nv) [1]                 Travel   0.43    0.39      0.38    0.40      98
                                 Movie    0.35    0.38      0.28    0.41      41
                                 Music    0.49    0.36      0.46    0.39      50
                                 Book     0.61    0.42      0.62    0.44      47
Diversity (divU) [5]             Travel   3.5     -         3       -         100
                                 Movie    4       -         3       -         43
                                 Music    4       -         3       -         50
                                 Book     3       -         3       -         48
Diversity (divC) [1]             Travel   0.79    0.17      0.89    0.13      99
                                 Movie    0.75    0.20      0.66    0.27      44
                                 Music    0.83    0.17      0.93    0.07      53
                                 Book     0.74    0.27      0.93    0.06      48
Usefulness [5]                   Travel   3.36    0.77      3.22    0.74      101
                                 Movie    3.20    0.92      3.26    0.94      44
                                 Music    3.45    0.84      3.44    0.85      51
                                 Book     2.95    0.87      3.03    0.96      48

Table 18. Jaccard Indices (TC3)

Domain   Mean Jaccard Index (JI)   SD     N
Travel   0.07                      0.11   101
Movie    0.03                      0.07   44
Music    0.02                      0.05   53
Book     0.02                      0.06   50

Table 19. Response rates in % (TC4)

Domain   Regular (SPARQL)   SKOSRec   N
Movie    32                 96        50
Music    62                 98        53
Book     53                 100       51

SPARQL queries, but the within-subjects test did not confirm this assumption statistically (see Table 20).

However, the most important finding of the analysis was that SKOSRec cross-domain queries generated significantly more suggestions throughout the domains (movie: t(49) = −17.92, p < 0.001; music: t(52) = −25.73, p < 0.001; book: t(50) = −12.75, p < 0.001) (see Table 20). Fig. 10 illustrates this graphically.

Whenever the SPARQL request provided recommendations, which was the case in less than half of the user sessions (see Table 19), the SKOSRec suggestions were assessed to be of almost equal quality as the SPARQL recommendations (see Table 20, Usefulness). Hence, in the remaining test cases, in which SPARQL querying did not produce any results, the SKOSRec approach may still have provided useful suggestions.

Therefore, it is reasonable to assume that the capability of the SKOSRec engine to flexibly switch between processing steps of similarity calculation and pattern matching can facilitate an improved exploration of LOD repositories. Table 21 also shows that the items in the two sets often did not match, as is indicated by low Jaccard indices. This finding further demonstrates that the execution of an additional step

Table 20. Performance results in TC4 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]   Domain   Regular (SPARQL)    Cross-Domain        N
                                          M        SD         M        SD
Accuracy (size) [10]             Movie    0.82     2.09       8.66     2.73       50
                                 Music    0.89     1.59       9.30     2.03       53
                                 Book     2.90     3.98       10.00    0.00       51
Accuracy (mrs) [1]               Movie    67.41    27.62      53.65    14.71      16
                                 Music    62.36    31.76      48.32    22.49      20
                                 Book     66.42    21.28      55.59    16.84      27
Novelty (nv) [1]                 Movie    0.59     0.49       0.41     0.34       16
                                 Music    0.72     0.44       0.59     0.40       19
                                 Book     0.54     0.41       0.63     0.30       27
Diversity (divU) [5]             Movie    3        -          4        -          16
                                 Music    3        -          3        -          20
                                 Book     3        -          4        -          27
Diversity (divC) [1]             Movie    (0.64)   (0.35)     (0.99)   (0.02)     (6)
                                 Music    (0.90)   (0.13)     (0.98)   (0.03)     (10)
                                 Book     (0.84)   (0.12)     (0.87)   (0.1)      (8)
Usefulness [5]                   Movie    3.57     0.83       3.35     1.57       15
                                 Music    3.53     1.17       3.11     0.98       20
                                 Book     3.41     1.04       3.41     0.94       27

[Plot: “Recommendation List Size”. x-axis: Domain (Movie, Music, Book); y-axis: No. of items (0.0–10.0); legend: Method (Regular, Cross-Domain).]

Fig. 10. Mean recommendation list sizes for SPARQL and SKOSRec cross-domain requests (TC4)

Table 21. Jaccard Indices (TC4)

Domain   Mean Jaccard Index (JI)   SD     N
Movie    0.03                      0.07   44
Music    0.01                      0.02   21
Book     0.16                      0.27   33

of similarity calculation can help to retrieve other relevant items.

5. Future Work & Conclusion

This paper has presented the query-based SKOSRec engine, which facilitates new types of recommendation requests for LOD repositories. The evaluations have shown that the SKOSRec system can often generate relevant and useful suggestions when certain query templates are utilized. In many cases, the application of expressive or advanced query patterns (i.e., in the context of constraint-based retrieval or cross-domain requests) helped to significantly improve recall values while still providing the same level of quality in the remaining performance dimensions. Additionally, advanced recommendation requests can be used to generate diversified recommendation lists in case users are not satisfied with the results of regular recommendation requests (i.e., roll-up or cross-domain retrieval vs. regular requests). Although the novel recommendation approaches of the SKOSRec engine are not superior in each domain and test case, it is safe to say that they considerably extend common retrieval methods and at least provide an alternative search strategy when conventional methods fail to provide useful results. Additionally, the increase in diversity and recall is in line with findings from previous research on LOD-enabled RS [45].

It is a strength of the developed approaches that they facilitate combinations of graph-based and similarity-based retrieval at different stages of the recommendation workflow. This feature extends general search capabilities for semantic networks. Future research is required to investigate whether, in addition to SKOS, further RDF vocabularies can be used for ad-hoc item-to-item similarity computation to increase the range of potential usage scenarios, since not every LOD dataset contains SKOS annotations [53]. The application of additional properties could also boost the recommendation quality of some query patterns (e.g., the query types used in TC3). In this context, it also needs to be determined whether the representation of the user profile in the SKOSRec query language can be substituted with free-text expressions to enable search-like functionalities. LOD researchers have already developed engines that can process natural language queries over RDF data [59], [34], [67]. It will have to be investigated how the SKOSRec engine profitably fits into this landscape of existing retrieval tools.

Another open research question concerns the aspect of interfaces to other applications. In its standard configuration, the system is not an end-user retrieval engine but a backend application with which an administrator who has domain knowledge (e.g., of RDF vocabularies and data models of a particular LOD repository and usage scenario) creates the respective query patterns. Although the websites of the online experiments represent possible implementations of suitable end-user applications for the SKOSRec engine, there are still many other possibilities and extensions to design such a user interface (UI). For instance, the selection of LOD items for the profile was made possible by previously storing the data in an index that could then be accessed and searched before issuing a recommendation query. However, this runs against the idea of ad-hoc retrieval. In this context, future research will have to determine whether it is feasible to generate an on-the-fly mapping from items to the respective LOD resources. In the case of a commercial application, the engine needs to match items with the corresponding entries in the product catalog. Even if it is not yet clear which UI components can make the best possible use of the SKOSRec engine’s features, the results from the user experiments can give at least a few suggestions.

The following retrieval options should be the default setting in an interface that contains a LOD-enabled RS component to assist users in finding interesting items. Since the regular approach of on-the-fly retrieval achieved good results throughout the domains, the display of simple recommendations generated from previously stated user preferences represents a suitable starting point for retrieval. On-the-fly queries could be refined by providing filter options. Since the evaluations have shown that expressive constraints often lead to increased recall values, the application of such a filter might be the best option for assisted retrieval. As in the preparations for the experiments, appropriate graph-based query patterns for the usage scenario in question would also have to be determined by a domain expert before setting up the application. The same applies to advanced retrieval patterns, which should be available to the end user to refine the query when both the simple as well as the constraint-based

approach have failed to provide relevant items. This suggestion is made because roll-up queries were competitive with regular recommendations in the web-based experiments. Therefore, they represent a viable alternative retrieval strategy. For some domains (e.g., for multimedia retrieval), it is also possible to use an optional selection field to enable the formulation of cross-domain requests. Evaluations for this query type suggest that users might receive interesting results (i.e., diversified and comprehensive recommendation lists) from such a request.

In summary, the contributions of the paper are as follows:

– Development and implementation of novel retrieval strategies in the SKOSRec system, which facilitate personalized requests and leverage the potential of knowledge graphs in order to address the research gap in state-of-the-art recommendation methods with regard to customized queries.

– Realization of web-based user experiments in four usage scenarios with the following outcomes:

  • Successful demonstration that knowledge graph data is applicable in the context of live recommendation scenarios.

  • Creation of suitable query patterns that can be applied to represent individual information needs in different application contexts. The set of query patterns can be extended with the help of the SKOSRec query language to meet the requirements of additional usage scenarios.

  • Proof that the novel retrieval strategies can be successfully applied across various application contexts and recommendation tasks and that they outperform conventional content-based retrieval approaches, especially in the performance dimensions of diversity and recall.

The prototypical implementation and the evaluation of the SKOSRec engine in different usage scenarios have demonstrated that the freely available knowledge sources in the LOD cloud can be successfully applied to advance state-of-the-art methods in IR and recommender systems. The proposed personalized and customized retrieval strategies can leverage the potential of knowledge graphs by better matching information needs of users with the help of semantic search options.

Appendix A. The SKOSRecommender Query Syntax

SKOSRecQuery: A SKOSRec query can be comprised of up to six elements, of which the parts SimProjection and ItemPart are obligatory. Hence, users need to state at least their preferences and how many recommendations they would like to receive. Advanced retrieval patterns can be formulated as well. For instance, similar resource retrieval can be combined with prefiltering (RecWhereClause) and/or postfiltering (SelectPart) of LOD resources. As in SPARQL, the query can start with a prologue that abbreviates IRIs with namespace declarations [21].

SelectPart: The SelectPart enables subquerying with recommendation results. The surrounding query is a SELECT query with a slightly simplified WHERE clause (i.e., RecWhereClause) with which postfilter conditions can be formulated. In this section, users have the option to formulate aggregation-based retrieval requests (Aggregation).

Aggregation: This query part handles aggregation-based postfiltering. Users specify the resource (IRIref) for which similarity scores of sublevel entities are summarized. Additionally, it is stated how the data should be aggregated (i.e., SUM, MAX or AVG) as well as which variable (Var) represents the aggregation-based recommendations in the postfilter section. The compiler makes sure that this variable is actually contained in the RecWhereClause of the SelectPart.

SimProjection: This part of a SKOSRec query refers to the list of generated suggestions. It defines the recommendation variable to which all similar LOD resources are mapped. In case pre- and postfilter conditions are set in the other sections, the compiler checks whether the variable (Var) appears in these sections as well. Otherwise, the required join operations cannot be executed. It is specified how many recommendations should be displayed (Integer) and whether the request will be issued against a default repository or as a cross-repository request (ServiceIntegration).

ServiceIntegration: This section indicates that a user intends to receive recommendations from a different SPARQL endpoint than the default endpoint that is stated in the standard configuration of the SKOSRec engine. It specifies the target SPARQL endpoint (IRIref) from which a user

wants to receive recommendations. Upon extraction of SKOS annotations from the default source endpoint, a subsequent request is sent to the specified target endpoint.

ItemPart: This section of a SKOSRec query represents a single preference in the user profile. However, recommendation queries can contain several preference statements. A preference is either expressed as a LOD resource (IRIref) or as a variable which is bound to a preference query (VarPart). In the latter case, the resources are obtained by matching the graph pattern in the VarPart with the triple statements in the repository. The ItemPart statement can additionally be supplemented with a concept-to-concept similarity threshold (Sim) that triggers concept expansion with the approach of flexible similarity detection, which is extensively described in one of our previous works [64].

VarPart: The construct initiates preference querying. Individual likings are expressed as a graph pattern in a RecWhereClause. The variable (Var) is a placeholder for the LOD resources that will be used to set up the user profile. The compiler has to make sure that the specified variable occurs in the RecWhereClause as well. Otherwise, the extraction of LOD resources cannot be carried out correctly.

Sim: The Sim part of a SKOSRec query denotes the threshold of concept-to-concept scores for flexible similarity detection. Even though knowledge-based similarity values usually range between 0 and 1, the specification allows a broader range of digits in the Decimal datatype to enable the application of other similarity metrics in case they are needed.

Relation: This section specifies how concept-to-concept similarity scores should be determined. Users can state whether scores should be “larger”, “larger than” or “equal to” the threshold value.

RecWhereClause: This clause can occur in three different sections of a SKOSRec query, i.e., either in the prefilter (ItemPart), the postfilter (SelectPart) or the preference filter section (VarPart). The RecWhereClause closely resembles the “WhereClause” of the SPARQL 1.1 specification with all its subsequent parts [21]. Hence, it enables different combinations of graph pattern matching. However, the RecWhereClause of the SKOSRec language allows fewer pattern matching expressions than the “WhereClause” of the regular SPARQL syntax specification [21]. For instance, it is neither permitted to formulate subqueries nor to direct filter requests to more than one repository (see RecGraphPatternNotTriples for further explanations). This modification has been made to simplify similarity calculation. After parsing this section, the compiler makes sure that recommendation, preference or postfilter variables occur at least once in the RecWhereClause to guarantee that subsequent steps of the recommendation workflow can be applied to existing LOD resources. In case the RecWhereClause comprises a RecMinusGraphPattern, it also has to be checked that the variable is not only contained in the MINUS part of the group.

RecGroupGraphPattern: A RecGroupGraphPattern can contain one to many basic graph patterns, which can be combined in different ways according to RecGroupGraphPatternSub.

RecGroupGraphPatternSub: This section defines the graph patterns that are applied to identify suitable LOD resources. User filters can be expressed either as basic graph patterns (TriplesBlock) or as combinations of them (RecGraphPatternNotTriples). The sequence of different kinds of patterns can be flexibly specified.

RecGraphPatternNotTriples: This query part closely resembles the similarly named “GraphPatternNotTriples” of the SPARQL 1.1 specification [21]. The only difference to SPARQL is that the RecGraphPatternNotTriples section is a little more restricted in terms of admissible graph patterns than the “GraphPatternNotTriples” section of the regular SPARQL specification. For instance, it does not allow LOD resources to be retrieved from any graph other than the default graph that is specified in the configuration of the SKOSRec engine, since filter patterns have to be applied to datasets that contain SKOS annotations. Hence, they need to be specified before runtime execution to ensure that similarity calculation can be carried out correctly. If a user was able to formulate filter conditions that referred to different RDF graphs and endpoints accordingly, this would not be possible.

RecGroupOrUnionGraphPattern: This query section marks an alternative graph pattern in the style of the “GroupOrUnionGraphPattern” of the SPARQL syntax specification [21]. However, it neither allows SPARQL-like subqueries nor federated queries

for filtered resources, because this would complicate similarity calculation.

RecOptionalGraphPattern: In this part of a SKOSRecQuery, users are enabled to specify additional graph patterns, which might extend the query solution but do not necessarily have to match the data.

RecMinusGraphPattern: This section handles the exclusion of LOD resources for cases when users want to omit certain triple statements from the solution.
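The aggregation-based postfiltering described under Aggregation (summarizing similarity scores of sublevel entities with SUM, MAX or AVG) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the engine's implementation: the function `roll_up`, the dictionaries and the movie/director identifiers are all hypothetical.

```python
from collections import defaultdict

# Supported aggregation modes, mirroring the SUM, MAX and AVG options.
AGGREGATORS = {
    "SUM": sum,
    "MAX": max,
    "AVG": lambda scores: sum(scores) / len(scores),
}


def roll_up(scored_items, parent_of, mode="SUM", limit=10):
    """Aggregate sublevel similarity scores per parent and rank the parents."""
    if mode not in AGGREGATORS:
        raise ValueError(f"unsupported aggregation mode: {mode}")
    grouped = defaultdict(list)
    for item, score in scored_items.items():
        parent = parent_of.get(item)
        if parent is not None:  # items without a parent are skipped
            grouped[parent].append(score)
    ranked = sorted(
        ((parent, AGGREGATORS[mode](scores)) for parent, scores in grouped.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:limit]


# Hypothetical similarity scores of movies and their directors.
scores = {"movie:A": 0.5, "movie:B": 0.25, "movie:C": 0.5}
directors = {"movie:A": "director:X", "movie:B": "director:X", "movie:C": "director:Y"}

print(roll_up(scores, directors, mode="SUM"))
print(roll_up(scores, directors, mode="AVG"))
```

The choice of aggregation mode changes the ranking: summing favors parents with many similar sublevel entities, whereas averaging favors parents whose sublevel entities are consistently similar.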

References

[1] G. Adomavicius, A. Tuzhilin, Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. In IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.

[2] G. Adomavicius, A. Tuzhilin, R. Zheng, REQUEST: A query language for customizing recommendations. In Information Systems Research, vol. 22, no. 1, pp. 99–117, 2011.

[3] C. Aggarwal, Recommender systems. Springer, 2016.

[4] M. Arenas, B. Cuenca Grau, E. Kharlamov, S. Marciuska, D. Zheleznyakov, Faceted search over ontology-enhanced RDF data. In Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 939–948, 2014.

[5] V. Ayala, M. Przyjaciel-Zablocki, T. Hornung, A. Schätzle, G. Lausen, Extending SPARQL for recommendations. In Semantic Web Information Management, 2004.

[6] J. Beel, C. Breitinger, S. Langer, A. Lommatzsch, B. Gipp, Towards reproducibility in recommender systems research. In User Modeling and User-adapted Interaction, vol. 26, no. 1, pp. 69–101, 2016.

[7] C. Boutilier, R. Brafman, C. Domshlak, H. Hoos, D. Poole, CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements. In Journal of Artificial Intelligence Research (JAIR), vol. 21, pp. 135–191, 2004.

[8] P. Castells, S. Vargas, J. Wang, Novelty and diversity metrics for recommender systems: choice, discovery and relevance, 2011.

[9] G. Cheng, W. Ge, Y. Qu, Falcons: searching and browsing entities on the Semantic Web. In Proceedings of the 17th ACM International Conference on World Wide Web, pp. 1101–1102, 2008.

[10] J. Davies, R. Weeks, QuizRDF: Search technology for the Semantic Web. In Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004.

[11] DBpedia, URL: https://wiki.dbpedia.org, 2017.

[12] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, Exploiting the web of data in model-based recommender systems. In Proceedings of the 6th ACM Conference on Recommender Systems (RecSys), pp. 253–256, 2012.

[13] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, M. Zanker, Linked Open Data to support content-based recommender systems. In Proceedings of the 8th International Conference on Semantic Systems, 2012.

[14] D. Dillman, D. Robert, D. Bowker, Principles for constructing web surveys. In Joint Meetings of the American Statistical Association, 1998.

[15] DCMI metadata terms, URL: http://dublincore.org/documents/dcmi-terms, 2004.

[16] I. Fernandez-Tobias, I. Cantador, M. Kaminskas, F. Ricci, A generic semantic-based framework for cross-domain recommendation. In Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pp. 25–32, 2011.

[17] A. Freitas, J. Oliveira, S. O’Riain, E. Curry, J. Da Silva, C. Pereira, Querying Linked Data using semantic relatedness: a vocabulary independent approach. In Natural Language Processing and Information Systems, pp. 40–51, 2011.

[18] T. Grainger, T. Potter, Y. Seeley, Solr in action. Cherry Hill: Manning, 2014.

[19] A. Gunawardana, G. Shani, Evaluating recommender systems. In Recommender Systems Handbook, Springer, pp. 265–308, 2015.

[20] R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Bürgle, H. Düwiger, U. Scheel, Faceted Wikipedia search. In International Conference on Business Information Systems, pp. 1–11, 2010.

[21] S. Harris, A. Seaborne, E. Prud’hommeaux, SPARQL 1.1 query language. In W3C recommendation, 2013.

[22] B. Heitmann, C. Hayes, Using Linked Data to build open collaborative recommender systems. In AAAI Spring Symposium: Linked Data Meets Artificial Intelligence, pp. 76–81, 2010.

[23] J. Herlocker, J. Konstan, L. Terveen, J. Riedl, Evaluating collaborative filtering recommender systems. In ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.

[24] Y Hu Z Wang W Wu J Guo M Zhang Recommendationfor movies and stars using YAGO and IMDB In 12th Inter-national Asia-Pacific Web Conference (APWEB) pp 123ndash1292010

[25] D Huynh D Karger Parallax and companion Set-basedbrowsing for the data web In Proceedings of the 18th ACM In-ternational Conference on World Wide Web 2009

[26] D Jannach M Zanker A Felfernig G Friedrich Recom-mender systems an introduction Cambridge University Press2010

[27] Y Kabutoya R Sumi T Iwata T Uchiyama T Uchiyama InIEEEWICACM International Conferences on Web Intelligenceand Intelligent Agent Technology (WI-IAT) pp 625ndash630 2012

[28] M Kaminskas I Fernandez-Tobias F Ricci I CantadorKnowledge-based music retrieval for places of interest In Pro-ceedings of the 2nd International ACM Workshop on Music In-formation Retrieval with User-centered and Multimodal Strate-gies pp 19ndash24 2012

[29] H Khrouf R Troncy Hybrid event recommendation usingLinked Data and user diversity In Proceedings of the 7th ACMConference on Recommender Systems pp 185ndash192 2010

[30] P Kirk Experimental design John Wiley amp Sons 1982[31] B Knijnenburg M Willemsen Z Gantner H Soncu C

Newell Explaining the user experience of recommender sys-tems In User Modeling and User-Adapted Interaction vol 22no 4-5 pp 441ndash504 2012

[32] G Koutrika B Bercovitz H Garcia-Molina FlexRecs ex-pressing and combining flexible recommendations Proceedingsof the 2009 ACM SIGMOD International Conference on Man-agement of Data 2009

[33] M. Lee, W. Kim, Semantic association search and rank method based on spreading activation for the Semantic Web. In IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1523–1527, 2009.

[34] J. Lehmann, L. Bühmann, Autosparql: Let users query your knowledge base. In 8th Extended Semantic Web Conference (ESWC), 2011.

[35] N. Marie, F. Gandon, M. Ribiere, F. Rodio, Discovery hub: on-the-fly Linked Data exploratory search. In Proceedings of the 9th International Conference on Semantic Systems, pp. 17–24, 2013.

[36] S. McNee, J. Riedl, J. Konstan, Being accurate is not enough: how accuracy metrics have hurt recommender systems. In CHI'06 Extended Abstracts on Human Factors in Computing Systems, 2006.

[37] R. Meymandpour, J. Davis, Recommendations using Linked Data. In Proceedings of the 5th PhD Workshop on Information and Knowledge, 2012.

[38] A. Miles, S. Bechhofer, SKOS Simple Knowledge Organization System Reference. In W3C Recommendation, 2009.

[39] B. Mobasher, X. Jin, Y. Zhou, Semantically enhanced collaborative filtering on the web. In Web Mining: From Web to Semantic Web, pp. 57–76, 2004.

[40] A. Musetti, A. Nuzzolese, F. Draicchio, V. Presutti, E. Blomqvist, A. Gangemi, P. Ciancarini, Aemoo: Exploratory search based on knowledge patterns over the Semantic Web. In Semantic Web Challenge, 2012.

[41] T. Neumann, G. Weikum, The RDF-3X engine for scalable management of RDF data. In The VLDB Journal, vol. 19, no. 1, pp. 91–113, 2010.

[42] J. Pérez, M. Arenas, C. Gutierrez, Semantics and Complexity of SPARQL. In 5th International Semantic Web Conference, pp. 30–43, 2006.

[43] S. Policarpio, S. Brunk, G. Tummarello, Implementation of a SPARQL integrated recommendation engine for Linked Data with hybrid capabilities. In AImWD, 2012.

[44] P. Pu, L. Chen, R. Hu, A user-centric evaluation framework for recommender systems. In Proceedings of the 5th ACM Conference on Recommender Systems, 2011.

[45] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, E. Di Sciascio, Sound and music recommendation with knowledge graphs. ACM Transactions on Intelligent Systems and Technology (TIST), 8(2), 21 (2017).

[46] E. Oren, R. Delbru, S. Decker, Extending faceted navigation for RDF data. In 5th International Semantic Web Conference, pp. 559–572, 2006.

[47] G. Pirro, Explaining and suggesting relatedness in knowledge graphs. In International Semantic Web Conference, pp. 622–639, 2015.

[48] G. Rizzo, R. Troncy, NERD: A framework for evaluating named entity recognition tools in the Web of data. 10th International Semantic Web Conference (ISWC'11), Demo Session, Bonn, Germany, 2011.

[49] J. Rosati, T. Di Noia, T. Lukasiewicz, R. De Leone, A. Maurino, Preference queries with Ceteris Paribus semantics for Linked Data. In OTM Confederated International Conferences, pp. 423–442, 2015.

[50] T. Ruotsalo, K. Haav, A. Stoyanov, S. Roche, E. Fani, R. Deliai, E. Mäkelä, T. Kauppinen, E. Hyvönen, SMARTMUSEUM: A mobile recommender system for the Web of Data. In Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 50–67, 2013.

[51] I. Ruthven, M. Lalmas, K. van Rijsbergen, Incorporating user search behavior into relevance feedback. In Journal of the Association for Information Science and Technology, vol. 54, no. 6, pp. 529–549, 2003.

[52] B. Sarwar, G. Karypis, J. Konstan, J. Riedl, Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, 2001.

[53] M. Schmachtenberg, C. Bizer, H. Paulheim, Adoption of the Linked Data best practices in different topical domains. In International Semantic Web Conference (ISWC), pp. 245–260, 2014.

[54] H. Schütze, C. Manning, P. Raghavan, Introduction to information retrieval. Cambridge University Press, 2008.

[55] SparqlImplementations. URL: https://www.w3.org/wiki/SparqlImplementations, 2016.

[56] M. Stankovic, W. Breitfuss, P. Laublet, Discovering relevant topics using DBPedia. In Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 219–222, 2011.

[57] K. Swearingen, R. Sinha, Beyond algorithms: An HCI perspective on recommender systems. In ACM SIGIR 2001 Workshop on Recommender Systems, vol. 13, no. 5-6, 2001.

[58] D. Tunkelang, Faceted search. In Synthesis Lectures on Information Concepts, Retrieval, and Services, pp. 1–80, 2009.

[59] C. Unger, L. Bühmann, J. Lehmann, A. Ngonga Ngomo, D. Gerber, P. Cimiano, Template-based question answering over RDF data. In Proceedings of the 21st ACM International Conference on World Wide Web, 2012.

[60] C. Varnhagen, M. Gushta, J. Daniels, T. Peters, N. Parmar, D. Law, T. Johnson, How informed is online informed consent? In Ethics & Behavior, vol. 15, no. 1, pp. 37–48, 2005.

[61] J. Waitelonis, H. Sack, Towards exploratory video search using Linked Data. In 11th IEEE International Symposium on Multimedia, pp. 540–545, 2009.

[62] L. Wenige, J. Ruhland, Flexible on-the-fly recommendations from Linked Open Data repositories. In International Conference on Business Information Systems, pp. 43–54, 2016.

[63] L. Wenige, Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations.

[64] L. Wenige, J. Ruhland, Retrieval by recommendation: using LOD technologies to improve digital library search. In International Journal on Digital Libraries, 2017.

[65] F. Zarrinkalam, M. Kahani, A multi-criteria hybrid citation recommendation system based on Linked Data. In 2nd International EConference on Computer and Knowledge Engineering (ICCKE), pp. 283–288, 2012.

[66] C. Ziegler, S. McNee, J. Konstan, G. Lausen, Improving recommendation lists through topic diversification. In Proceedings of the 14th International Conference on World Wide Web, pp. 22–32, 2005.

[67] L. Zou, R. Huang, H. Wang, J. Yu, W. He, D. Zhao, Natural language question answering over RDF: a graph data driven approach. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 313–324, 2014.

  • Introduction
  • Related Work
  • The SKOS Recommender
    • System Overview
    • On-the-Fly Recommendations
    • Preference Querying
    • Prefiltering
    • Postfiltering & Combinations
  • Evaluation of the SKOS Recommender
    • Experimental Setup
    • Results
  • Future Work & Conclusion
  • Appendix A. The SKOSRecommender Query Syntax
  • References
[Figure: "The filter has improved the recommendations of the list". x-axis: Domain (Digital Library, Travel, Movie, Music, Book); y-axis: Percentage of Agreement (0.00 to 1.00); legend: Degree of Agreement (1 to 5)]

Fig. 8. Participants' agreement with the statement that the filter has improved the result list (domain-wise) (TC2).

In addition to response rates, user assessments and scores in the defined quality dimensions were compared. Table 14 depicts the outcome of this analysis. Significantly higher scores are marked in bold figures. In case statistical tests identified no meaningful variations, the results of the better performing approach are underlined. α-levels were adjusted with the false discovery rate (FDR) in order to decrease the probability of making a type I error due to multiple testing.
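The FDR adjustment mentioned above can be illustrated with a short example. The paper does not name the specific procedure, so the following minimal Python sketch assumes the widely used Benjamini-Hochberg step-up method; the function name and the p-values are illustrative and are not taken from the experiments.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected.

    Assumption: Benjamini-Hochberg step-up procedure (the paper only says "FDR").
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value satisfies p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis whose rank is at most max_k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for four domain-wise comparisons.
example_p_values = [0.0004, 0.03, 0.2, 0.6]
print(benjamini_hochberg(example_p_values))
```

With four comparisons, the smallest p-value is compared against 0.0125 rather than 0.05, which is how the adjustment lowers the chance of a type I error across multiple tests.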

According to mean scores, expressive requests generated more comprehensive recommendation lists (size). This finding is in line with the increased response rates shown in Table 13. However, subsequently conducted t-tests confirmed significant differences only for the travel experiment (t(102) = −7.89, p < 0.001). In contrast, precision scores were higher in most of the domains (travel, movie, music) when the SKOSRec engine processed a simple constraint-based query. However, statistical tests did not confirm any systematic differences. The authors applied Wilcoxon tests for mrs scores and the remaining performance metrics because of the small sample sizes resulting from the low response rates of the simple constraint-based approach.

Regarding Novelty, none of the approaches was superior. In the Diversity dimension, expressive filters generated suggestions that were at least as topically diversified as the results from simple constraint-based retrieval. Even though no significant differences were identified between the two approaches, increased mean scores (divU in the travel domain and divC in each domain) indicate a slight superiority of expanded filter requests in this quality dimension. The same applies to usefulness scores (mean values were higher in 3 out of 4 domains when expressive filtering was applied). However, these statements are speculative at best because statistical tests were not significant.

In summary, it can be concluded that expressive filters improve recall, potentially leading to a slight loss in precision scores (see Table 14). It may also be the case that expanded user constraints diversify as well as increase the usefulness of recommendation lists. However, these claims are unverified. In contrast, it is safe to say that expressive query patterns generate results of at least the same quality as simple patterns while increasing the number of relevant suggestions. The few differences may be explained by the high Jaccard scores (JI) (see Table 15) of recommendation lists. Throughout the domains, result sets were much alike. Given these similarities and the increases in list sizes for expressive filters, it is supposed that an expanded constraint produces an enhanced version of the recommendation list resulting from simple filtering. Therefore, expanded filters should be the default setting in a constraint-based retrieval context.
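The Jaccard index (JI) used to compare the result lists of two retrieval methods is the size of their intersection divided by the size of their union. The following Python sketch shows the computation; the resource names are hypothetical, and the handling of two empty lists is an assumption, since the paper does not state how such cases were treated.

```python
def jaccard_index(list_a, list_b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two recommendation lists."""
    set_a, set_b = set(list_a), set(list_b)
    union = set_a | set_b
    if not union:
        # Assumption: two empty lists are treated as identical.
        return 1.0
    return len(set_a & set_b) / len(union)


# Hypothetical result lists for a simple and an expressive filter request.
simple = ["ex:Rome", "ex:Florence", "ex:Venice"]
expressive = ["ex:Rome", "ex:Florence", "ex:Venice", "ex:Naples", "ex:Turin"]
print(jaccard_index(simple, expressive))  # 3 shared items / 5 distinct items
```

In this example the expressive list contains every item of the simple list plus two more, which is the "enhanced version" situation described above: the index stays high even though the list has grown.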

In TC3 (i.e., evaluation of roll-up query patterns), the SKOSRec engine almost always generated non-empty recommendation lists (see Table 16). In the travel and the book experiment, the response rate was higher for aggregation-based queries, whereas in the movie domain the engine more often provided recommendations for regular requests. However, these differences are only marginal and do not indicate a clear superiority of one method.

Table 14. Performance results in TC2 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric)    Domain            Simple      Expressive     N
[Max. Score]                         M      SD       M      SD
Accuracy (size) [10]  Travel      1.17    2.70    4.52    4.62   103
                      Movie       6.72    4.08    7.20    3.92    50
                      Music       5.96    4.64    6.36    4.51    53
                      Book        4.59    4.50    5.00    4.51    51
Accuracy (mrs) [1]    Travel     66.73   20.51   65.34   22.36    23
                      Movie      60.95   22.98   60.29   22.79    43
                      Music      63.77   16.05   62.96   16.44    34
                      Book       56.17   18.18   56.81   16.81    37
Novelty (nv) [1]      Travel      0.48    0.41    0.61    0.49    21
                      Movie       0.44    0.36    0.55    0.80    42
                      Music       0.55    0.39    0.52    0.37    38
                      Book        0.77    0.34    0.76    0.33    33
Diversity (divU) [5]  Travel         3       -     3.5       -    24
                      Movie          4       -       4       -    43
                      Music          4       -       4       -    38
                      Book           3       -       3       -    37
Diversity (divC) [1]  Travel      0.68    0.18    0.72    0.21    19
                      Movie       0.70    0.24    0.73    0.23    41
                      Music       0.86    0.13    0.87    0.12    34
                      Book        0.84    0.24    0.95    0.63    29
Usefulness [5]        Travel      3.33    0.93    3.50    0.77    24
                      Movie       3.32    0.85    3.31    0.86    43
                      Music       3.55    0.81    3.64    0.78    38
                      Book        3.39    0.83    3.62    1.21    37

Table 15. Jaccard Indices (TC2)

Domain   Mean Jaccard Index (JI)   SD      N
Travel   0.57                      0.47    25
Movie    0.71                      0.38    43
Music    0.82                      0.35    37
Book     0.92                      0.26    35

Fig. 9 depicts participants' general agreement to Likert statements concerning the Perceived Usefulness of the approaches in TC3. The diagram shows similar results for regular as well as advanced requests. On average, satisfaction scores reached levels which lay slightly above neutral agreement. A subsequent t-test confirmed no significant differences between the two methods. In fact, the approach with the highest average score of Perceived Usefulness was different in each experiment. While mean values were higher for regular requests in the studies on travel and music RS, advanced queries produced higher scores in the movie and book experiments.

Table 16. Response Rates (TC3)

Domain   Regular (%)   Aggr.-based (%)   N
Travel   99            100               103
Movie    96            92                50
Music    100           100               53
Book     96            100               51

Content-based Diversity (divC) was the only metric with systematic differences. Here, scores went up for advanced retrieval patterns in the travel (t(98) = −5.50, p < 0.001), the music (t(52) = −4.51, p < 0.001) and the book domain (t(47) = −4.90, p < 0.001) (see Table 17). This finding is not surprising given that the aggregation-based approach explores the wider semantic network of each item in the profile.

[Figure: "Perceived Usefulness (Regular vs. Roll-up Queries)". x-axis: Domain (Travel, Movie, Music, Book); y-axis: Agreement (0 to 3); legend: Method (Regular, Roll-up)]

Fig. 9. Participants' overall satisfaction with regular and roll-up recommendation retrieval (TC3).

Hence, it can be assumed that both approaches produce recommendation lists of similar quality and can provide benefits to potential consumers. It cannot be clearly stated which approach is superior, because neither of them outperformed the other in each domain. While the regular retrieval strategy was better in terms of usefulness in the travel and the music domains, the aggregation-based approach achieved higher scores in the domains of movie and book recommendations.

This conclusion is especially impressive given the low mean Jaccard scores for recommendation lists resulting from regular and aggregation-based retrieval (see Table 18). The mean score never exceeded 0.1 throughout the domains, which indicates a low concordance between result sets. It is remarkable that participants perceived the recommendations as equally good given the high dissimilarity of the lists. Thus, advanced queries have an added value, as they provide additional interesting results. They can help to explore LOD repositories in different ways than a regular retrieval approach. Therefore, it is worthwhile to offer aggregation-based query patterns as an alternative retrieval strategy when the regular approach has not produced any helpful recommendations.

In addition to TC3, user assessments for TC4 (cross-domain recommendations) were also evaluated. However, since the authors conducted experiments for TC4 only in the multimedia domains, the samples were considerably smaller. Nevertheless, it is assumed that the number of completed user sessions is just sufficient to conduct further statistical analyses. Table 19 shows the response rates for the two retrieval methods. Cross-domain patterns had a higher success rate of generating non-empty result sets throughout the domains. Participants almost always received a recommendation for this query type, whereas they often did not receive a single suggestion when a regular SPARQL query was executed.

We applied non-parametric tests for the performance metrics mrs, nv, divU, divC and Perceived Usefulness due to the low response rates of the SPARQL-based approach, which led to a low number of comparable data points. Overall, the FDR-adjusted pairwise Wilcoxon tests detected no significant differences except for user-based Diversity scores (divU) in the movie (Z = −2.69, p < 0.05) and the book domain (Z = −2.56, p < 0.05). The results for the content-based Diversity (divC) metric seem to point in the same direction because of the high mean values for cross-domain queries in each domain. But the significance of these results was not verified. The mrs scores indicate a better performance of regular SPARQL queries, but the within-subjects test did not confirm this assumption statistically (see Table 20).

Table 17. Performance results in TC3 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric)    Domain           Regular     Aggr.-based     N
[Max. Score]                         M      SD       M      SD
Accuracy (size) [10]  Travel      9.90    0.99    9.42    2.07   103
                      Movie       2.88    0.59    2.76    0.82    50
                      Music      10.00    0.00   10.00    0.00    53
                      Book        2.88    0.59    3.00    0.00    51
Accuracy (mrs) [1]    Travel     48.21   24.98   44.93   25.59   102
                      Movie      54.58   22.88   56.69   25.42    44
                      Music      53.95   18.31   58.24   16.19    52
                      Book       51.53   23.90   49.25   22.30    49
Novelty (nv) [1]      Travel      0.43    0.39    0.38    0.40    98
                      Movie       0.35    0.38    0.28    0.41    41
                      Music       0.49    0.36    0.46    0.39    50
                      Book        0.61    0.42    0.62    0.44    47
Diversity (divU) [5]  Travel       3.5       -       3       -   100
                      Movie          4       -       3       -    43
                      Music          4       -       3       -    50
                      Book           3       -       3       -    48
Diversity (divC) [1]  Travel      0.79    0.17    0.89    0.13    99
                      Movie       0.75    0.20    0.66    0.27    44
                      Music       0.83    0.17    0.93    0.07    53
                      Book        0.74    0.27    0.93    0.06    48
Usefulness [5]        Travel      3.36    0.77    3.22    0.74   101
                      Movie       3.20    0.92    3.26    0.94    44
                      Music       3.45    0.84    3.44    0.85    51
                      Book        2.95    0.87    3.03    0.96    48

Table 18. Jaccard Indices (TC3)

Domain   Mean Jaccard Index (JI)   SD      N
Travel   0.07                      0.11    101
Movie    0.03                      0.07    44
Music    0.02                      0.05    53
Book     0.02                      0.06    50

Table 19. Response rates (TC4)

Domain   Regular (SPARQL) (%)   SKOSRec (%)   N
Movie    32                     96            50
Music    62                     98            53
Book     53                     100           51

However, the most important finding of the analysis was that SKOSRec cross-domain queries generated significantly more suggestions throughout the domains (movie: t(49) = −17.92, p < 0.001; music: t(52) = −25.73, p < 0.001; book: t(50) = −12.75, p < 0.001) (see Table 20). Fig. 10 illustrates this graphically.
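The reported degrees of freedom (e.g., t(49) for the 50 participants of the movie experiment) match a within-subjects comparison, in which each participant contributes one pair of values. The following Python sketch shows how the statistic of such a paired-samples t-test can be computed; it is an illustration under that assumption, and the list sizes are made-up values, not experiment data.

```python
import math


def paired_t_statistic(first, second):
    """Return (t, df) for a paired-samples t-test on two equal-length lists."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    # Sample variance of the pairwise differences (n - 1 in the denominator).
    var_diff = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    if var_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_diff / math.sqrt(var_diff / n)
    return t, n - 1


# Hypothetical list sizes per participant (regular SPARQL vs. cross-domain).
regular = [0, 2, 1, 0, 3]
cross_domain = [9, 10, 8, 10, 10]
t, df = paired_t_statistic(regular, cross_domain)
print(f"t({df}) = {t:.2f}")
```

The sign of t depends on the order of subtraction: a negative value, as in the results above, means that the first method produced smaller values than the second.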

Whenever the SPARQL request provided recommendations, which was the case in less than half of the user sessions (see Table 19), the SKOSRec suggestions were assessed to be of almost equal quality as the SPARQL recommendations (see Table 20, Usefulness). Hence, in the sessions where SPARQL querying did not produce any results (more than half of the test cases), the SKOSRec approach may still have provided useful suggestions.

Therefore, it is reasonable to assume that the capability of the SKOSRec engine to flexibly switch between processing steps of similarity calculation and pattern matching can facilitate an improved exploration of LOD repositories. Table 21 also shows that the items in the two sets often did not match, as is indicated by low Jaccard indices. This finding further demonstrates that the execution of an additional step of similarity calculation can help to retrieve other relevant items.

Table 20. Performance results in TC4 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric)    Domain  Regular (SPARQL)    Cross-Domain     N
[Max. Score]                         M      SD       M      SD
Accuracy (size) [10]  Movie       0.82    2.09    8.66    2.73    50
                      Music       0.89    1.59    9.30    2.03    53
                      Book        2.90    3.98   10.00    0.00    51
Accuracy (mrs) [1]    Movie      67.41   27.62   53.65   14.71    16
                      Music      62.36   31.76   48.32   22.49    20
                      Book       66.42   21.28   55.59   16.84    27
Novelty (nv) [1]      Movie       0.59    0.49    0.41    0.34    16
                      Music       0.72    0.44    0.59    0.40    19
                      Book        0.54    0.41    0.63    0.30    27
Diversity (divU) [5]  Movie          3       -       4       -    16
                      Music          3       -       3       -    20
                      Book           3       -       4       -    27
Diversity (divC) [1]  Movie     (0.64)  (0.35)  (0.99)  (0.02)   (6)
                      Music     (0.90)  (0.13)  (0.98)  (0.03)  (10)
                      Book      (0.84)  (0.12)  (0.87)   (0.1)   (8)
Usefulness [5]        Movie       3.57    0.83    3.35    1.57    15
                      Music       3.53    1.17    3.11    0.98    20
                      Book        3.41    1.04    3.41    0.94    27

[Figure: "Recommendation List Size". x-axis: Domain (Movie, Music, Book); y-axis: No. of items (0.0 to 10.0); legend: Method (Regular, Cross-Domain)]

Fig. 10. Mean recommendation list sizes for SPARQL and SKOSRec cross-domain requests (TC4).

Table 21. Jaccard Indices (TC4)

Domain   Mean Jaccard Index (JI)   SD      N
Movie    0.03                      0.07    44
Music    0.01                      0.02    21
Book     0.16                      0.27    33

5. Future Work & Conclusion

This paper has presented the query-based SKOSRec engine that facilitates new types of recommendation requests for LOD repositories. The evaluations have shown that the SKOSRec system can often generate relevant and useful suggestions when certain query templates are utilized. In many cases, the application of expressive or advanced query patterns (i.e., in the context of constraint-based retrieval or cross-domain requests) helped to significantly improve recall values while still providing the same level of quality in the remaining performance dimensions. Additionally, advanced recommendation requests can be used to generate diversified recommendation lists in case users are not satisfied with the results of regular recommendation requests (i.e., roll-up or cross-domain retrieval vs. regular requests). While not being superior in each domain and test case, it is at least safe to say that the novel recommendation approaches of the SKOSRec engine considerably extend common retrieval methods and provide an alternative search strategy when conventional methods fail to provide useful results. Additionally, the increase in diversity and recall is in line with findings from previous research on LOD-enabled RS [45].

It is a strength of the developed approaches that they facilitate combinations of graph-based and similarity-based retrieval at different stages of the recommendation workflow. This feature extends general search capabilities for semantic networks. Future research should investigate whether, in addition to SKOS, further RDF vocabularies can be used for ad-hoc item-to-item similarity computation to increase the range of potential usage scenarios, since not every LOD dataset contains SKOS annotations [53]. The application of additional properties could also boost the recommendation quality of some query patterns (e.g., the query types used in TC3). In this context, it also needs to be determined whether the representation of the user profile in the SKOSRec query language can be substituted with free-text expressions to enable search-like functionalities. LOD researchers have already developed engines that can process natural language queries over RDF data [59], [34], [67]. It will have to be investigated how the SKOSRec engine profitably fits into this landscape of existing retrieval tools.

Another open research question concerns the aspect of interfaces to other applications. In its standard configuration, the system is not an end-user retrieval engine but a backend application with which an administrator who has domain knowledge (e.g., of RDF vocabularies and data models of a particular LOD repository and usage scenario) creates the respective query patterns. Although the websites of the online experiments represent possible implementations of suitable end-user applications for the SKOSRec engine, there are still many other possibilities and extensions to design such a user interface (UI). For instance, the selection of LOD items for the profile was made possible by previously storing the data in an index that could then be accessed and searched before issuing a recommendation query. However, this runs against the idea of ad-hoc retrieval. In this context, future research will have to determine whether it is feasible to generate an on-the-fly mapping from items to the respective LOD resources. In the case of a commercial application, the engine needs to match items with the corresponding entries in the product catalog. Even if it is not yet clear which UI components can make the best possible use of the SKOSRec engine's features, the results from the user experiments can give at least a few suggestions. The following retrieval options should be the default setting in an interface that contains a LOD-enabled RS component to assist users in finding interesting items. Since the regular approach of on-the-fly retrieval achieved good results throughout the domains, the display of simple recommendations generated from previously stated user preferences represents a suitable starting point for retrieval. On-the-fly queries could be refined by providing filter options. Since the evaluations have shown that expressive constraints often lead to increased recall values, the application of such a filter might be the best option for assisted retrieval. As in the preparations for the experiments, appropriate graph-based query patterns for the usage scenario in question would also have to be determined by a domain expert before setting up the application. The same applies to advanced retrieval patterns, which should be available to the end user to refine the query when both the simple and the constraint-based approach have failed to provide relevant items. This suggestion is made because roll-up queries were competitive with regular recommendations in the web-based experiments. Therefore, they represent a viable alternative retrieval strategy. For some domains (e.g., for multimedia retrieval), it is also possible to use an optional selection field to enable the formulation of cross-domain requests. Evaluations for this query type suggest that users might receive interesting results (i.e., diversified and comprehensive recommendation lists) from such a request.

In summary, the contributions of the paper are as follows:

– Development and implementation of novel retrieval strategies in the SKOSRec system, which facilitate personalized requests and leverage the potential of knowledge graphs in order to address the research gap in state-of-the-art recommendation methods with regard to customized queries.

– Realization of web-based user experiments in four usage scenarios with the following outcomes:

  • Successful demonstration that knowledge graph data is applicable in the context of live recommendation scenarios.

  • Creation of suitable query patterns that can be applied to represent individual information needs in different application contexts. The set of query patterns can be extended with the help of the SKOSRec query language to meet the requirements of additional usage scenarios.

  • Proof that the novel retrieval strategies can be successfully applied across various application contexts and recommendation tasks and that they outperform conventional content-based retrieval approaches, especially in the performance dimensions of diversity and recall.

The prototypical implementation and the evaluation of the SKOSRec engine in different usage scenarios have demonstrated that the freely available knowledge sources on the LOD cloud can be successfully applied to advance state-of-the-art methods in IR and recommender systems. The proposed personalized and customized retrieval strategies can leverage the potential of knowledge graphs by better matching information needs of users with the help of semantic search options.

Appendix A. The SKOSRecommender Query Syntax

SKOSRecQuery: A SKOSRec query can be comprised of up to six elements, of which the parts SimProjection and ItemPart are obligatory. Hence, users need to state at least their preferences and how many recommendations they would like to receive. Advanced retrieval patterns can be formulated as well. For instance, similar resource retrieval can be combined with prefiltering (RecWhereClause) and/or postfiltering (SelectPart) of LOD resources. As in SPARQL, the query can start with a prologue that abbreviates IRIs with namespace declarations [21].

SelectPart: The SelectPart enables subquerying with recommendation results. The surrounding query is a SELECT query with a slightly simplified WHERE clause (i.e., RecWhereClause) with which postfilter conditions can be formulated. In this section, users have the option to formulate aggregation-based retrieval requests (Aggregation).

Aggregation: This query part handles aggregation-based postfiltering. Users specify the IRI (IRIref) of the resource for which similarity scores of sublevel entities are summarized. Additionally, it is stated how the data should be aggregated (i.e., SUM, MAX or AVG) as well as which variable (Var) represents the aggregation-based recommendations in the postfilter section. The compiler makes sure that this variable is actually contained in the RecWhereClause of the SelectPart.
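The aggregation step described in the Aggregation part can be pictured with a small Python sketch. It is not the SKOSRec implementation: the function name, the dictionary-based data layout and the example resources are hypothetical and only illustrate how SUM, MAX and AVG summarize the similarity scores of sublevel entities per parent resource.

```python
from collections import defaultdict


def roll_up(scores, parent_of, mode="SUM"):
    """Aggregate similarity scores of sublevel entities per parent resource.

    scores:    dict mapping sublevel resource -> similarity score
    parent_of: dict mapping sublevel resource -> parent resource (hypothetical link)
    mode:      "SUM", "MAX" or "AVG", as named in the Aggregation part
    """
    aggregators = {
        "SUM": sum,
        "MAX": max,
        "AVG": lambda values: sum(values) / len(values),
    }
    if mode not in aggregators:
        raise ValueError(f"unsupported aggregation mode: {mode}")
    # Collect the scores of all sublevel entities under their parent.
    grouped = defaultdict(list)
    for resource, score in scores.items():
        if resource in parent_of:
            grouped[parent_of[resource]].append(score)
    aggregate = aggregators[mode]
    ranked = [(parent, aggregate(values)) for parent, values in grouped.items()]
    # Highest aggregated score first; ties are broken by name for determinism.
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical example: films are the sublevel entities, directors the parents.
film_scores = {"ex:film_a": 0.5, "ex:film_b": 0.25, "ex:film_c": 0.5}
directed_by = {
    "ex:film_a": "ex:director_x",
    "ex:film_b": "ex:director_x",
    "ex:film_c": "ex:director_y",
}
print(roll_up(film_scores, directed_by, mode="SUM"))
print(roll_up(film_scores, directed_by, mode="AVG"))
```

The example shows why the choice of aggregation matters: SUM favors the director with more similar films, whereas AVG favors the director whose films are more similar on average.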

SimProjection: This part of a SKOSRec query refers to the list of generated suggestions. It defines the recommendation variable to which all similar LOD resources are mapped. In case pre- and postfilter conditions are set in the other sections, the compiler checks whether the variable (Var) appears in these sections as well. Otherwise, the required join operations cannot be executed. It is specified how many recommendations should be displayed (Integer) and whether the request will be issued against a default repository or as a cross-repository request (ServiceIntegration).

ServiceIntegration: This section indicates that a user intends to receive recommendations from a different SPARQL endpoint than the default endpoint that is stated in the standard configuration of the SKOSRec engine. It specifies the target SPARQL endpoint (IRIref) from which a user wants to receive recommendations. Upon extraction of SKOS annotations from the default source endpoint, a subsequent request is sent to the specified target endpoint.

ItemPart: This section of a SKOSRec query represents a single preference in the user profile. However, recommendation queries can contain several preference statements. A preference is either expressed as a LOD resource (IRIref) or as a variable which is bound to a preference query (VarPart). In the latter case, the resources are obtained by matching the graph pattern in the VarPart with the triple statements in the repository. The ItemPart statement can be additionally supplemented with a concept-to-concept similarity threshold (Sim) that triggers concept expansion with the approach of flexible similarity detection, which is extensively described in one of our previous works [64].

VarPart: The construct initiates preference querying. Individual likings are expressed as a graph pattern in a RecWhereClause. The variable (Var) is a placeholder for the LOD resources that will be used to set up the user profile. The compiler has to make sure that the specified variable occurs in the RecWhereClause as well. Otherwise, the extraction of LOD resources cannot be carried out correctly.

Sim: The Sim part of a SKOSRec query denotes the threshold of concept-to-concept scores for flexible similarity detection. Even though knowledge-based similarity values usually range between 0 and 1, the specification allows a broader range of digits in the Decimal datatype to enable application of other similarity metrics in case they are needed.

Relation: This section specifies how concept-to-concept similarity scores should be determined. Users can state whether scores should be "larger", "larger than" or "equal to" the threshold value.
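The interplay of the Sim threshold and the Relation part can be sketched as a simple filter over concept-to-concept scores. The following Python example is an assumption-laden illustration, not the engine's code: the keyword-to-operator mapping, the function name and the concept names are hypothetical, and only the two relations "larger than" and "equal to" are modeled.

```python
import operator

# Hypothetical mapping from relation keywords to comparison operators.
RELATIONS = {
    "larger than": operator.gt,
    "equal to": operator.eq,
}


def expand_concepts(concept_scores, threshold, relation="larger than"):
    """Return the concepts whose similarity score satisfies the relation.

    concept_scores: dict mapping concept -> similarity to the profile concept
    threshold:      the Sim value of the query
    relation:       keyword from the Relation part (see RELATIONS)
    """
    if relation not in RELATIONS:
        raise ValueError(f"unknown relation: {relation}")
    compare = RELATIONS[relation]
    return sorted(
        concept for concept, score in concept_scores.items()
        if compare(score, threshold)
    )


# Hypothetical similarity scores of SKOS concepts relative to a liked concept.
similarities = {"ex:Jazz": 1.0, "ex:Bebop": 0.75, "ex:Opera": 0.25}
print(expand_concepts(similarities, 0.5))
print(expand_concepts(similarities, 0.75, relation="equal to"))
```

Because the threshold is compared as a plain number, the same filter also works for similarity metrics whose values fall outside the range between 0 and 1.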

RecWhereClause: This clause can occur in three different sections of a SKOSRec query, i.e., either in the prefilter (ItemPart), the postfilter (SelectPart) or the preference filter section (VarPart). The RecWhereClause closely resembles the "WhereClause" of the SPARQL 1.1 specification with all its subsequent parts [21]. Hence, it enables different combinations of graph pattern matching. However, the RecWhereClause of the SKOSRec language allows fewer pattern matching expressions than the "WhereClause" of the regular SPARQL syntax specification [21]. For instance, it is neither permitted to formulate subqueries nor to direct filter requests to more than one repository (see RecGraphPatternNotTriples for further explanations). This modification has been made to simplify similarity calculation. After parsing this section, the compiler makes sure that recommendation, preference or postfilter variables occur at least once in the RecWhereClause to guarantee that subsequent steps of the recommendation workflow can be applied on existing LOD resources. In case the RecWhereClause comprises a RecMinusGraphPattern, it also has to be checked that the variable is not only contained in the MINUS part of the group.

RecGroupGraphPattern: A RecGroupGraphPattern can contain one to many basic graph patterns, which can be combined in different ways according to RecGroupGraphPatternSub.

RecGroupGraphPatternSub: This section defines the graph patterns that are applied to identify suitable LOD resources. User filters can be expressed either as basic graph patterns (TriplesBlock) or as combinations of them (RecGraphPatternNotTriples). The sequence of different kinds of patterns can be flexibly specified.

RecGraphPatternNotTriples: This query part closely resembles the similarly named "GraphPatternNotTriples" of the SPARQL 1.1 specification [21]. The only difference to SPARQL is that the RecGraphPatternNotTriples section is a little more restricted in terms of admissible graph patterns than the "GraphPatternNotTriples" section of the regular SPARQL specification. For instance, it does not allow the retrieval of LOD resources from graphs other than the default graph that is specified in the configuration of the SKOSRec engine, since filter patterns have to be applied to datasets that contain SKOS annotations. Hence, these datasets need to be specified before runtime execution to ensure that similarity calculation can be carried out correctly. If a user were able to formulate filter conditions that referred to different RDF graphs and endpoints, this would not be possible.

RecGroupOrUnionGraphPattern: This query section marks an alternative graph pattern in the style of the "GroupOrUnionGraphPattern" of the SPARQL syntax specification [21]. However, it allows neither SPARQL-like subqueries nor federated queries for filtered resources, because this would complicate similarity calculation.

RecOptionalGraphPattern: In this part of a SKOSRec query, users can specify additional graph patterns which might extend the query solution but do not necessarily have to match the data.

RecMinusGraphPattern: This section handles the exclusion of LOD resources for cases in which users want to omit certain triple statements from the solution.
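The compiler check described for the RecWhereClause (a recommendation, preference or postfilter variable must occur at least once, and not only in the MINUS part) can be illustrated with a minimal Python sketch. The tuple-based triple representation and the names `variables_in` and `check_variable` are assumptions made for illustration; they are not taken from the SKOSRec implementation.

```python
def variables_in(triples):
    """Collect all SPARQL-style variables (terms starting with '?')."""
    return {term for triple in triples for term in triple if term.startswith("?")}


def check_variable(var, positive_triples, minus_triples=()):
    """Return True if `var` is bound by a pattern outside of MINUS.

    Hypothetical helper: raises ValueError when the variable is missing
    or occurs only inside the MINUS part of the group.
    """
    if var in variables_in(positive_triples):
        return True
    if var in variables_in(minus_triples):
        raise ValueError(f"{var} occurs only in the MINUS part")
    raise ValueError(f"{var} does not occur in the where clause")


# Usage: ?rec is bound by a positive pattern, so the check passes.
positive = [("?rec", "dct:subject", "?concept")]
minus = [("?rec", "rdf:type", "dbo:TelevisionShow")]
print(check_variable("?rec", positive, minus))
```

A real compiler would run this kind of check on the parsed syntax tree rather than on plain tuples; the sketch only mirrors the rule stated in the text.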

References

[1] G. Adomavicius, A. Tuzhilin, Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions, In: IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.

[2] G. Adomavicius, A. Tuzhilin, R. Zheng, REQUEST: A query language for customizing recommendations, In: Information Systems Research, vol. 22, no. 1, pp. 99–117, 2011.

[3] C. Aggarwal, Recommender systems, Springer, 2016.

[4] M. Arenas, B. Cuenca Grau, E. Kharlamov, S. Marciuska, D. Zheleznyakov, Faceted search over ontology-enhanced RDF data, In: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 939–948, 2014.

[5] V. Ayala, M. Przyjaciel-Zablocki, T. Hornung, A. Schätzle, G. Lausen, Extending SPARQL for recommendations, In: Semantic Web Information Management, 2004.

[6] J. Beel, C. Breitinger, S. Langer, A. Lommatzsch, B. Gipp, Towards reproducibility in recommender systems research, In: User Modeling and User-adapted Interaction, vol. 26, no. 1, pp. 69–101, 2016.

[7] C. Boutilier, R. Brafman, C. Domshlak, H. Hoos, D. Poole, CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements, In: Journal of Artificial Intelligence Research (JAIR), vol. 21, pp. 135–191, 2004.

[8] P. Castells, S. Vargas, J. Wang, Novelty and diversity metrics for recommender systems: choice, discovery and relevance, 2011.

[9] G. Cheng, W. Ge, Y. Qu, Falcons: searching and browsing entities on the Semantic Web, In: Proceedings of the 17th ACM International Conference on World Wide Web, pp. 1101–1102, 2008.

[10] J. Davies, R. Weeks, QuizRDF: Search technology for the Semantic Web, In: Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004.

[11] DBpedia, URL: https://wiki.dbpedia.org, 2017.

[12] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, Exploiting the web of data in model-based recommender systems, In: Proceedings of the 6th ACM Conference on Recommender Systems (RecSys), pp. 253–256, 2012.

[13] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, M. Zanker, Linked Open Data to support content-based recommender systems, In: Proceedings of the 8th International Conference on Semantic Systems, 2012.

[14] D. Dillman, D. Robert, D. Bowker, Principles for constructing web surveys, In: Joint Meetings of the American Statistical Association, 1998.

[15] DCMI metadata terms, URL: http://dublincore.org/documents/dcmi-terms, 2004.

[16] I. Fernandez-Tobias, I. Cantador, M. Kaminskas, F. Ricci, A generic semantic-based framework for cross-domain recommendation, In: Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pp. 25–32, 2011.

[17] A. Freitas, J. Oliveira, S. O'Riain, E. Curry, J. Da Silva, C. Pereira, Querying Linked Data using semantic relatedness: a vocabulary independent approach, In: Natural Language Processing and Information Systems, pp. 40–51, 2011.

[18] T. Grainger, T. Potter, Y. Seeley, Solr in Action, Cherry Hill: Manning, 2014.

[19] A. Gunawardana, G. Shani, Evaluating recommender systems, In: Recommender Systems Handbook, Springer, pp. 265–308, 2015.

[20] R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Bürgle, H. Düwiger, U. Scheel, Faceted Wikipedia search, In: International Conference on Business Information Systems, pp. 1–11, 2010.

[21] S. Harris, A. Seaborne, E. Prud'hommeaux, SPARQL 1.1 query language, In: W3C recommendation, 2013.

[22] B. Heitmann, C. Hayes, Using Linked Data to build open collaborative recommender systems, In: AAAI Spring Symposium: Linked Data Meets Artificial Intelligence, pp. 76–81, 2010.

[23] J. Herlocker, J. Konstan, L. Terveen, J. Riedl, Evaluating collaborative filtering recommender systems, In: ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.

[24] Y. Hu, Z. Wang, W. Wu, J. Guo, M. Zhang, Recommendation for movies and stars using YAGO and IMDB, In: 12th International Asia-Pacific Web Conference (APWEB), pp. 123–129, 2010.

[25] D. Huynh, D. Karger, Parallax and companion: Set-based browsing for the data web, In: Proceedings of the 18th ACM International Conference on World Wide Web, 2009.

[26] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich, Recommender systems: an introduction, Cambridge University Press, 2010.

[27] Y. Kabutoya, R. Sumi, T. Iwata, T. Uchiyama, T. Uchiyama, In: IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 625–630, 2012.

[28] M. Kaminskas, I. Fernandez-Tobias, F. Ricci, I. Cantador, Knowledge-based music retrieval for places of interest, In: Proceedings of the 2nd International ACM Workshop on Music Information Retrieval with User-centered and Multimodal Strategies, pp. 19–24, 2012.

[29] H. Khrouf, R. Troncy, Hybrid event recommendation using Linked Data and user diversity, In: Proceedings of the 7th ACM Conference on Recommender Systems, pp. 185–192, 2010.

[30] P. Kirk, Experimental design, John Wiley & Sons, 1982.

[31] B. Knijnenburg, M. Willemsen, Z. Gantner, H. Soncu, C. Newell, Explaining the user experience of recommender systems, In: User Modeling and User-Adapted Interaction, vol. 22, no. 4-5, pp. 441–504, 2012.

[32] G. Koutrika, B. Bercovitz, H. Garcia-Molina, FlexRecs: expressing and combining flexible recommendations, In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 2009.

[33] M. Lee, W. Kim, Semantic association search and rank method based on spreading activation for the Semantic Web, In: IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1523–1527, 2009.

[34] J. Lehmann, L. Bühmann, AutoSPARQL: Let users query your knowledge base, In: 8th Extended Semantic Web Conference (ESWC), 2011.

[35] N. Marie, F. Gandon, M. Ribiere, F. Rodio, Discovery hub: on-the-fly Linked Data exploratory search, In: Proceedings of the 9th International Conference on Semantic Systems, pp. 17–24, 2013.

[36] S. McNee, J. Riedl, J. Konstan, Being accurate is not enough: how accuracy metrics have hurt recommender systems, In: CHI'06 Extended Abstracts on Human Factors in Computing Systems, 2006.

[37] R. Meymandpour, J. Davis, Recommendations using Linked Data, In: Proceedings of the 5th PhD Workshop on Information and Knowledge, 2012.

[38] A. Miles, S. Bechhofer, SKOS Simple Knowledge Organization System Reference, In: W3C Recommendation, 2009.

[39] B. Mobasher, X. Jin, Y. Zhou, Semantically enhanced collaborative filtering on the web, In: Web Mining: From Web to Semantic Web, pp. 57–76, 2004.

[40] A. Musetti, A. Nuzzolese, F. Draicchio, V. Presutti, E. Blomqvist, A. Gangemi, P. Ciancarini, Aemoo: Exploratory search based on knowledge patterns over the Semantic Web, In: Semantic Web Challenge, 2012.

[41] T. Neumann, G. Weikum, The RDF-3X engine for scalable management of RDF data, In: The VLDB Journal, vol. 19, no. 1, pp. 91–113, 2010.

[42] J. Pérez, M. Arenas, C. Gutierrez, Semantics and Complexity of SPARQL, In: 5th International Semantic Web Conference, pp. 30–43, 2006.

[43] S. Policarpio, S. Brunk, G. Tummarello, Implementation of a SPARQL integrated recommendation engine for Linked Data with hybrid capabilities, In: AImWD, 2012.

[44] P. Pu, L. Chen, R. Hu, A user-centric evaluation framework for recommender systems, In: Proceedings of the 5th ACM Conference on Recommender Systems, 2011.

[45] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, E. Di Sciascio, Sound and music recommendation with knowledge graphs, In: ACM Transactions on Intelligent Systems and Technology (TIST), vol. 8, no. 2, 21, 2017.

[46] E. Oren, R. Delbru, S. Decker, Extending faceted navigation for RDF data, In: 5th International Semantic Web Conference, pp. 559–572, 2006.

[47] G. Pirro, Explaining and suggesting relatedness in knowledge graphs, In: International Semantic Web Conference, pp. 622–639, 2015.

[48] G. Rizzo, R. Troncy, NERD: A framework for evaluating named entity recognition tools in the Web of data, In: 10th International Semantic Web Conference (ISWC'11), Demo Session, Bonn, Germany, 2011.

[49] J. Rosati, T. Di Noia, T. Lukasiewicz, R. De Leone, A. Maurino, Preference queries with Ceteris Paribus semantics for Linked Data, In: OTM Confederated International Conferences, pp. 423–442, 2015.

[50] T. Ruotsalo, K. Haav, A. Stoyanov, S. Roche, E. Fani, R. Deliai, E. Mäkelä, T. Kauppinen, E. Hyvönen, SMARTMUSEUM: A mobile recommender system for the Web of Data, In: Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 50–67, 2013.

[51] I. Ruthven, M. Lalmas, K. van Rijsbergen, Incorporating user search behavior into relevance feedback, In: Journal of the Association for Information Science and Technology, vol. 54, no. 6, pp. 529–549, 2003.

[52] B. Sarwar, G. Karypis, J. Konstan, J. Riedl, Item-based collaborative filtering recommendation algorithms, In: Proceedings of the 10th International Conference on World Wide Web, 2001.

[53] M. Schmachtenberg, C. Bizer, H. Paulheim, Adoption of the Linked Data best practices in different topical domains, In: International Semantic Web Conference (ISWC), pp. 245–260, 2014.

[54] H. Schütze, C. Manning, P. Raghavan, Introduction to information retrieval, Cambridge University Press, 2008.

[55] SparqlImplementations, URL: https://www.w3.org/wiki/SparqlImplementations, 2016.

[56] M. Stankovic, W. Breitfuss, P. Laublet, Discovering relevant topics using DBPedia, In: Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 219–222, 2011.

[57] K. Swearingen, R. Sinha, Beyond algorithms: An HCI perspective on recommender systems, In: ACM SIGIR 2001 Workshop on Recommender Systems, vol. 13, no. 5-6, 2001.

[58] D. Tunkelang, Faceted search, In: Synthesis Lectures on Information Concepts, Retrieval, and Services, pp. 1–80, 2009.

[59] C. Unger, L. Bühmann, J. Lehmann, A. Ngonga Ngomo, D. Gerber, P. Cimiano, Template-based question answering over RDF data, In: Proceedings of the 21st ACM International Conference on World Wide Web, 2012.

[60] C. Varnhagen, M. Gushta, J. Daniels, T. Peters, N. Parmar, D. Law, T. Johnson, How informed is online informed consent, In: Ethics & Behavior, vol. 15, no. 1, pp. 37–48, 2005.

[61] J. Waitelonis, H. Sack, Towards exploratory video search using Linked Data, In: 11th IEEE International Symposium on Multimedia, pp. 540–545, 2009.

[62] L. Wenige, J. Ruhland, Flexible on-the-fly recommendations from Linked Open Data repositories, In: International Conference on Business Information Systems, pp. 43–54, 2016.

[63] L. Wenige, Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations.

[64] L. Wenige, J. Ruhland, Retrieval by recommendation: using LOD technologies to improve digital library search, In: International Journal on Digital Libraries, 2017.

[65] F. Zarrinkalam, M. Kahani, A multi-criteria hybrid citation recommendation system based on Linked Data, In: 2nd International eConference on Computer and Knowledge Engineering (ICCKE), pp. 283–288, 2012.

[66] C. Ziegler, S. McNee, J. Konstan, G. Lausen, Improving recommendation lists through topic diversification, In: Proceedings of the 14th International Conference on World Wide Web, pp. 22–32, 2005.

[67] L. Zou, R. Huang, H. Wang, J. Yu, W. He, D. Zhao, Natural language question answering over RDF: a graph data driven approach, In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 313–324, 2014.

  • Introduction
  • Related Work
  • The SKOS Recommender
    • System Overview
    • On-the-Fly Recommendations
    • Preference Querying
    • Prefiltering
    • Postfiltering & Combinations
  • Evaluation of the SKOS Recommender
    • Experimental Setup
    • Results
  • Future Work & Conclusion
  • Appendix A. The SKOSRecommender Query Syntax
  • References

Table 14. Performance results in TC2 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]  Domain  Simple M  Simple SD  Expressive M  Expressive SD  N
Accuracy (size) [10]            Travel  1.17      2.70       4.52          4.62           103
                                Movie   6.72      4.08       7.20          3.92           50
                                Music   5.96      4.64       6.36          4.51           53
                                Book    4.59      4.50       5.00          4.51           51
Accuracy (mrs) [1]              Travel  66.73     20.51      65.34         22.36          23
                                Movie   60.95     22.98      60.29         22.79          43
                                Music   63.77     16.05      62.96         16.44          34
                                Book    56.17     18.18      56.81         16.81          37
Novelty (nv) [1]                Travel  0.48      0.41       0.61          0.49           21
                                Movie   0.44      0.36       0.55          0.80           42
                                Music   0.55      0.39       0.52          0.37           38
                                Book    0.77      0.34       0.76          0.33           33
Diversity (divU) [5]            Travel  3         -          3.5           -              24
                                Movie   4         -          4             -              43
                                Music   4         -          4             -              38
                                Book    3         -          3             -              37
Diversity (divC) [1]            Travel  0.68      0.18       0.72          0.21           19
                                Movie   0.70      0.24       0.73          0.23           41
                                Music   0.86      0.13       0.87          0.12           34
                                Book    0.84      0.24       0.95          0.63           29
Usefulness [5]                  Travel  3.33      0.93       3.50          0.77           24
                                Movie   3.32      0.85       3.31          0.86           43
                                Music   3.55      0.81       3.64          0.78           38
                                Book    3.39      0.83       3.62          1.21           37

Table 15. Jaccard Indices (TC2)

Domain  Mean Jaccard Index (JI)  SD    N
Travel  0.57                     0.47  25
Movie   0.71                     0.38  43
Music   0.82                     0.35  37
Book    0.92                     0.26  35

travel and the book experiment, the response rate was higher for aggregation-based queries, whereas in the movie domain the engine more often provided recommendations for regular requests. However, these differences are only marginal and do not indicate a clear superiority of one method.

Fig. 9 depicts participants' general agreement to Likert statements concerning the Perceived Usefulness of the approaches in TC3. The diagram shows similar results for regular as well as advanced requests. On average, satisfaction scores reached levels which lay slightly above neutral agreement. A subsequent t-test confirmed no significant differences between the two methods. In fact, the approach with the highest average score of Perceived Usefulness was different in each experiment. While mean values were higher for regular requests in the studies on travel and music RS, advanced queries produced higher scores in the movie and book experiments.

Table 16. Response Rates (TC3)

Domain  Regular  Aggr.-based  N
Travel  99       100          103
Movie   96       92           50
Music   100      100          53
Book    96       100          51

Fig. 9. Participants' overall satisfaction with regular and roll-up recommendation retrieval (TC3). [Bar chart "Perceived Usefulness (Regular vs. Roll-up Queries)": x-axis Domain (Travel, Movie, Music, Book), y-axis Agreement, bars grouped by Method (Regular, Roll-up).]

Content-based Diversity (divC) was the only metric with systematic differences. Here, scores went up for advanced retrieval patterns in the travel (t(98) = −5.50, p < 0.001), the music (t(52) = −4.51, p < 0.001) and the book domain (t(47) = −4.90, p < 0.001) (see Table 17). This finding is not surprising given that the aggregation-based approach explores the wider semantic network of each item in the profile.
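The aggregation-based (roll-up) retrieval discussed here summarizes the similarity scores of sublevel entities per parent resource, with SUM, MAX or AVG as aggregation modes (see the Aggregation entry in Appendix A). The following Python sketch illustrates the idea under simplifying assumptions: the dictionaries `scores` and `parent_of`, the function name `roll_up` and the example identifiers are hypothetical and not part of the SKOSRec engine.

```python
from collections import defaultdict

# Aggregation modes named in Appendix A, mapped to plain Python functions.
AGGREGATORS = {
    "SUM": sum,
    "MAX": max,
    "AVG": lambda values: sum(values) / len(values),
}


def roll_up(scores, parent_of, mode="SUM", limit=10):
    """Aggregate item similarity scores on the level of parent resources.

    scores:    item -> similarity score of the item to the user profile
    parent_of: item -> parent resource (e.g., film -> director)
    Returns the top `limit` (parent, aggregated score) pairs.
    """
    grouped = defaultdict(list)
    for item, score in scores.items():
        parent = parent_of.get(item)
        if parent is not None:  # items without a parent are skipped
            grouped[parent].append(score)
    aggregate = AGGREGATORS[mode]
    totals = {parent: aggregate(values) for parent, values in grouped.items()}
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(totals.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:limit]


scores = {"film1": 0.5, "film2": 0.25, "film3": 0.5}
parent_of = {"film1": "directorA", "film2": "directorA", "film3": "directorB"}
print(roll_up(scores, parent_of, mode="SUM"))
print(roll_up(scores, parent_of, mode="AVG"))
```

The choice of mode changes the ranking: SUM favors parents with many similar sublevel items, whereas AVG favors parents whose items are similar on average.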

Hence, it can be assumed that both approaches produce recommendation lists of similar quality and can provide benefits to potential consumers. Thus, it cannot be clearly stated which approach is superior, because neither of them outperformed the other in each domain. While the regular retrieval strategy was better in terms of usefulness in the travel and the music domains, the aggregation-based approach achieved higher scores in the domains of movie and book recommendations.

This conclusion is especially impressive given the low mean Jaccard scores for recommendation lists resulting from regular and aggregation-based retrieval (see Table 18). The mean score never exceeded 0.1 throughout the domains. It indicates a low concordance between result sets. It is remarkable that participants perceived the recommendations as equally good given the high dissimilarity of the lists. Thus, advanced queries have an added value, as they provide additional interesting results. They can help to explore LOD repositories in different ways than a regular retrieval approach. Therefore, it is worthwhile to offer aggregation-based query patterns as an alternative retrieval strategy when the regular approach has not produced any helpful recommendations.
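The Jaccard index used to compare two recommendation lists is the size of their intersection divided by the size of their union. A minimal Python sketch follows; the function name `jaccard_index` and the decision to return `None` when both lists are empty are assumptions for illustration, since the text does not state how such sessions were treated.

```python
def jaccard_index(list_a, list_b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two recommendation lists.

    Returns None when both lists are empty, because the index is
    undefined in that case (assumed convention).
    """
    set_a, set_b = set(list_a), set(list_b)
    union = set_a | set_b
    if not union:
        return None
    return len(set_a & set_b) / len(union)


regular = ["item1", "item2", "item3"]
advanced = ["item2", "item3", "item4"]
print(jaccard_index(regular, advanced))  # two shared items out of four distinct ones
```

A value close to 0, as reported for TC3, means that the two retrieval methods returned almost entirely different items.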

In addition to TC3, user assessments for TC4 (cross-domain recommendations) were also evaluated. However, since the authors conducted experiments for TC4 only in the multimedia domains, the samples were considerably smaller. Nevertheless, it is assumed that the number of completed user sessions is just enough to conduct further statistical analyses. Table 19 shows the response rates for the two retrieval methods. Cross-domain patterns had a higher success rate of generating non-zero result sets throughout the domains. Participants almost always received a recommendation for this query type, whereas when a regular SPARQL query was executed, they often did not receive a single suggestion.

Table 17. Performance results in TC3 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]  Domain  Regular M  Regular SD  Aggr.-based M  Aggr.-based SD  N
Accuracy (size) [10]            Travel  9.90       0.99        9.42           2.07            103
                                Movie   2.88       0.59        2.76           0.82            50
                                Music   10.00      0.00        10.00          0.00            53
                                Book    2.88       0.59        3.00           0.00            51
Accuracy (mrs) [1]              Travel  48.21      24.98       44.93          25.59           102
                                Movie   54.58      22.88       56.69          25.42           44
                                Music   53.95      18.31       58.24          16.19           52
                                Book    51.53      23.90       49.25          22.30           49
Novelty (nv) [1]                Travel  0.43       0.39        0.38           0.40            98
                                Movie   0.35       0.38        0.28           0.41            41
                                Music   0.49       0.36        0.46           0.39            50
                                Book    0.61       0.42        0.62           0.44            47
Diversity (divU) [5]            Travel  3.5        -           3              -               100
                                Movie   4          -           3              -               43
                                Music   4          -           3              -               50
                                Book    3          -           3              -               48
Diversity (divC) [1]            Travel  0.79       0.17        0.89           0.13            99
                                Movie   0.75       0.20        0.66           0.27            44
                                Music   0.83       0.17        0.93           0.07            53
                                Book    0.74       0.27        0.93           0.06            48
Usefulness [5]                  Travel  3.36       0.77        3.22           0.74            101
                                Movie   3.20       0.92        3.26           0.94            44
                                Music   3.45       0.84        3.44           0.85            51
                                Book    2.95       0.87        3.03           0.96            48

Table 18. Jaccard Indices (TC3)

Domain  Mean Jaccard Index (JI)  SD    N
Travel  0.07                     0.11  101
Movie   0.03                     0.07  44
Music   0.02                     0.05  53
Book    0.02                     0.06  50

Table 19. Response rates (TC4)

Domain  Regular (SPARQL)  SKOSRec  N
Movie   32                96       50
Music   62                98       53
Book    53                100      51

We applied non-parametric tests for the performance metrics mrs, nv, divU, divC and Perceived Usefulness due to the low response rates of the SPARQL-based approach, which led to a low number of comparable data points accordingly. Overall, the FDR-adjusted pairwise Wilcoxon tests detected no significant differences, except for user-based Diversity scores (divU) in the movie (Z = −2.69, p < 0.05) and the book domain (Z = −2.56, p < 0.05). The results for the content-based Diversity (divC) metric seem to point in the same direction because of the high mean values for cross-domain queries in each domain. But the significance of these results was not verified. The mrs scores indicate a better performance of regular SPARQL queries, but the within-subjects test did not confirm this assumption statistically (see Table 20).
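The text reports FDR-adjusted p-values for the pairwise tests but does not name the adjustment procedure. As an illustration only, the following Python sketch shows the Benjamini-Hochberg procedure, which is a common choice for controlling the false discovery rate; treating it as the procedure used in the study is an assumption. The function name `fdr_adjust` and the example p-values are hypothetical, and the Wilcoxon tests themselves are not reproduced.

```python
def fdr_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in input order.

    Each raw p-value is scaled by m / rank (rank 1 = smallest p-value),
    and a running minimum from the largest rank downwards keeps the
    adjusted values monotone.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical raw p-values of four tests
print([round(p, 4) for p in fdr_adjust(raw)])
```

A test counts as significant at the 5% level if its adjusted value stays below 0.05, which is stricter than comparing the raw p-values to the same threshold.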

However, the most important finding of the analysis was that SKOSRec cross-domain queries generated significantly more suggestions throughout the domains (movie: t(49) = −17.92, p < 0.001; music: t(52) = −25.73, p < 0.001; book: t(50) = −12.75, p < 0.001) (see Table 20). Fig. 10 illustrates this graphically.
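The degrees of freedom reported here (e.g., 49 for the 50 participants of the movie study) are consistent with a paired, within-subjects t-test, in which each participant contributes one difference between the two methods. The following Python sketch computes such a statistic; the function name `paired_t` and the example list sizes are made up for illustration and are not data from the experiments.

```python
import math
import statistics


def paired_t(sample_a, sample_b):
    """Paired t statistic and degrees of freedom for two matched samples.

    t = mean(d) / (stdev(d) / sqrt(n)), with d the per-participant
    differences and n - 1 degrees of freedom. Identical differences
    for all participants (stdev of 0) are not handled in this sketch.
    """
    if len(sample_a) != len(sample_b) or len(sample_a) < 2:
        raise ValueError("samples must be matched and contain at least two pairs")
    diffs = [a - b for a, b in zip(sample_a, sample_b)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical list sizes per participant: regular vs. cross-domain requests.
regular_sizes = [1, 0, 2, 1]
cross_domain_sizes = [9, 8, 10, 10]
t_value, df = paired_t(regular_sizes, cross_domain_sizes)
print(f"t({df}) = {t_value:.2f}")
```

A strongly negative t value, as in the reported results, indicates that the first method (regular SPARQL) produced consistently shorter lists than the second.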

Whenever the SPARQL request provided recommendations, which was the case in less than half of the user sessions (see Table 19), the SKOSRec suggestions were assessed to be of almost equal quality as the SPARQL recommendations (see Table 20, Usefulness). Hence, in the more than half of the test cases in which SPARQL querying did not produce any results, the SKOSRec approach may still have provided useful suggestions.

Therefore, it is reasonable to assume that the capability of the SKOSRec engine to flexibly switch between processing steps of similarity calculation and pattern matching can facilitate an improved exploration of LOD repositories. Table 21 also shows that the items in the two sets often did not match, as is indicated by low Jaccard indices. This finding further demonstrates that the execution of an additional step of similarity calculation can help to retrieve other relevant items.

Table 20. Performance results in TC4 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]  Domain  Regular (SPARQL) M  Regular (SPARQL) SD  Cross-Domain M  Cross-Domain SD  N
Accuracy (size) [10]            Movie   0.82                2.09                 8.66            2.73             50
                                Music   0.89                1.59                 9.30            2.03             53
                                Book    2.90                3.98                 10.00           0.00             51
Accuracy (mrs) [1]              Movie   67.41               27.62                53.65           14.71            16
                                Music   62.36               31.76                48.32           22.49            20
                                Book    66.42               21.28                55.59           16.84            27
Novelty (nv) [1]                Movie   0.59                0.49                 0.41            0.34             16
                                Music   0.72                0.44                 0.59            0.40             19
                                Book    0.54                0.41                 0.63            0.30             27
Diversity (divU) [5]            Movie   3                   -                    4               -                16
                                Music   3                   -                    3               -                20
                                Book    3                   -                    4               -                27
Diversity (divC) [1]            Movie   (0.64)              (0.35)               (0.99)          (0.02)           (6)
                                Music   (0.90)              (0.13)               (0.98)          (0.03)           (10)
                                Book    (0.84)              (0.12)               (0.87)          (0.1)            (8)
Usefulness [5]                  Movie   3.57                0.83                 3.35            1.57             15
                                Music   3.53                1.17                 3.11            0.98             20
                                Book    3.41                1.04                 3.41            0.94             27

Fig. 10. Mean recommendation list sizes for SPARQL and SKOSRec cross-domain requests (TC4). [Bar chart "Recommendation List Size": x-axis Domain (Movie, Music, Book), y-axis No. of items, bars grouped by Method (Regular, Cross-Domain).]

Table 21. Jaccard Indices (TC4)

Domain  Mean Jaccard Index (JI)  SD    N
Movie   0.03                     0.07  44
Music   0.01                     0.02  21
Book    0.16                     0.27  33

5. Future Work & Conclusion

This paper has presented the query-based SKOSRec engine, which facilitates new types of recommendation requests for LOD repositories. The evaluations have shown that the SKOSRec system can often generate relevant and useful suggestions when certain query templates are utilized. In many cases, the application of expressive or advanced query patterns (i.e., in the context of constraint-based retrieval or cross-domain requests) helped to significantly improve recall values while still providing the same level of quality in the remaining performance dimensions. Additionally, advanced recommendation requests can be used to generate diversified recommendation lists in case users are not satisfied with the results of regular recommendation requests (i.e., roll-up or cross-domain retrieval vs. regular requests). While not being superior in each domain and test case, it is at least safe to say that the novel recommendation approaches of the SKOSRec engine considerably extend common retrieval methods and provide an alternative search strategy when conventional methods fail to provide useful results. Additionally, the increase in diversity and recall is in line with findings from previous research on LOD-enabled RS [45].

It is a strength of the developed approaches that they facilitate combinations of graph-based and similarity-based retrieval at different stages of the recommendation workflow. This feature extends general search capabilities for semantic networks. Future research is required to investigate whether, in addition to SKOS, further RDF vocabularies can be used for ad-hoc item-to-item similarity computation to increase the range of potential usage scenarios, since not every LOD dataset contains SKOS annotations [53]. The application of additional properties could also boost the recommendation quality of some query patterns (e.g., the query types used in TC3). In this context, it also needs to be determined whether the representation of the user profile in the SKOSRec query language can be substituted with free-text expressions to enable search-like functionalities. LOD researchers have already developed engines that can process natural language queries over RDF data [59], [34], [67]. It will have to be investigated how the SKOSRec engine profitably fits into this landscape of existing retrieval tools.

Another open research question concerns the aspect of interfaces to other applications. In its standard configuration, the system is not an end-user retrieval engine but a backend application with which an administrator who has domain knowledge (e.g., of RDF vocabularies and data models of a particular LOD repository and usage scenario) creates the respective query patterns. Although the websites of the online experiments represent possible implementations of suitable end-user applications for the SKOSRec engine, there are still many other possibilities and extensions to design such a user interface (UI). For instance, the selection of LOD items for the profile was made possible by previously storing the data in an index that could then be accessed and searched before issuing a recommendation query. However, this runs against the idea of ad-hoc retrieval. In this context, future research will have to determine whether it is feasible to generate an on-the-fly mapping from items to the respective LOD resources. In the case of a commercial application, the engine needs to match items with the corresponding entries in the product catalog. Even if it is not yet clear which UI components can make the best possible use of the SKOSRec engine's features, the results from the user experiments can give at least a few suggestions.

The following retrieval options should be the default setting in an interface that contains a LOD-enabled RS component to assist users in finding interesting items. Since the regular approach of on-the-fly retrieval achieved good results throughout the domains, the display of simple recommendations generated from previously stated user preferences represents a suitable starting point for retrieval. On-the-fly queries could be refined by providing filter options. Since the evaluations have shown that expressive constraints often lead to increased recall values, the application of such a filter might be the best option for assisted retrieval. As in the preparations for the experiments, appropriate graph-based query patterns for the usage scenario in question would also have to be determined by a domain expert before setting up the application. The same applies to advanced retrieval patterns, which should be available to the end user to refine the query when both the simple and the constraint-based approach have failed to provide relevant items. This suggestion is made because roll-up queries were competitive with regular recommendations in the web-based experiments. Therefore, they represent a viable alternative retrieval strategy. For some domains (e.g., for multimedia retrieval), it is also possible to use an optional selection field to enable the formulation of cross-domain requests. Evaluations for this query type suggest that users might receive interesting results (i.e., diversified and comprehensive recommendation lists) from such a request.

In summary, the contributions of the paper are as follows:

– Development and implementation of novel retrieval strategies in the SKOSRec system, which facilitate personalized requests and leverage the potential of knowledge graphs in order to address the research gap in state-of-the-art recommendation methods with regard to customized queries.

– Realization of web-based user experiments in four usage scenarios with the following outcomes:

  • Successful demonstration that knowledge graph data is applicable in the context of live recommendation scenarios.

  • Creation of suitable query patterns that can be applied to represent individual information needs in different application contexts. The set of query patterns can be extended with the help of the SKOSRec query language to meet the requirements of additional usage scenarios.

  • Proof that the novel retrieval strategies can be successfully applied across various application contexts and recommendation tasks and that they outperform conventional content-based retrieval approaches, especially in the performance dimensions of diversity and recall.

The prototypical implementation and the evaluation of the SKOSRec engine in different usage scenarios have demonstrated that the freely available knowledge sources on the LOD cloud can be successfully applied to advance state-of-the-art methods in IR and recommender systems. The proposed personalized and customized retrieval strategies can leverage the potential of knowledge graphs by better matching information needs of users with the help of semantic search options.

Appendix A. The SKOSRecommender Query Syntax

SKOSRecQuery: A SKOSRec query can be comprised of up to six elements, of which the parts SimProjection and ItemPart are obligatory. Hence, users need to state at least their preferences and how many recommendations they would like to receive. Advanced retrieval patterns can be formulated as well. For instance, similar resource retrieval can be combined with prefiltering (RecWhereClause) and/or postfiltering (SelectPart) of LOD resources. As in SPARQL, the query can start with a prologue that abbreviates IRIs with namespace declarations [21].

SelectPart: The SelectPart enables subquerying with recommendation results. The surrounding query is a SELECT query with a slightly simplified WHERE clause (i.e., RecWhereClause) with which postfilter conditions can be formulated. In this section, users have the option to formulate aggregation-based retrieval requests (Aggregation).

Aggregation: This query part handles aggregation-based postfiltering. Users specify the IRI (IRIref) resource for which similarity scores of sublevel entities are summarized. Additionally, it is stated how the data should be aggregated (i.e., SUM, MAX or AVG) as well as which variable (Var) represents the aggregation-based recommendations in the postfilter section. The compiler makes sure that this variable is actually contained in the RecWhereClause of the SelectPart.

SimProjection: This part of a SKOSRec query refers to the list of generated suggestions. It defines the recommendation variable to which all similar LOD resources are mapped. In case pre- and postfilter conditions are set in the other sections, the compiler checks whether the variable (Var) appears in these sections as well. Otherwise, the required join operations cannot be executed. It is specified how many recommendations should be displayed (Integer) and whether the request will be issued against a default repository or as a cross-repository request (ServiceIntegration).

ServiceIntegration: This section indicates that a user intends to receive recommendations from a SPARQL endpoint other than the default endpoint that is stated in the standard configuration of the SKOSRec engine. It specifies the target SPARQL endpoint (IRIref) from which a user wants to receive recommendations. Upon extraction of SKOS annotations from the default source endpoint, a subsequent request is sent to the specified target endpoint.

ItemPart: This section of a SKOSRec query represents a single preference in the user profile. However, recommendation queries can contain several preference statements. A preference is either expressed as a LOD resource (IRIref) or as a variable which is bound to a preference query (VarPart). In the latter case, the resources are obtained by matching the graph pattern in the VarPart with the triple statements in the repository. The ItemPart statement can additionally be supplemented with a concept-to-concept similarity threshold (Sim) that triggers concept expansion with the approach of flexible similarity detection, which is extensively described in one of our previous works [64].

VarPart: The construct initiates preference querying. Individual likings are expressed as a graph pattern in a RecWhereClause. The variable (Var) is a placeholder for the LOD resources that will be used to set up the user profile. The compiler has to make sure that the specified variable occurs in the RecWhereClause as well. Otherwise, the extraction of LOD resources cannot be carried out correctly.

Sim: The Sim part of a SKOSRec query denotes the threshold of concept-to-concept scores for flexible similarity detection. Even though knowledge-based similarity values usually range between 0 and 1, the specification allows a broader range of digits in the Decimal datatype to enable the application of other similarity metrics in case they are needed.

Relation: This section specifies how concept-to-concept similarity scores should be determined. Users can state whether scores should be “larger”, “larger than” or “equal to” the threshold value.

RecWhereClause: This clause can occur in three different sections of a SKOSRec query, i.e., either in the prefilter (ItemPart), the postfilter (SelectPart) or the preference filter section (VarPart). The RecWhereClause closely resembles the “WhereClause” of the SPARQL 1.1 specification with all its subsequent parts [21]. Hence, it enables different combinations of graph pattern matching. However, the RecWhereClause of the SKOSRec language allows fewer pattern matching expressions than the “WhereClause” of the regular SPARQL syntax specification [21]. For instance, it is neither permitted to formulate subqueries nor to direct filter requests to more than one repository (see RecGraphPatternNotTriples for further explanations). This modification has been made to simplify similarity calculation. After parsing this section, the compiler makes sure that recommendation, preference or postfilter variables occur at least once in the RecWhereClause to guarantee that subsequent steps of the recommendation workflow can be applied to existing LOD resources. In case the RecWhereClause comprises a RecMinusGraphPattern, it also has to be checked that the variable is not only contained in the MINUS part of the group.

RecGroupGraphPattern: A RecGroupGraphPattern can contain one to many basic graph patterns, which can be combined in different ways according to RecGroupGraphPatternSub.

RecGroupGraphPatternSub: This section defines the graph patterns that are applied to identify suitable LOD resources. User filters can be expressed either as basic graph patterns (TriplesBlock) or as combinations of them (RecGraphPatternNotTriples). The sequence of different kinds of patterns can be flexibly specified.

RecGraphPatternNotTriples: This query part closely resembles the similarly named “GraphPatternNotTriples” of the SPARQL 1.1 specification [21]. The only difference to SPARQL is that the RecGraphPatternNotTriples section is a little more restricted in terms of admissible graph patterns than the “GraphPatternNotTriples” section of the regular SPARQL specification. For instance, it does not allow retrieving LOD resources from any graph other than the default graph that is specified in the configuration of the SKOSRec engine, since filter patterns have to be applied to datasets that contain SKOS annotations. Hence, these datasets need to be specified before runtime execution to ensure that similarity calculation can be carried out correctly. If a user were able to formulate filter conditions that referred to different RDF graphs and, accordingly, to different endpoints, this would not be possible.

RecGroupOrUnionGraphPattern: This query section marks an alternative graph pattern in the style of the “GroupOrUnionGraphPattern” of the SPARQL syntax specification [21]. However, it allows neither SPARQL-like subqueries nor federated queries for filtered resources, because this would complicate similarity calculation.

RecOptionalGraphPattern: In this part of a SKOSRecQuery, users can specify additional graph patterns which might extend the query solution but do not necessarily have to match the data.

RecMinusGraphPattern: This section handles the exclusion of LOD resources for cases in which users want to omit certain triple statements from the solution.
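Several entries above (SimProjection, Aggregation, VarPart and RecWhereClause) mention a compiler check that a variable actually occurs in a where clause, and not only in its MINUS part. The following Python sketch illustrates that check under simplifying assumptions: graph patterns are taken to be already parsed into (subject, predicate, object) string triples, and the function names, the triple representation and the prefixed names in the usage lines are hypothetical illustrations rather than parts of the SKOSRec implementation.

```python
def variables_in(triples):
    """Collect all terms that look like SPARQL variables (?name)."""
    return {term for triple in triples for term in triple if term.startswith("?")}


def check_variable(var, patterns, minus_patterns=()):
    """Return True if `var` occurs outside the MINUS part, else raise ValueError.

    `patterns` and `minus_patterns` are hypothetical lists of
    (subject, predicate, object) string triples.
    """
    if var in variables_in(patterns):
        return True
    if var in variables_in(minus_patterns):
        raise ValueError(f"{var} occurs only in the MINUS part")
    raise ValueError(f"{var} does not occur in the where clause")


# Usage with illustrative (assumed) prefixed names
patterns = [("?rec", "dct:subject", "?concept")]
minus = [("?rec", "rdf:type", "ex:Excluded")]
print(check_variable("?rec", patterns, minus))  # True
```

Raising an error instead of returning False mirrors the description that subsequent workflow steps cannot be carried out when the variable is missing.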

References

[1] G. Adomavicius, A. Tuzhilin, Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions, In: IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.

[2] G. Adomavicius, A. Tuzhilin, R. Zheng, REQUEST: A query language for customizing recommendations, In: Information Systems Research, vol. 22, no. 1, pp. 99–117, 2011.

[3] C. Aggarwal, Recommender systems, Springer, 2016.

[4] M. Arenas, B. Cuenca Grau, E. Kharlamov, S. Marciuska, D. Zheleznyakov, Faceted search over ontology-enhanced RDF data, In: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 939–948, 2014.

[5] V. Ayala, M. Przyjaciel-Zablocki, T. Hornung, A. Schätzle, G. Lausen, Extending SPARQL for recommendations, In: Semantic Web Information Management, 2004.

[6] J. Beel, C. Breitinger, S. Langer, A. Lommatzsch, B. Gipp, Towards reproducibility in recommender systems research, In: User Modeling and User-adapted Interaction, vol. 26, no. 1, pp. 69–101, 2016.

[7] C. Boutilier, R. Brafman, C. Domshlak, H. Hoos, D. Poole, CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements, In: Journal of Artificial Intelligence Research (JAIR), vol. 21, pp. 135–191, 2004.

[8] P. Castells, S. Vargas, J. Wang, Novelty and diversity metrics for recommender systems: choice, discovery and relevance, 2011.

[9] G. Cheng, W. Ge, Y. Qu, Falcons: searching and browsing entities on the Semantic Web, In: Proceedings of the 17th ACM International Conference on World Wide Web, pp. 1101–1102, 2008.

[10] J. Davies, R. Weeks, QuizRDF: Search technology for the Semantic Web, In: Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004.

[11] DBpedia, URL: https://wiki.dbpedia.org, 2017.

[12] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, Exploiting the web of data in model-based recommender systems, In: Proceedings of the 6th ACM Conference on Recommender Systems (RecSys), pp. 253–256, 2012.

[13] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, M. Zanker, Linked Open Data to support content-based recommender systems, In: Proceedings of the 8th International Conference on Semantic Systems, 2012.

[14] D. Dillman, D. Robert, D. Bowker, Principles for constructing web surveys, In: Joint Meetings of the American Statistical Association, 1998.

[15] DCMI metadata terms, URL: http://dublincore.org/documents/dcmi-terms, 2004.

[16] I. Fernandez-Tobias, I. Cantador, M. Kaminskas, F. Ricci, A generic semantic-based framework for cross-domain recommendation, In: Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pp. 25–32, 2011.

[17] A. Freitas, J. Oliveira, S. O'Riain, E. Curry, J. Da Silva, C. Pereira, Querying Linked Data using semantic relatedness: a vocabulary independent approach, In: Natural Language Processing and Information Systems, pp. 40–51, 2011.

[18] T. Grainger, T. Potter, Y. Seeley, Solr in Action, Cherry Hill: Manning, 2014.

[19] A. Gunawardana, G. Shani, Evaluating recommender systems, In: Recommender Systems Handbook, Springer, pp. 265–308, 2015.

[20] R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Bürgle, H. Düwiger, U. Scheel, Faceted Wikipedia search, In: International Conference on Business Information Systems, pp. 1–11, 2010.

[21] S. Harris, A. Seaborne, E. Prud'hommeaux, SPARQL 1.1 query language, In: W3C Recommendation, 2013.

[22] B. Heitmann, C. Hayes, Using Linked Data to build open, collaborative recommender systems, In: AAAI Spring Symposium: Linked Data Meets Artificial Intelligence, pp. 76–81, 2010.

[23] J. Herlocker, J. Konstan, L. Terveen, J. Riedl, Evaluating collaborative filtering recommender systems, In: ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.

[24] Y. Hu, Z. Wang, W. Wu, J. Guo, M. Zhang, Recommendation for movies and stars using YAGO and IMDB, In: 12th International Asia-Pacific Web Conference (APWEB), pp. 123–129, 2010.

[25] D. Huynh, D. Karger, Parallax and companion: Set-based browsing for the data web, In: Proceedings of the 18th ACM International Conference on World Wide Web, 2009.

[26] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich, Recommender systems: an introduction, Cambridge University Press, 2010.

[27] Y. Kabutoya, R. Sumi, T. Iwata, T. Uchiyama, T. Uchiyama, In: IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 625–630, 2012.

[28] M. Kaminskas, I. Fernandez-Tobias, F. Ricci, I. Cantador, Knowledge-based music retrieval for places of interest, In: Proceedings of the 2nd International ACM Workshop on Music Information Retrieval with User-centered and Multimodal Strategies, pp. 19–24, 2012.

[29] H. Khrouf, R. Troncy, Hybrid event recommendation using Linked Data and user diversity, In: Proceedings of the 7th ACM Conference on Recommender Systems, pp. 185–192, 2010.

[30] P. Kirk, Experimental design, John Wiley & Sons, 1982.

[31] B. Knijnenburg, M. Willemsen, Z. Gantner, H. Soncu, C. Newell, Explaining the user experience of recommender systems, In: User Modeling and User-Adapted Interaction, vol. 22, no. 4-5, pp. 441–504, 2012.

[32] G. Koutrika, B. Bercovitz, H. Garcia-Molina, FlexRecs: expressing and combining flexible recommendations, Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 2009.


[33] M. Lee, W. Kim, Semantic association search and rank method based on spreading activation for the Semantic Web, In: IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1523–1527, 2009.

[34] J. Lehmann, L. Bühmann, Autosparql: Let users query your knowledge base, In: 8th Extended Semantic Web Conference (ESWC), 2011.

[35] N. Marie, F. Gandon, M. Ribiere, F. Rodio, Discovery hub: on-the-fly Linked Data exploratory search, In: Proceedings of the 9th International Conference on Semantic Systems, pp. 17–24, 2013.

[36] S. McNee, J. Riedl, J. Konstan, Being accurate is not enough: how accuracy metrics have hurt recommender systems, In: CHI'06 Extended Abstracts on Human Factors in Computing Systems, 2006.

[37] R. Meymandpour, J. Davis, Recommendations using Linked Data, In: Proceedings of the 5th PhD Workshop on Information and Knowledge, 2012.

[38] A. Miles, S. Bechhofer, SKOS Simple Knowledge Organization System Reference, In: W3C Recommendation, 2009.

[39] B. Mobasher, X. Jin, Y. Zhou, Semantically enhanced collaborative filtering on the web, In: Web Mining: From Web to Semantic Web, pp. 57–76, 2004.

[40] A. Musetti, A. Nuzzolese, F. Draicchio, V. Presutti, E. Blomqvist, A. Gangemi, P. Ciancarini, Aemoo: Exploratory search based on knowledge patterns over the Semantic Web, In: Semantic Web Challenge, 2012.

[41] T. Neumann, G. Weikum, The RDF-3X engine for scalable management of RDF data, In: The VLDB Journal, vol. 19, no. 1, pp. 91–113, 2010.

[42] J. Pérez, M. Arenas, C. Gutierrez, Semantics and Complexity of SPARQL, In: 5th International Semantic Web Conference, pp. 30–43, 2006.

[43] S. Policarpio, S. Brunk, G. Tummarello, Implementation of a SPARQL integrated recommendation engine for Linked Data with hybrid capabilities, In: AImWD, 2012.

[44] P. Pu, L. Chen, R. Hu, A user-centric evaluation framework for recommender systems, In: Proceedings of the 5th ACM Conference on Recommender Systems, 2011.

[45] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, E. Di Sciascio, Sound and music recommendation with knowledge graphs, ACM Transactions on Intelligent Systems and Technology (TIST) 8(2), 21 (2017).

[46] E. Oren, R. Delbru, S. Decker, Extending faceted navigation for RDF data, In: 5th International Semantic Web Conference, pp. 559–572, 2006.

[47] G. Pirro, Explaining and suggesting relatedness in knowledge graphs, In: International Semantic Web Conference, pp. 622–639, 2015.

[48] G. Rizzo, R. Troncy, NERD: A framework for evaluating named entity recognition tools in the Web of data, 10th International Semantic Web Conference (ISWC'11), Demo Session, Bonn, Germany, 2011.

[49] J. Rosati, T. Di Noia, T. Lukasiewicz, R. De Leone, A. Maurino, Preference queries with Ceteris Paribus semantics for Linked Data, In: OTM Confederated International Conferences, pp. 423–442, 2015.

[50] T. Ruotsalo, K. Haav, A. Stoyanov, S. Roche, E. Fani, R. Deliai, E. Mäkelä, T. Kauppinen, E. Hyvönen, SMARTMUSEUM: A mobile recommender system for the Web of Data, In: Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 50–67, 2013.

[51] I. Ruthven, M. Lalmas, K. van Rijsbergen, Incorporating user search behavior into relevance feedback, In: Journal of the Association for Information Science and Technology, vol. 54, no. 6, pp. 529–549, 2003.

[52] B. Sarwar, G. Karypis, J. Konstan, J. Riedl, Item-based collaborative filtering recommendation algorithms, In: Proceedings of the 10th International Conference on World Wide Web, 2001.

[53] M. Schmachtenberg, C. Bizer, H. Paulheim, Adoption of the Linked Data best practices in different topical domains, In: International Semantic Web Conference (ISWC), pp. 245–260, 2014.

[54] H. Schütze, C. Manning, P. Raghavan, Introduction to information retrieval, Cambridge University Press, 2008.

[55] SparqlImplementations, URL: https://www.w3.org/wiki/SparqlImplementations, 2016.

[56] M. Stankovic, W. Breitfuss, P. Laublet, Discovering relevant topics using DBPedia, In: Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 219–222, 2011.

[57] K. Swearingen, R. Sinha, Beyond algorithms: An HCI perspective on recommender systems, In: ACM SIGIR 2001 Workshop on Recommender Systems, vol. 13, no. 5-6, 2001.

[58] D. Tunkelang, Faceted search, In: Synthesis Lectures on Information Concepts, Retrieval, and Services, pp. 1–80, 2009.

[59] C. Unger, L. Bühmann, J. Lehmann, A. Ngonga Ngomo, D. Gerber, P. Cimiano, Template-based question answering over RDF data, In: Proceedings of the 21st ACM International Conference on World Wide Web, 2012.

[60] C. Varnhagen, M. Gushta, J. Daniels, T. Peters, N. Parmar, D. Law, T. Johnson, How informed is online informed consent?, In: Ethics & Behavior, vol. 15, no. 1, pp. 37–48, 2005.

[61] J. Waitelonis, H. Sack, Towards exploratory video search using Linked Data, In: 11th IEEE International Symposium on Multimedia, pp. 540–545, 2009.

[62] L. Wenige, J. Ruhland, Flexible on-the-fly recommendations from Linked Open Data repositories, In: International Conference on Business Information Systems, pp. 43–54, 2016.

[63] L. Wenige, Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations.

[64] L. Wenige, J. Ruhland, Retrieval by recommendation: using LOD technologies to improve digital library search, In: International Journal on Digital Libraries, 2017.

[65] F. Zarrinkalam, M. Kahani, A multi-criteria hybrid citation recommendation system based on Linked Data, In: 2nd International EConference on Computer and Knowledge Engineering (ICCKE), pp. 283–288, 2012.

[66] C. Ziegler, S. McNee, J. Konstan, G. Lausen, Improving recommendation lists through topic diversification, In: Proceedings of the 14th International Conference on World Wide Web, pp. 22–32, 2005.

[67] L. Zou, R. Huang, H. Wang, J. Yu, W. He, D. Zhao, Natural language question answering over RDF: a graph data driven approach, In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 313–324, 2014.



[Figure 9: bar chart "Perceived Usefulness (Regular vs. Roll-up Queries)"; x-axis: Domain (Travel, Movie, Music, Book); y-axis: Agreement (0 to 3); legend: Method (Regular, Roll-up).]

Fig. 9. Participants' overall satisfaction with regular and roll-up recommendation retrieval (TC3).

0.001) (see Table 17). This finding is not surprising, given that the aggregation-based approach explores the wider semantic network of each item in the profile.
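The aggregation-based (roll-up) retrieval discussed here summarizes the similarity scores of sublevel entities for a higher-level resource using SUM, MAX or AVG, as described in Appendix A. The sketch below illustrates that idea in Python. The function name, the dictionary-based inputs and the example identifiers are assumptions made for illustration only; they do not reproduce the SKOSRec engine's internal data structures.

```python
from collections import defaultdict
from statistics import mean

# Aggregation modes named in Appendix A
AGGREGATORS = {"SUM": sum, "MAX": max, "AVG": mean}


def roll_up(scores, parent_of, how="SUM"):
    """Aggregate similarity scores of sublevel items per parent resource.

    scores:    hypothetical mapping item -> similarity score
    parent_of: hypothetical mapping item -> higher-level resource
    Returns (parent, aggregated score) pairs, best first.
    """
    if how not in AGGREGATORS:
        raise ValueError(f"unknown aggregation: {how}")
    grouped = defaultdict(list)
    for item, score in scores.items():
        parent = parent_of.get(item)
        if parent is not None:  # items without a parent are skipped
            grouped[parent].append(score)
    aggregate = AGGREGATORS[how]
    ranked = {parent: aggregate(values) for parent, values in grouped.items()}
    return sorted(ranked.items(), key=lambda pair: pair[1], reverse=True)


# Illustrative data: three similar movies rolled up to two directors
scores = {"movie1": 0.5, "movie2": 0.25, "movie3": 0.5}
parent_of = {"movie1": "directorA", "movie2": "directorA", "movie3": "directorB"}
print(roll_up(scores, parent_of, "SUM"))  # [('directorA', 0.75), ('directorB', 0.5)]
```

With SUM, resources with many moderately similar sublevel items rank highest, whereas AVG favors resources whose sublevel items are consistently similar; in the example above, AVG would rank directorB first.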

Hence, it can be assumed that both approaches produce recommendation lists of similar quality and can provide benefits to potential consumers. Thus, it cannot be clearly stated which approach is superior, because neither of them outperformed the other in every domain. While the regular retrieval strategy was better in terms of usefulness in the travel and the music domains, the aggregation-based approach achieved higher scores in the domains of movie and book recommendations.

This conclusion is especially notable given the low mean Jaccard scores for recommendation lists resulting from regular and aggregation-based retrieval (see Table 18). The mean score never exceeded 0.1 in any domain, which indicates a low concordance between result sets. It is remarkable that participants perceived the recommendations as equally good given the high dissimilarity of the lists. Thus, advanced queries have an added value, as they provide additional interesting results. They can help to explore LOD repositories in different ways than a regular retrieval approach. Therefore, it is worthwhile to offer aggregation-based query patterns as an alternative retrieval strategy when the regular approach has not produced any helpful recommendations.
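The Jaccard index used to compare the two recommendation lists is the size of their intersection divided by the size of their union. A minimal Python sketch follows; the function name and the convention of returning 0.0 for two empty lists are assumptions, since the text does not state how empty result sets were handled.

```python
def jaccard(list_a, list_b):
    """Jaccard index of two recommendation lists, treated as sets."""
    a, b = set(list_a), set(list_b)
    union = a | b
    if not union:
        return 0.0  # assumed convention for two empty lists
    return len(a & b) / len(union)


# Two lists of three items that share exactly one item
regular = ["item1", "item2", "item3"]
roll_up_list = ["item3", "item4", "item5"]
print(jaccard(regular, roll_up_list))  # 0.2
```

A mean index below 0.1, as reported for TC3, therefore means that on average fewer than one in ten of the distinct items across both lists appeared in both.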

In addition to TC3, user assessments for TC4 (cross-domain recommendations) were also evaluated. However, since the authors conducted experiments for TC4 only in the multimedia domains, the samples were considerably smaller. Nevertheless, it is assumed that the number of completed user sessions is just sufficient to conduct further statistical analyses. Table 19 shows the response rates for the two retrieval methods. Cross-domain patterns had a higher success rate of generating non-empty result sets throughout the domains. Participants almost always received recommendations for this query type, whereas a regular SPARQL query often did not return a single suggestion.

Table 17. Performance results in TC3 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]   Domain   Regular          Aggr.-based      N
                                          M       SD       M       SD
Accuracy (size) [10]             Travel   9.90    0.99     9.42    2.07     103
                                 Movie    2.88    0.59     2.76    0.82     50
                                 Music    10.00   0.00     10.00   0.00     53
                                 Book     2.88    0.59     3.00    0.00     51
Accuracy (mrs) [1]               Travel   48.21   24.98    44.93   25.59    102
                                 Movie    54.58   22.88    56.69   25.42    44
                                 Music    53.95   18.31    58.24   16.19    52
                                 Book     51.53   23.90    49.25   22.30    49
Novelty (nv) [1]                 Travel   0.43    0.39     0.38    0.40     98
                                 Movie    0.35    0.38     0.28    0.41     41
                                 Music    0.49    0.36     0.46    0.39     50
                                 Book     0.61    0.42     0.62    0.44     47
Diversity (divU) [5]             Travel   3.5     -        3       -        100
                                 Movie    4       -        3       -        43
                                 Music    4       -        3       -        50
                                 Book     3       -        3       -        48
Diversity (divC) [1]             Travel   0.79    0.17     0.89    0.13     99
                                 Movie    0.75    0.20     0.66    0.27     44
                                 Music    0.83    0.17     0.93    0.07     53
                                 Book     0.74    0.27     0.93    0.06     48
Usefulness [5]                   Travel   3.36    0.77     3.22    0.74     101
                                 Movie    3.20    0.92     3.26    0.94     44
                                 Music    3.45    0.84     3.44    0.85     51
                                 Book     2.95    0.87     3.03    0.96     48

Table 18. Jaccard Indices (TC3)

Domain   Mean Jaccard Index (JI)   SD     N
Travel   0.07                      0.11   101
Movie    0.03                      0.07   44
Music    0.02                      0.05   53
Book     0.02                      0.06   50

Table 19. Response rates (TC4)

Domain   Regular (SPARQL)   SKOSRec   N
Movie    32                 96        50
Music    62                 98        53
Book     53                 100       51

We applied non-parametric tests for the performance metrics mrs, nv, divU, divC and Perceived Usefulness due to the low response rates of the SPARQL-based approach, which led to a correspondingly low number of comparable data points. Overall, the FDR-adjusted pairwise Wilcoxon tests detected no significant differences except for user-based Diversity scores (divU) in the movie (Z = −2.69, p < 0.05) and the book domain (Z = −2.56, p < 0.05). The results for the content-based Diversity (divC) metric seem to point in the same direction because of the high mean values for cross-domain queries in each domain, but the significance of these results was not verified. The mrs scores indicate a better performance of regular SPARQL queries, but the within-subjects test did not confirm this assumption statistically (see Table 20).
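The pairwise tests above are reported as FDR-adjusted. The text does not name the adjustment procedure; the sketch below assumes the widely used Benjamini-Hochberg method and shows how raw p-values from several pairwise comparisons could be adjusted. The function name and the example p-values are illustrative and are not taken from the experiments.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: the paper's "FDR adjustment" is the BH step-up procedure.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Illustrative raw p-values for four pairwise comparisons
raw = [0.01, 0.04, 0.03, 0.005]
print([round(p, 3) for p in benjamini_hochberg(raw)])  # [0.02, 0.04, 0.04, 0.02]
```

A comparison is then called significant at the 0.05 level if its adjusted p-value is below 0.05.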

However, the most important finding of the analysis was that SKOSRec cross-domain queries generated significantly more suggestions throughout the domains (movie: t(49) = −17.92, p < 0.001; music: t(52) = −25.73, p < 0.001; book: t(50) = −12.75, p < 0.001) (see Table 20). Fig. 10 illustrates this graphically.
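The reported degrees of freedom (N − 1 for each domain) are consistent with a paired-samples t-test on the list sizes that each participant received from both methods. Under that assumption, the test statistic can be sketched in Python as follows; the function name and the example data are illustrative and not taken from the experiments.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x, y):
    """Paired-samples t statistic and degrees of freedom for x - y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    # Standard error of the mean difference (raises ZeroDivisionError
    # if all differences are identical)
    standard_error = stdev(diffs) / sqrt(n)
    return mean(diffs) / standard_error, n - 1


# Illustrative list sizes per participant (regular vs. cross-domain)
regular_sizes = [1, 2, 3, 4]
cross_domain_sizes = [2, 4, 5, 8]
t_value, df = paired_t(regular_sizes, cross_domain_sizes)
print(round(t_value, 2), df)  # -3.58 3
```

A negative t value, as in the results above, indicates that the first method (regular SPARQL) produced shorter lists than the second (cross-domain).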

Whenever the SPARQL request provided recommendations, which was the case in less than half of the user sessions (see Table 19), the SKOSRec suggestions were assessed to be of almost equal quality as the SPARQL recommendations (see Table 20, Usefulness). Hence, in the remaining test cases, in which SPARQL querying did not produce any results, the SKOSRec approach may still have provided useful suggestions.

Therefore, it is reasonable to assume that the capability of the SKOSRec engine to flexibly switch between the processing steps of similarity calculation and pattern matching can facilitate an improved exploration of LOD repositories. Table 21 also shows that the items in the two sets often did not match, as is indicated by low Jaccard indices. This finding further demonstrates that the execution of an additional step of similarity calculation can help to retrieve other relevant items.

Table 20. Performance results in TC4 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]   Domain   Regular (SPARQL)    Cross-Domain        N
                                          M        SD         M        SD
Accuracy (size) [10]             Movie    0.82     2.09       8.66     2.73       50
                                 Music    0.89     1.59       9.30     2.03       53
                                 Book     2.90     3.98       10.00    0.00       51
Accuracy (mrs) [1]               Movie    67.41    27.62      53.65    14.71      16
                                 Music    62.36    31.76      48.32    22.49      20
                                 Book     66.42    21.28      55.59    16.84      27
Novelty (nv) [1]                 Movie    0.59     0.49       0.41     0.34       16
                                 Music    0.72     0.44       0.59     0.40       19
                                 Book     0.54     0.41       0.63     0.30       27
Diversity (divU) [5]             Movie    3        -          4        -          16
                                 Music    3        -          3        -          20
                                 Book     3        -          4        -          27
Diversity (divC) [1]             Movie    (0.64)   (0.35)     (0.99)   (0.02)     (6)
                                 Music    (0.90)   (0.13)     (0.98)   (0.03)     (10)
                                 Book     (0.84)   (0.12)     (0.87)   (0.1)      (8)
Usefulness [5]                   Movie    3.57     0.83       3.35     1.57       15
                                 Music    3.53     1.17       3.11     0.98       20
                                 Book     3.41     1.04       3.41     0.94       27

[Figure 10: bar chart "Recommendation List Size"; x-axis: Domain (Movie, Music, Book); y-axis: No. of items (0.0 to 10.0); legend: Method (Regular, Cross-Domain).]

Fig. 10. Mean recommendation list sizes for SPARQL and SKOSRec cross-domain requests (TC4).

Table 21. Jaccard Indices (TC4)

Domain   Mean Jaccard Index (JI)   SD     N
Movie    0.03                      0.07   44
Music    0.01                      0.02   21
Book     0.16                      0.27   33

5. Future Work & Conclusion

This paper has presented the query-based SKOSRec engine that facilitates new types of recommendation requests for LOD repositories. The evaluations have shown that the SKOSRec system can often generate relevant and useful suggestions when certain query templates are utilized. In many cases, the application of expressive or advanced query patterns (i.e., in the context of constraint-based retrieval or cross-domain requests) helped to significantly improve recall values while still providing the same level of quality in the remaining performance dimensions. Additionally, advanced recommendation requests can be used to generate diversified recommendation lists in case users are not satisfied with the results of regular recommendation requests (i.e., roll-up or cross-domain retrieval vs. regular requests). While the novel recommendation approaches of the SKOSRec engine are not superior in each domain and test case, it is at least safe to say that they considerably extend common retrieval methods and provide an alternative search strategy when conventional methods fail to provide useful results. Additionally, the increase in diversity and recall is in line with findings from previous research on LOD-enabled RS [45].

It is a strength of the developed approaches that they facilitate combinations of graph-based and similarity-based retrieval at different stages of the recommendation workflow. This feature extends general search capabilities for semantic networks. Future research needs to investigate whether, in addition to SKOS, further RDF vocabularies can be used for ad-hoc item-to-item similarity computation to increase the range of potential usage scenarios, since not every LOD dataset contains SKOS annotations [53]. The application of additional properties could also boost the recommendation quality of some query patterns (e.g., the query types used in TC3). In this context, it also needs to be determined whether the representation of the user profile in the SKOSRec query language can be substituted with free-text expressions to enable search-like functionalities. LOD researchers have already developed engines that can process natural language queries over RDF data [59], [34], [67]. It will have to be investigated how the SKOSRec engine profitably fits into this landscape of existing retrieval tools.

Another open research question concerns interfaces to other applications. In its standard configuration, the system is not an end-user retrieval engine but a backend application with which an administrator who has domain knowledge (e.g., of RDF vocabularies and data models of a particular LOD repository and usage scenario) creates the respective query patterns. Although the websites of the online experiments represent possible implementations of suitable end-user applications for the SKOSRec engine, there are still many other possibilities and extensions to design such a user interface (UI). For instance, the selection of LOD items for the profile was made possible by previously storing the data in an index that could then be accessed and searched before issuing a recommendation query. However, this runs against the idea of ad-hoc retrieval. In this context, future research will have to determine whether it is feasible to generate an on-the-fly mapping from items to the respective LOD resources. In the case of a commercial application, the engine needs to match items with the corresponding entries in the product catalog. Even if it is not yet clear which UI components can make the best possible use of the SKOSRec engine's features, the results from the user experiments can give at least a few suggestions.

The following retrieval options should be the default setting in an interface that contains a LOD-enabled RS component to assist users in finding interesting items. Since the regular approach of on-the-fly retrieval achieved good results throughout the domains, the display of simple recommendations generated from previously stated user preferences represents a suitable starting point for retrieval. On-the-fly queries could be refined by providing filter options. Since the evaluations have shown that expressive constraints often lead to increased recall values, the application of such a filter might be the best option for assisted retrieval. As in the preparations for the experiments, appropriate graph-based query patterns for the usage scenario in question would also have to be determined by a domain expert before setting up the application. The same applies to advanced retrieval patterns, which should be available to the end user to refine the query when both the simple and the constraint-based approach have failed to provide relevant items. This suggestion is made because roll-up queries were competitive with regular recommendations in the web-based experiments. Therefore, they represent a viable alternative retrieval strategy. For some domains (e.g., for multimedia retrieval), it is also possible to use an optional selection field to enable the formulation of cross-domain requests. Evaluations for this query type suggest that users might receive interesting results (i.e., diversified and comprehensive recommendation lists) from such a request.

In summary, the contributions of the paper are as follows:

– Development and implementation of novel retrieval strategies in the SKOSRec system, which facilitate personalized requests and leverage the potential of knowledge graphs in order to address the research gap in state-of-the-art recommendation methods with regard to customized queries.

– Realization of web-based user experiments in four usage scenarios with the following outcomes:

  • Successful demonstration that knowledge graph data is applicable in the context of live recommendation scenarios.

  • Creation of suitable query patterns that can be applied to represent individual information needs in different application contexts. The set of query patterns can be extended with the help of the SKOSRec query language to meet the requirements of additional usage scenarios.

  • Proof that the novel retrieval strategies can be successfully applied across various application contexts and recommendation tasks, and that they outperform conventional content-based retrieval approaches, especially in the performance dimensions of diversity and recall.

The prototypical implementation and the evaluation of the SKOSRec engine in different usage scenarios have demonstrated that the freely available knowledge sources on the LOD cloud can be successfully applied to advance state-of-the-art methods in IR and recommender systems. The proposed personalized and customized retrieval strategies can leverage the potential of knowledge graphs by better matching the information needs of users with the help of semantic search options.

Appendix A The SKOSRecommender QuerySyntax

SKOSRecQuery: A SKOSRec query can comprise up to six elements, of which the parts SimProjection and ItemPart are obligatory. Hence, users need to state at least their preferences and how many recommendations they would like to receive. Advanced retrieval patterns can be formulated as well. For instance, similar resource retrieval can be combined with prefiltering (RecWhereClause) and/or postfiltering (SelectPart) of LOD resources. As in SPARQL, the query can start with a prologue that abbreviates IRIs with namespace declarations [21].

SelectPart: The SelectPart enables subquerying with recommendation results. The surrounding query is a SELECT query with a slightly simplified WHERE clause (i.e., RecWhereClause) with which postfilter conditions can be formulated. In this section, users have the option to formulate aggregation-based retrieval requests (Aggregation).

Aggregation: This query part handles aggregation-based postfiltering. Users specify the IRI (IRIref) of the resource for which similarity scores of sublevel entities are summarized. Additionally, it is stated how the data should be aggregated (i.e., SUM, MAX or AVG) as well as which variable (Var) represents the aggregation-based recommendations in the postfilter section. The compiler makes sure that this variable is actually contained in the RecWhereClause of the SelectPart.
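The aggregation step described for the Aggregation part can be illustrated with a minimal Python sketch. It is not the SKOSRec implementation: the function name `aggregate_scores`, the dictionary-based inputs and the `parent_of` mapping are assumptions made purely for illustration. Only the three aggregation keywords (SUM, MAX, AVG) are taken from the text.

```python
from statistics import mean

# Aggregation keywords named in the text, mapped to Python functions.
AGGREGATORS = {"SUM": sum, "MAX": max, "AVG": mean}


def aggregate_scores(sublevel_scores, parent_of, how="SUM"):
    """Roll up similarity scores of sublevel entities to their parent resource.

    sublevel_scores: dict mapping a sublevel entity to its similarity score
    parent_of:       dict mapping a sublevel entity to its parent resource
                     (hypothetical stand-in for the graph pattern match)
    how:             "SUM", "MAX" or "AVG"
    """
    grouped = {}
    for entity, score in sublevel_scores.items():
        parent = parent_of.get(entity)
        if parent is None:
            continue  # entity has no parent in the postfilter result
        grouped.setdefault(parent, []).append(score)

    aggregate = AGGREGATORS[how]
    rolled_up = {parent: aggregate(scores) for parent, scores in grouped.items()}
    # Highest aggregated score first, as in a ranked recommendation list.
    return sorted(rolled_up.items(), key=lambda item: item[1], reverse=True)
```

With this sketch, the choice of keyword changes the ranking: summing favors parents with many similar sublevel entities, whereas MAX favors parents with a single highly similar one.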

SimProjection: This part of a SKOSRec query refers to the list of generated suggestions. It defines the recommendation variable to which all similar LOD resources are mapped. In case pre- and postfilter conditions are set in the other sections, the compiler checks whether the variable (Var) appears in these sections as well. Otherwise, the required join operations cannot be executed. It is specified how many recommendations should be displayed (Integer) and whether the request will be issued against a default repository or as a cross-repository request (ServiceIntegration).

ServiceIntegration: This section indicates that a user intends to receive recommendations from a different SPARQL endpoint than the default endpoint that is stated in the standard configuration of the SKOSRec engine. It specifies the target SPARQL endpoint (IRIref) from which a user wants to receive recommendations. Upon extraction of SKOS annotations from the default source endpoint, a subsequent request is sent to the specified target endpoint.

ItemPart: This section of a SKOSRec query represents a single preference in the user profile. However, recommendation queries can contain several preference statements. A preference is either expressed as a LOD resource (IRIref) or as a variable which is bound to a preference query (VarPart). In the latter case, the resources are obtained by matching the graph pattern in the VarPart with the triple statements in the repository. The ItemPart statement can additionally be supplemented with a concept-to-concept similarity threshold (Sim) that triggers concept expansion with the approach of flexible similarity detection, which is extensively described in one of our previous works [64].

VarPart: The construct initiates preference querying. Individual likings are expressed as a graph pattern in a RecWhereClause. The variable (Var) is a placeholder for the LOD resources that will be used to set up the user profile. The compiler has to make sure that the specified variable occurs in the RecWhereClause as well. Otherwise, the extraction of LOD resources cannot be carried out correctly.

Sim: The Sim part of a SKOSRec query denotes the threshold of concept-to-concept scores for flexible similarity detection. Even though knowledge-based similarity values usually range between 0 and 1, the specification allows a broader range of digits in the Decimal datatype to enable the application of other similarity metrics in case they are needed.

Relation: This section specifies how concept-to-concept similarity scores should be determined. Users can state whether scores should be “larger”, “larger than” or “equal to” the threshold value.

RecWhereClause: This clause can occur in three different sections of a SKOSRec query, i.e., either in the prefilter (ItemPart), the postfilter (SelectPart) or the preference filter section (VarPart). The RecWhereClause closely resembles the “WhereClause” of the SPARQL 1.1 specification with all its subsequent parts [21]. Hence, it enables different combinations of graph pattern matching. However, the RecWhereClause of the SKOSRec language allows fewer pattern matching expressions than the “WhereClause” of the regular SPARQL syntax specification [21]. For instance, it is neither permitted to formulate subqueries nor to direct filter requests to more than one repository (see RecGraphPatternNotTriples for further explanations). This modification has been made to simplify similarity calculation. After parsing this section, the compiler makes sure that recommendation, preference or postfilter variables occur at least once in the RecWhereClause to guarantee that subsequent steps of the recommendation workflow can be applied to existing LOD resources. In case the RecWhereClause comprises a RecMinusGraphPattern, it also has to be checked that the variable is not only contained in the MINUS part of the group.
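The variable check described for the RecWhereClause can be sketched as follows. This is a simplified illustration under stated assumptions, not the compiler of the SKOSRec engine: the function name `validate_variable` and the representation of a parsed clause as a list of `(kind, variables)` pairs are hypothetical.

```python
def validate_variable(var, patterns):
    """Raise ValueError unless `var` is bound outside of a MINUS pattern.

    patterns: hypothetical, simplified view of a parsed clause, given as a
              list of (kind, variables) pairs, e.g. ("TRIPLES", {"?rec"}).
    """
    in_positive = any(
        var in variables for kind, variables in patterns if kind != "MINUS"
    )
    in_minus = any(
        var in variables for kind, variables in patterns if kind == "MINUS"
    )
    if in_positive:
        return  # the variable can be joined with the recommendation results
    if in_minus:
        raise ValueError(f"{var} occurs only inside a MINUS pattern")
    raise ValueError(f"{var} does not occur in the clause")
```

A variable that appears solely in a MINUS group only removes solutions and never binds a resource, which is why the sketch rejects that case separately from the case of a missing variable.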

RecGroupGraphPattern: A RecGroupGraphPattern can contain one to many basic graph patterns, which can be combined in different ways according to RecGroupGraphPatternSub.

RecGroupGraphPatternSub: This section defines the graph patterns that are applied to identify suitable LOD resources. User filters can be expressed either as basic graph patterns (TriplesBlock) or as combinations of them (RecGraphPatternNotTriples). The sequence of different kinds of patterns can be flexibly specified.

RecGraphPatternNotTriples: This query part closely resembles the similarly named “GraphPatternNotTriples” of the SPARQL 1.1 specification [21]. The only difference to SPARQL is that the RecGraphPatternNotTriples section is a little more restricted in terms of admissible graph patterns than the “GraphPatternNotTriples” section of the regular SPARQL specification. For instance, it does not allow the retrieval of LOD resources from graphs other than the default graph that is specified in the configuration of the SKOSRec engine, since filter patterns have to be applied to datasets that contain SKOS annotations. Hence, these datasets need to be specified before runtime execution to ensure that similarity calculation can be carried out correctly. If a user were able to formulate filter conditions that referred to different RDF graphs and endpoints, this would not be possible.

RecGroupOrUnionGraphPattern: This query section marks an alternative graph pattern in the style of the “GroupOrUnionGraphPattern” of the SPARQL syntax specification [21]. However, it allows neither SPARQL-like subqueries nor federated queries for filtered resources, because this would complicate similarity calculation.

RecOptionalGraphPattern: In this part of a SKOSRecQuery, users can specify additional graph patterns which might extend the query solution but do not necessarily have to match the data.

RecMinusGraphPattern: This section handles the exclusion of LOD resources for cases in which users want to omit certain triple statements from the solution.

References

[1] G. Adomavicius, A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. In: IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.

[2] G. Adomavicius, A. Tuzhilin, R. Zheng. REQUEST: A query language for customizing recommendations. In: Information Systems Research, vol. 22, no. 1, pp. 99–117, 2011.

[3] C. Aggarwal. Recommender systems. Springer, 2016.

[4] M. Arenas, B. Cuenca Grau, E. Kharlamov, S. Marciuska, D. Zheleznyakov. Faceted search over ontology-enhanced RDF data. In: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 939–948, 2014.

[5] V. Ayala, M. Przyjaciel-Zablocki, T. Hornung, A. Schätzle, G. Lausen. Extending SPARQL for recommendations. In: Semantic Web Information Management, 2004.

[6] J. Beel, C. Breitinger, S. Langer, A. Lommatzsch, B. Gipp. Towards reproducibility in recommender systems research. In: User Modeling and User-adapted Interaction, vol. 26, no. 1, pp. 69–101, 2016.

[7] C. Boutilier, R. Brafman, C. Domshlak, H. Hoos, D. Poole. CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements. In: Journal of Artificial Intelligence Research (JAIR), vol. 21, pp. 135–191, 2004.

[8] P. Castells, S. Vargas, J. Wang. Novelty and diversity metrics for recommender systems: choice, discovery and relevance. 2011.

[9] G. Cheng, W. Ge, Y. Qu. Falcons: searching and browsing entities on the Semantic Web. In: Proceedings of the 17th ACM International Conference on World Wide Web, pp. 1101–1102, 2008.

[10] J. Davies, R. Weeks. QuizRDF: Search technology for the Semantic Web. In: Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004.

[11] DBpedia. URL: https://wiki.dbpedia.org, 2017.

[12] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito. Exploiting the web of data in model-based recommender systems. In: Proceedings of the 6th ACM Conference on Recommender Systems (RecSys), pp. 253–256, 2012.

[13] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, M. Zanker. Linked Open Data to support content-based recommender systems. In: Proceedings of the 8th International Conference on Semantic Systems, 2012.

[14] D. Dillman, D. Robert, D. Bowker. Principles for constructing web surveys. In: Joint Meetings of the American Statistical Association, 1998.

[15] DCMI metadata terms. URL: http://dublincore.org/documents/dcmi-terms, 2004.

[16] I. Fernandez-Tobias, I. Cantador, M. Kaminskas, F. Ricci. A generic semantic-based framework for cross-domain recommendation. In: Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pp. 25–32, 2011.

[17] A. Freitas, J. Oliveira, S. O'Riain, E. Curry, J. Da Silva, C. Pereira. Querying Linked Data using semantic relatedness: a vocabulary independent approach. In: Natural Language Processing and Information Systems, pp. 40–51, 2011.

[18] T. Grainger, T. Potter, Y. Seeley. Solr in Action. Cherry Hill: Manning, 2014.

[19] A. Gunawardana, G. Shani. Evaluating recommender systems. In: Recommender Systems Handbook, Springer, pp. 265–308, 2015.

[20] R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Bürgle, H. Düwiger, U. Scheel. Faceted Wikipedia search. In: International Conference on Business Information Systems, pp. 1–11, 2010.

[21] S. Harris, A. Seaborne, E. Prud'hommeaux. SPARQL 1.1 query language. In: W3C recommendation, 2013.

[22] B. Heitmann, C. Hayes. Using Linked Data to build open, collaborative recommender systems. In: AAAI Spring Symposium: Linked Data Meets Artificial Intelligence, pp. 76–81, 2010.

[23] J. Herlocker, J. Konstan, L. Terveen, J. Riedl. Evaluating collaborative filtering recommender systems. In: ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.

[24] Y. Hu, Z. Wang, W. Wu, J. Guo, M. Zhang. Recommendation for movies and stars using YAGO and IMDB. In: 12th International Asia-Pacific Web Conference (APWEB), pp. 123–129, 2010.

[25] D. Huynh, D. Karger. Parallax and companion: Set-based browsing for the data web. In: Proceedings of the 18th ACM International Conference on World Wide Web, 2009.

[26] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich. Recommender systems: an introduction. Cambridge University Press, 2010.

[27] Y. Kabutoya, R. Sumi, T. Iwata, T. Uchiyama, T. Uchiyama. In: IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 625–630, 2012.

[28] M. Kaminskas, I. Fernandez-Tobias, F. Ricci, I. Cantador. Knowledge-based music retrieval for places of interest. In: Proceedings of the 2nd International ACM Workshop on Music Information Retrieval with User-centered and Multimodal Strategies, pp. 19–24, 2012.

[29] H. Khrouf, R. Troncy. Hybrid event recommendation using Linked Data and user diversity. In: Proceedings of the 7th ACM Conference on Recommender Systems, pp. 185–192, 2010.

[30] P. Kirk. Experimental design. John Wiley & Sons, 1982.

[31] B. Knijnenburg, M. Willemsen, Z. Gantner, H. Soncu, C. Newell. Explaining the user experience of recommender systems. In: User Modeling and User-Adapted Interaction, vol. 22, no. 4-5, pp. 441–504, 2012.

[32] G. Koutrika, B. Bercovitz, H. Garcia-Molina. FlexRecs: expressing and combining flexible recommendations. In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 2009.


[33] M. Lee, W. Kim. Semantic association search and rank method based on spreading activation for the Semantic Web. In: IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1523–1527, 2009.

[34] J. Lehmann, L. Bühmann. Autosparql: Let users query your knowledge base. In: 8th Extended Semantic Web Conference (ESWC), 2011.

[35] N. Marie, F. Gandon, M. Ribiere, F. Rodio. Discovery hub: on-the-fly Linked Data exploratory search. In: Proceedings of the 9th International Conference on Semantic Systems, pp. 17–24, 2013.

[36] S. McNee, J. Riedl, J. Konstan. Being accurate is not enough: how accuracy metrics have hurt recommender systems. In: CHI'06 Extended Abstracts on Human Factors in Computing Systems, 2006.

[37] R. Meymandpour, J. Davis. Recommendations using Linked Data. In: Proceedings of the 5th PhD Workshop on Information and Knowledge, 2012.

[38] A. Miles, S. Bechhofer. SKOS Simple Knowledge Organization System Reference. In: W3C Recommendation, 2009.

[39] B. Mobasher, X. Jin, Y. Zhou. Semantically enhanced collaborative filtering on the web. In: Web Mining: From Web to Semantic Web, pp. 57–76, 2004.

[40] A. Musetti, A. Nuzzolese, F. Draicchio, V. Presutti, E. Blomqvist, A. Gangemi, P. Ciancarini. Aemoo: Exploratory search based on knowledge patterns over the Semantic Web. In: Semantic Web Challenge, 2012.

[41] T. Neumann, G. Weikum. The RDF-3X engine for scalable management of RDF data. In: The VLDB Journal, vol. 19, no. 1, pp. 91–113, 2010.

[42] J. Pérez, M. Arenas, C. Gutierrez. Semantics and Complexity of SPARQL. In: 5th International Semantic Web Conference, pp. 30–43, 2006.

[43] S. Policarpio, S. Brunk, G. Tummarello. Implementation of a SPARQL integrated recommendation engine for Linked Data with hybrid capabilities. In: AImWD, 2012.

[44] P. Pu, L. Chen, R. Hu. A user-centric evaluation framework for recommender systems. In: Proceedings of the 5th ACM Conference on Recommender Systems, 2011.

[45] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, E. Di Sciascio. Sound and music recommendation with knowledge graphs. In: ACM Transactions on Intelligent Systems and Technology (TIST), vol. 8, no. 2, 21, 2017.

[46] E. Oren, R. Delbru, S. Decker. Extending faceted navigation for RDF data. In: 5th International Semantic Web Conference, pp. 559–572, 2006.

[47] G. Pirro. Explaining and suggesting relatedness in knowledge graphs. In: International Semantic Web Conference, pp. 622–639, 2015.

[48] G. Rizzo, R. Troncy. NERD: A framework for evaluating named entity recognition tools in the Web of data. In: 10th International Semantic Web Conference (ISWC'11), Demo Session, Bonn, Germany, 2011.

[49] J. Rosati, T. Di Noia, T. Lukasiewicz, R. De Leone, A. Maurino. Preference queries with Ceteris Paribus semantics for Linked Data. In: OTM Confederated International Conferences, pp. 423–442, 2015.

[50] T. Ruotsalo, K. Haav, A. Stoyanov, S. Roche, E. Fani, R. Deliai, E. Mäkelä, T. Kauppinen, E. Hyvönen. SMARTMUSEUM: A mobile recommender system for the Web of Data. In: Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 50–67, 2013.

[51] I. Ruthven, M. Lalmas, K. van Rijsbergen. Incorporating user search behavior into relevance feedback. In: Journal of the Association for Information Science and Technology, vol. 54, no. 6, pp. 529–549, 2003.

[52] B. Sarwar, G. Karypis, J. Konstan, J. Riedl. Item-based collaborative filtering recommendation algorithms. In: Proceedings of the 10th International Conference on World Wide Web, 2001.

[53] M. Schmachtenberg, C. Bizer, H. Paulheim. Adoption of the Linked Data best practices in different topical domains. In: International Semantic Web Conference (ISWC), pp. 245–260, 2014.

[54] H. Schütze, C. Manning, P. Raghavan. Introduction to information retrieval. Cambridge University Press, 2008.

[55] SparqlImplementations. URL: https://www.w3.org/wiki/SparqlImplementations, 2016.

[56] M. Stankovic, W. Breitfuss, P. Laublet. Discovering relevant topics using DBPedia. In: Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 219–222, 2011.

[57] K. Swearingen, R. Sinha. Beyond algorithms: An HCI perspective on recommender systems. In: ACM SIGIR 2001 Workshop on Recommender Systems, vol. 13, no. 5-6, 2001.

[58] D. Tunkelang. Faceted search. In: Synthesis Lectures on Information Concepts, Retrieval, and Services, pp. 1–80, 2009.

[59] C. Unger, L. Bühmann, J. Lehmann, A. Ngonga Ngomo, D. Gerber, P. Cimiano. Template-based question answering over RDF data. In: Proceedings of the 21st ACM International Conference on World Wide Web, 2012.

[60] C. Varnhagen, M. Gushta, J. Daniels, T. Peters, N. Parmar, D. Law, T. Johnson. How informed is online informed consent? In: Ethics & Behavior, vol. 15, no. 1, pp. 37–48, 2005.

[61] J. Waitelonis, H. Sack. Towards exploratory video search using Linked Data. In: 11th IEEE International Symposium on Multimedia, pp. 540–545, 2009.

[62] L. Wenige, J. Ruhland. Flexible on-the-fly recommendations from Linked Open Data repositories. In: International Conference on Business Information Systems, pp. 43–54, 2016.

[63] L. Wenige. Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations.

[64] L. Wenige, J. Ruhland. Retrieval by recommendation: using LOD technologies to improve digital library search. In: International Journal on Digital Libraries, 2017.

[65] F. Zarrinkalam, M. Kahani. A multi-criteria hybrid citation recommendation system based on Linked Data. In: 2nd International EConference on Computer and Knowledge Engineering (ICCKE), pp. 283–288, 2012.

[66] C. Ziegler, S. McNee, J. Konstan, G. Lausen. Improving recommendation lists through topic diversification. In: Proceedings of the 14th International Conference on World Wide Web, pp. 22–32, 2005.

[67] L. Zou, R. Huang, H. Wang, J. Yu, W. He, D. Zhao. Natural language question answering over RDF: a graph data driven approach. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 313–324, 2014.


Table 17. Performance results in TC3 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]   Domain   Regular M   Regular SD   Aggr.-based M   Aggr.-based SD   N
Accuracy (size) [10]             Travel    9.90        0.99         9.42            2.07            103
                                 Movie     2.88        0.59         2.76            0.82             50
                                 Music    10.00        0.00        10.00            0.00             53
                                 Book      2.88        0.59         3.00            0.00             51
Accuracy (mrs) [1]               Travel   48.21       24.98        44.93           25.59            102
                                 Movie    54.58       22.88        56.69           25.42             44
                                 Music    53.95       18.31        58.24           16.19             52
                                 Book     51.53       23.90        49.25           22.30             49
Novelty (nv) [1]                 Travel    0.43        0.39         0.38            0.40             98
                                 Movie     0.35        0.38         0.28            0.41             41
                                 Music     0.49        0.36         0.46            0.39             50
                                 Book      0.61        0.42         0.62            0.44             47
Diversity (divU) [5]             Travel    3.5         -            3               -               100
                                 Movie     4           -            3               -                43
                                 Music     4           -            3               -                50
                                 Book      3           -            3               -                48
Diversity (divC) [1]             Travel    0.79        0.17         0.89            0.13             99
                                 Movie     0.75        0.20         0.66            0.27             44
                                 Music     0.83        0.17         0.93            0.07             53
                                 Book      0.74        0.27         0.93            0.06             48
Usefulness [5]                   Travel    3.36        0.77         3.22            0.74            101
                                 Movie     3.20        0.92         3.26            0.94             44
                                 Music     3.45        0.84         3.44            0.85             51
                                 Book      2.95        0.87         3.03            0.96             48

Table 18. Jaccard Indices (TC3)

Domain   Mean Jaccard Index (JI)   SD     N
Travel   0.07                      0.11   101
Movie    0.03                      0.07    44
Music    0.02                      0.05    53
Book     0.02                      0.06    50
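The Jaccard index reported in the table above measures the overlap between two recommendation lists as the size of their intersection divided by the size of their union. The following Python sketch illustrates the computation. The function name `jaccard_index` and the convention of returning 0.0 for two empty lists are assumptions for illustration and are not taken from the paper.

```python
def jaccard_index(list_a, list_b):
    """Overlap of two recommendation lists: |A ∩ B| / |A ∪ B|."""
    set_a, set_b = set(list_a), set(list_b)
    union = set_a | set_b
    if not union:
        return 0.0  # assumption: two empty lists are treated as non-overlapping
    return len(set_a & set_b) / len(union)
```

A value close to 0, as in the table, indicates that the two retrieval methods suggested almost entirely different items.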

Table 19. Response rates in % (TC4)

Domain   Regular (SPARQL)   SKOSRec   N
Movie    32                  96       50
Music    62                  98       53
Book     53                 100       51

SPARQL queries, but the within-subjects test did not confirm this assumption statistically (see Table 20).

However, the most important finding of the analysis was that SKOSRec cross-domain queries generated significantly more suggestions throughout the domains (movie: t(49) = −17.92, p < 0.001; music: t(52) = −25.73, p < 0.001; book: t(50) = −12.75, p < 0.001) (see Table 20). Fig. 10 illustrates this graphically.
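The reported degrees of freedom (N − 1 per domain) are consistent with a paired-samples t-test on the list sizes each participant received from the two methods. Assuming that this is the within-subjects test that was used, the statistic can be sketched in Python as follows. The function name `paired_t` and the input format are hypothetical, and the computation of the p-value is omitted.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(sizes_regular, sizes_cross_domain):
    """Return (t, degrees of freedom) for paired list sizes per participant."""
    if len(sizes_regular) != len(sizes_cross_domain):
        raise ValueError("both samples must contain one value per participant")
    diffs = [a - b for a, b in zip(sizes_regular, sizes_cross_domain)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two participants are required")
    # t = mean difference divided by its standard error
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1
```

With the regular method passed first, a negative t value indicates that the cross-domain lists were longer, which matches the sign of the values reported above.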

Whenever the SPARQL request provided recommendations, which was the case in less than half of the user sessions (see Table 19), the SKOSRec suggestions were assessed to be of almost equal quality as the SPARQL recommendations (see Table 20, Usefulness). Hence, in the more than half of the test cases in which SPARQL querying did not produce any results, the SKOSRec approach may still have provided useful suggestions.

Therefore, it is reasonable to assume that the capability of the SKOSRec engine to flexibly switch between processing steps of similarity calculation and pattern matching can facilitate an improved exploration of LOD repositories. Table 21 also shows that the items in the two sets often did not match, as is indicated by low Jaccard indices. This finding further demonstrates that the execution of an additional step of similarity calculation can help to retrieve other relevant items.

Table 20. Performance results in TC4 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]   Domain   Regular (SPARQL) M   Regular (SPARQL) SD   Cross-Domain M   Cross-Domain SD   N
Accuracy (size) [10]             Movie     0.82                 2.09                  8.66             2.73             50
                                 Music     0.89                 1.59                  9.30             2.03             53
                                 Book      2.90                 3.98                 10.00             0.00             51
Accuracy (mrs) [1]               Movie    67.41                27.62                 53.65            14.71             16
                                 Music    62.36                31.76                 48.32            22.49             20
                                 Book     66.42                21.28                 55.59            16.84             27
Novelty (nv) [1]                 Movie     0.59                 0.49                  0.41             0.34             16
                                 Music     0.72                 0.44                  0.59             0.40             19
                                 Book      0.54                 0.41                  0.63             0.30             27
Diversity (divU) [5]             Movie     3                    -                     4                -                16
                                 Music     3                    -                     3                -                20
                                 Book      3                    -                     4                -                27
Diversity (divC) [1]             Movie    (0.64)               (0.35)                (0.99)           (0.02)            (6)
                                 Music    (0.90)               (0.13)                (0.98)           (0.03)           (10)
                                 Book     (0.84)               (0.12)                (0.87)           (0.1)             (8)
Usefulness [5]                   Movie     3.57                 0.83                  3.35             1.57             15
                                 Music     3.53                 1.17                  3.11             0.98             20
                                 Book      3.41                 1.04                  3.41             0.94             27

[Figure: Recommendation List Size. x-axis: Domain (Movie, Music, Book); y-axis: No. of items; Method: Regular, Cross-Domain]

Fig. 10. Mean recommendation list sizes for SPARQL and SKOSRec cross-domain requests (TC4)

Table 21. Jaccard Indices (TC4)

Domain   Mean Jaccard Index (JI)   SD     N
Movie    0.03                      0.07   44
Music    0.01                      0.02   21
Book     0.16                      0.27   33

5. Future Work & Conclusion

This paper has presented the query-based SKOSRec engine that facilitates new types of recommendation requests for LOD repositories. The evaluations have shown that the SKOSRec system can often generate relevant and useful suggestions when certain query templates are utilized. In many cases, the application of expressive or advanced query patterns (i.e., in the context of constraint-based retrieval or cross-domain requests) helped to significantly improve recall values while still providing the same level of quality in the remaining performance dimensions. Additionally, advanced recommendation requests can be used to generate diversified recommendation lists in case users are not satisfied with the results of regular recommendation requests (i.e., roll-up or cross-domain retrieval vs. regular requests). While not being superior in each domain and test case, it is at least safe to say that the novel recommendation approaches of the SKOSRec engine considerably extend common retrieval methods and provide an alternative search strategy when conventional methods fail to provide useful results. Additionally, the increase in diversity and recall is in line with findings from previous research on LOD-enabled RS [45].

It is a strength of the developed approaches that they facilitate combinations of graph-based and similarity-based retrieval at different stages of the recommendation workflow. This feature extends general search capabilities for semantic networks. Future research needs to investigate whether, in addition to SKOS, further RDF vocabularies can be used for ad-hoc item-to-item similarity computation to increase the range of potential usage scenarios, since not every LOD dataset contains SKOS annotations [53]. The application of additional properties could also boost the recommendation quality of some query patterns (e.g., the query types used in TC3). In this context, it also needs to be determined whether the representation of the user profile in the SKOSRec query language can be substituted with free-text expressions to enable search-like functionalities. LOD researchers have already developed engines that can process natural language queries over RDF data [59], [34], [67]. It will have to be investigated how the SKOSRec engine profitably fits into this landscape of existing retrieval tools.

Another open research question concerns the aspect of interfaces to other applications. In its standard configuration, the system is not an end-user retrieval engine but a backend application with which an administrator who has domain knowledge (e.g., of RDF vocabularies and data models of a particular LOD repository and usage scenario) creates the respective query patterns. Although the websites of the online experiments represent possible implementations of suitable end-user applications for the SKOSRec engine, there are still many other possibilities and extensions to design such a user interface (UI). For instance, the selection of LOD items for the profile was made possible by previously storing the data in an index that could then be accessed and searched before issuing a recommendation query. However, this runs against the idea of ad-hoc retrieval. In this context, future research will have to determine whether it is feasible to generate an on-the-fly mapping from items to the respective LOD resources. In the case of a commercial application, the engine needs to match items with the corresponding entries in the product catalog. Even if it is not yet clear which UI components can make the best possible use of the SKOSRec engine's features, the results from the user experiments can give at least a few suggestions.

The following retrieval options should be the default setting in an interface that contains a LOD-enabled RS component to assist users in finding interesting items. Since the regular approach of on-the-fly retrieval achieved good results throughout the domains, the display of simple recommendations generated from previously stated user preferences represents a suitable starting point for retrieval. On-the-fly queries could be refined by providing filter options. Since the evaluations have shown that expressive constraints often lead to increased recall values, the application of such a filter might be the best option for assisted retrieval. As in the preparations for the experiments, appropriate graph-based query patterns for the usage scenario in question would also have to be determined by a domain expert before setting up the application. The same applies to advanced retrieval patterns, which should be available to the end user to refine the query when both the simple and the constraint-based approach have failed to provide relevant items. This suggestion is made because roll-up queries were competitive with regular recommendations in the web-based experiments. Therefore, they represent a viable alternative retrieval strategy. For some domains (e.g., for multimedia retrieval), it is also possible to use an optional selection field to enable the formulation of cross-domain requests. Evaluations for this query type suggest that users might receive interesting results (i.e., diversified and comprehensive recommendation lists) from such a request.

In summary, the contributions of the paper are as follows:

– Development and implementation of novel retrieval strategies in the SKOSRec system which facilitate personalized requests and leverage the potential of knowledge graphs in order to address the research gap in state-of-the-art recommendation methods with regard to customized queries.

– Realization of web-based user experiments in four usage scenarios with the following outcomes:

• Successful demonstration that knowledge graph data is applicable in the context of live recommendation scenarios.

• Creation of suitable query patterns that can be applied to represent individual information needs in different application contexts. The set of query patterns can be extended with the help of the SKOSRec query language to meet the requirements of additional usage scenarios.

• Proof that the novel retrieval strategies can be successfully applied across various application contexts and recommendation tasks and that they outperform conventional content-based retrieval approaches, especially in the performance dimensions of diversity and recall.

The prototypical implementation and the evaluation of the SKOSRec engine in different usage scenarios have demonstrated that the freely available knowledge sources on the LOD cloud can be successfully applied to advance state-of-the-art methods in IR and recommender systems. The proposed personalized and customized retrieval strategies can leverage the potential of knowledge graphs by better matching information needs of users with the help of semantic search options.

Appendix A The SKOSRecommender QuerySyntax

SKOSRecQuery A SKOSRec query can be com-prised of up to six elements of which the partsSimProjection and ItemPart are obligatory Henceusers need to state at least their preferencesand how many recommendations they wouldlike to receive Advanced retrieval patterns canbe formulated as well For instance similar re-source retrieval can be combined with prefilter-ing (RecWhereClause) andor postfiltering (Se-lectPart) of LOD resources As in SPARQL thequery can start with a prologue that abbreviatesIRIs with namespace declarations [21]

SelectPart The SelectPart part enables subqueryingwith recommendation results The surroundingquery is a SELECT query with a slightly simpli-fied WHERE clause (ie RecWhereClause) withwhich postfilter conditions can be formulatedIn this section users have the option to formu-late aggregation-based retrieval requests (Aggre-gation)

Aggregation This query part handles aggregation-based postfiltering Users specify the IRI (IRIref )resource for which similarity scores of sublevelentities are summarized Additionally it is statedhow the data should be aggregated (ie SUMMAX or AVG) as well as which variable (Var)represents the aggregation-based recommenda-tions in the postfilter section The compiler makessure that this variable is actually contained in theRecWhereClause of the SelectPart

SimProjection This part of a SKOSRec query refersto the list of generated suggestions It definesthe recommendation variable to which all simi-lar LOD resources are mapped In case pre- andpostfilter conditions are set in the other sectionsthe compiler checks whether the variable (Var)appears in these sections as well Otherwise therequired join operations cannot be executed Itis specified how many recommendations shouldbe displayed (Integer) and whether the requestwill be issued against a default repository or as across-repository request (ServiceIntegration)

ServiceIntegration This section indicates that a userintends to receive recommendations from a dif-ferent SPARQL endpoint than the default end-point that is stated in the standard configurationof the SKOSRec engine It specifies the targetSPARQL endpoint (IRIref ) from which a user

L Wenige et al Similarity-based Graph Queries 29

wants to receive recommendations Upon extrac-tion of SKOS annotations from the default sourceendpoint a subsequent request is sent to the spec-ified target endpoint

ItemPart This section of a SKOSRec query repre-sents a single preference in the user profile How-ever recommendation queries can contain a cou-ple of preference statements A preference is ei-ther expressed as a LOD resource (IRIref ) or asa variable which is bound to a preference query(VarPart) In the latter case the resources areobtained by matching the graph pattern in theVarPart with the triple statements in the reposi-tory The ItemPart statement can be additionallysupplemented with a concept-to-concept similar-ity threshold (Sim) that triggers concept expan-sion with the approach of flexible similarity de-tection which is extensively described in one ofour previous works [64]

VarPart The construct initiates preference queryingIndividual likings are expressed as a graph pat-tern in a RecWhereClause The variable (Var) isa placeholder for the LOD resources that will beused to set up the user profile The compiler hasto make sure that the specified variable occurs inthe RecWhereClause as well Otherwise the ex-traction of LOD resources cannot be carried outcorrectly

Sim The Sim part of a SKOSRec query denotes the threshold of concept-to-concept scores for flexible similarity detection. Even though knowledge-based similarity values usually range between 0 and 1, the specification allows a broader range of digits in the Decimal datatype to enable the application of other similarity metrics in case they are needed.

Relation This section specifies how concept-to-concept similarity scores should be determined. Users can state whether scores should be "larger", "larger than" or "equal to" the threshold value.
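The interplay of Sim and Relation can be illustrated with a minimal sketch. It assumes that concept expansion keeps every related concept whose concept-to-concept score satisfies the chosen relation with respect to the threshold. The function name `expand_concepts`, the relation symbols and the example scores are hypothetical and are not part of the SKOSRec grammar.

```python
import operator

# Hypothetical relation symbols; the actual keywords of the SKOSRec
# grammar are not reproduced here.
RELATIONS = {
    ">=": operator.ge,
    ">": operator.gt,
    "=": operator.eq,
}


def expand_concepts(concept_scores, threshold, relation=">="):
    """Return the concepts whose similarity score satisfies the relation.

    concept_scores maps a concept identifier to its concept-to-concept
    similarity score with respect to a concept of the preferred item.
    """
    compare = RELATIONS[relation]
    return [
        concept
        for concept, score in concept_scores.items()
        if compare(score, threshold)
    ]


# Illustrative scores only, not taken from the evaluation.
example_scores = {"concept_a": 0.9, "concept_b": 0.5, "concept_c": 0.3}
expanded = expand_concepts(example_scores, threshold=0.5, relation=">=")
```

Because the threshold is handled as a plain number, the same sketch would also work for similarity metrics whose values fall outside the range between 0 and 1.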

RecWhereClause This clause can occur in three different sections of a SKOSRec query, i.e., either in the prefilter (ItemPart), the postfilter (SelectPart) or the preference filter section (VarPart). The RecWhereClause closely resembles the "WhereClause" of the SPARQL 1.1 specification with all its subsequent parts [21]. Hence, it enables different combinations of graph pattern matching. However, the RecWhereClause of the SKOSRec language allows fewer pattern matching expressions than the "WhereClause" of the regular SPARQL syntax specification [21]. For instance, it is neither permitted to formulate subqueries nor to direct filter requests to more than one repository (see RecGraphPatternNotTriples for further explanations). This modification has been made to simplify similarity calculation. After parsing this section, the compiler makes sure that recommendation, preference or postfilter variables occur at least once in the RecWhereClause to guarantee that subsequent steps of the recommendation workflow can be applied to existing LOD resources. In case the RecWhereClause comprises a RecMinusGraphPattern, it also has to be checked that the variable is not only contained in the MINUS part of the group.
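The variable check described for the RecWhereClause can be pictured with a small sketch. It assumes that a parsed clause is available as a list of (kind, variables) pairs, one per graph pattern. This representation and the name `variable_is_bound` are hypothetical and are not taken from the SKOSRec implementation.

```python
def variable_is_bound(patterns, variable):
    """Check that a variable occurs in at least one non-MINUS pattern.

    patterns is a list of (kind, variables) pairs, where kind is a label
    such as "triples", "optional", "union" or "minus" and variables is
    the set of variable names used in that pattern.
    """
    return any(
        variable in variables
        for kind, variables in patterns
        if kind != "minus"
    )


# A clause in which ?rec is only mentioned in the MINUS part is rejected,
# because excluding resources alone cannot bind the variable.
only_in_minus = [("triples", {"?film", "?genre"}), ("minus", {"?rec"})]
bound_elsewhere = [("triples", {"?rec", "?genre"}), ("minus", {"?rec"})]

rejected = not variable_is_bound(only_in_minus, "?rec")
accepted = variable_is_bound(bound_elsewhere, "?rec")
```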

RecGroupGraphPattern A RecGroupGraphPattern can contain one to many basic graph patterns, which can be combined in different ways according to RecGroupGraphPatternSub.

RecGroupGraphPatternSub This section defines the graph patterns that are applied to identify suitable LOD resources. User filters can be expressed either as basic graph patterns (TriplesBlock) or as combinations of them (RecGraphPatternNotTriples). The sequence of different kinds of patterns can be flexibly specified.

RecGraphPatternNotTriples This query part closely resembles the similarly named "GraphPatternNotTriples" of the SPARQL 1.1 specification [21]. The only difference to SPARQL is that the RecGraphPatternNotTriples section is a little more restricted in terms of admissible graph patterns than the "GraphPatternNotTriples" section of the regular SPARQL specification. For instance, it does not allow LOD resources to be retrieved from anywhere other than the default graph that is specified in the configuration of the SKOSRec engine, since filter patterns have to be applied to datasets that contain SKOS annotations. Hence, they need to be specified before runtime execution to ensure that similarity calculation can be carried out correctly. If a user were able to formulate filter conditions that referred to different RDF graphs and endpoints accordingly, this would not be possible.

RecGroupOrUnionGraphPattern This query section marks an alternative graph pattern in the style of the "GroupOrUnionGraphPattern" of the SPARQL syntax specification [21]. However, it allows neither SPARQL-like subqueries nor federated queries for filtered resources because this would complicate similarity calculation.

RecOptionalGraphPattern In this part of a SKOSRecQuery, users can specify additional graph patterns which might extend the query solution but do not necessarily have to match the data.

RecMinusGraphPattern This section handles the exclusion of LOD resources for cases when users want to omit certain triple statements from the solution.

References

[1] G. Adomavicius, A. Tuzhilin, Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions, In: IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.

[2] G. Adomavicius, A. Tuzhilin, R. Zheng, REQUEST: A query language for customizing recommendations, In: Information Systems Research, vol. 22, no. 1, pp. 99–117, 2011.

[3] C. Aggarwal, Recommender systems, Springer, 2016.

[4] M. Arenas, B. Cuenca Grau, E. Kharlamov, S. Marciuska, D. Zheleznyakov, Faceted search over ontology-enhanced RDF data, In: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 939–948, 2014.

[5] V. Ayala, M. Przyjaciel-Zablocki, T. Hornung, A. Schätzle, G. Lausen, Extending SPARQL for recommendations, In: Semantic Web Information Management, 2004.

[6] J. Beel, C. Breitinger, S. Langer, A. Lommatzsch, B. Gipp, Towards reproducibility in recommender systems research, In: User Modeling and User-adapted Interaction, vol. 26, no. 1, pp. 69–101, 2016.

[7] C. Boutilier, R. Brafman, C. Domshlak, H. Hoos, D. Poole, CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements, In: Journal of Artificial Intelligence Research (JAIR), vol. 21, pp. 135–191, 2004.

[8] P. Castells, S. Vargas, J. Wang, Novelty and diversity metrics for recommender systems: choice, discovery and relevance, 2011.

[9] G. Cheng, W. Ge, Y. Qu, Falcons: searching and browsing entities on the Semantic Web, In: Proceedings of the 17th ACM International Conference on World Wide Web, pp. 1101–1102, 2008.

[10] J. Davies, R. Weeks, QuizRDF: Search technology for the Semantic Web, In: Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004.

[11] DBpedia, URL: https://wiki.dbpedia.org, 2017.

[12] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, Exploiting the web of data in model-based recommender systems, In: Proceedings of the 6th ACM Conference on Recommender Systems (RecSys), pp. 253–256, 2012.

[13] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, M. Zanker, Linked Open Data to support content-based recommender systems, In: Proceedings of the 8th International Conference on Semantic Systems, 2012.

[14] D. Dillman, D. Robert, D. Bowker, Principles for constructing web surveys, In: Joint Meetings of the American Statistical Association, 1998.

[15] DCMI metadata terms, URL: http://dublincore.org/documents/dcmi-terms, 2004.

[16] I. Fernandez-Tobias, I. Cantador, M. Kaminskas, F. Ricci, A generic semantic-based framework for cross-domain recommendation, In: Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pp. 25–32, 2011.

[17] A. Freitas, J. Oliveira, S. O'Riain, E. Curry, J. Da Silva, C. Pereira, Querying Linked Data using semantic relatedness: a vocabulary independent approach, In: Natural Language Processing and Information Systems, pp. 40–51, 2011.

[18] T. Grainger, T. Potter, Y. Seeley, Solr in action, Cherry Hill: Manning, 2014.

[19] A. Gunawardana, G. Shani, Evaluating recommender systems, In: Recommender Systems Handbook, Springer, pp. 265–308, 2015.

[20] R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Bürgle, H. Düwiger, U. Scheel, Faceted Wikipedia search, In: International Conference on Business Information Systems, pp. 1–11, 2010.

[21] S. Harris, A. Seaborne, E. Prud'hommeaux, SPARQL 1.1 query language, In: W3C recommendation, 2013.

[22] B. Heitmann, C. Hayes, Using Linked Data to build open, collaborative recommender systems, In: AAAI Spring Symposium: Linked Data Meets Artificial Intelligence, pp. 76–81, 2010.

[23] J. Herlocker, J. Konstan, L. Terveen, J. Riedl, Evaluating collaborative filtering recommender systems, In: ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.

[24] Y. Hu, Z. Wang, W. Wu, J. Guo, M. Zhang, Recommendation for movies and stars using YAGO and IMDB, In: 12th International Asia-Pacific Web Conference (APWEB), pp. 123–129, 2010.

[25] D. Huynh, D. Karger, Parallax and companion: Set-based browsing for the data web, In: Proceedings of the 18th ACM International Conference on World Wide Web, 2009.

[26] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich, Recommender systems: an introduction, Cambridge University Press, 2010.

[27] Y. Kabutoya, R. Sumi, T. Iwata, T. Uchiyama, T. Uchiyama, In: IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 625–630, 2012.

[28] M. Kaminskas, I. Fernandez-Tobias, F. Ricci, I. Cantador, Knowledge-based music retrieval for places of interest, In: Proceedings of the 2nd International ACM Workshop on Music Information Retrieval with User-centered and Multimodal Strategies, pp. 19–24, 2012.

[29] H. Khrouf, R. Troncy, Hybrid event recommendation using Linked Data and user diversity, In: Proceedings of the 7th ACM Conference on Recommender Systems, pp. 185–192, 2010.

[30] P. Kirk, Experimental design, John Wiley & Sons, 1982.

[31] B. Knijnenburg, M. Willemsen, Z. Gantner, H. Soncu, C. Newell, Explaining the user experience of recommender systems, In: User Modeling and User-Adapted Interaction, vol. 22, no. 4-5, pp. 441–504, 2012.

[32] G. Koutrika, B. Bercovitz, H. Garcia-Molina, FlexRecs: expressing and combining flexible recommendations, In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 2009.

[33] M. Lee, W. Kim, Semantic association search and rank method based on spreading activation for the Semantic Web, In: IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1523–1527, 2009.

[34] J. Lehmann, L. Bühmann, Autosparql: Let users query your knowledge base, In: 8th Extended Semantic Web Conference (ESWC), 2011.

[35] N. Marie, F. Gandon, M. Ribiere, F. Rodio, Discovery hub: on-the-fly Linked Data exploratory search, In: Proceedings of the 9th International Conference on Semantic Systems, pp. 17–24, 2013.

[36] S. McNee, J. Riedl, J. Konstan, Being accurate is not enough: how accuracy metrics have hurt recommender systems, In: CHI'06 Extended Abstracts on Human Factors in Computing Systems, 2006.

[37] R. Meymandpour, J. Davis, Recommendations using Linked Data, In: Proceedings of the 5th PhD Workshop on Information and Knowledge, 2012.

[38] A. Miles, S. Bechhofer, SKOS Simple Knowledge Organization System Reference, In: W3C Recommendation, 2009.

[39] B. Mobasher, X. Jin, Y. Zhou, Semantically enhanced collaborative filtering on the web, In: Web Mining: From Web to Semantic Web, pp. 57–76, 2004.

[40] A. Musetti, A. Nuzzolese, F. Draicchio, V. Presutti, E. Blomqvist, A. Gangemi, P. Ciancarini, Aemoo: Exploratory search based on knowledge patterns over the Semantic Web, In: Semantic Web Challenge, 2012.

[41] T. Neumann, G. Weikum, The RDF-3X engine for scalable management of RDF data, In: The VLDB Journal, vol. 19, no. 1, pp. 91–113, 2010.

[42] J. Pérez, M. Arenas, C. Gutierrez, Semantics and Complexity of SPARQL, In: 5th International Semantic Web Conference, pp. 30–43, 2006.

[43] S. Policarpio, S. Brunk, G. Tummarello, Implementation of a SPARQL integrated recommendation engine for Linked Data with hybrid capabilities, In: AImWD, 2012.

[44] P. Pu, L. Chen, R. Hu, A user-centric evaluation framework for recommender systems, In: Proceedings of the 5th ACM Conference on Recommender Systems, 2011.

[45] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, E. Di Sciascio, Sound and music recommendation with knowledge graphs, ACM Transactions on Intelligent Systems and Technology (TIST) 8(2), 21 (2017).

[46] E. Oren, R. Delbru, S. Decker, Extending faceted navigation for RDF data, In: 5th International Semantic Web Conference, pp. 559–572, 2006.

[47] G. Pirro, Explaining and suggesting relatedness in knowledge graphs, In: International Semantic Web Conference, pp. 622–639, 2015.

[48] G. Rizzo, R. Troncy, NERD: A framework for evaluating named entity recognition tools in the Web of data, 10th International Semantic Web Conference (ISWC'11), Demo Session, Bonn, Germany, 2011.

[49] J. Rosati, T. Di Noia, T. Lukasiewicz, R. De Leone, A. Maurino, Preference queries with Ceteris Paribus semantics for Linked Data, In: OTM Confederated International Conferences, pp. 423–442, 2015.

[50] T. Ruotsalo, K. Haav, A. Stoyanov, S. Roche, E. Fani, R. Deliai, E. Mäkelä, T. Kauppinen, E. Hyvönen, SMARTMUSEUM: A mobile recommender system for the Web of Data, In: Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 50–67, 2013.

[51] I. Ruthven, M. Lalmas, K. van Rijsbergen, Incorporating user search behavior into relevance feedback, In: Journal of the Association for Information Science and Technology, vol. 54, no. 6, pp. 529–549, 2003.

[52] B. Sarwar, G. Karypis, J. Konstan, J. Riedl, Item-based collaborative filtering recommendation algorithms, In: Proceedings of the 10th International Conference on World Wide Web, 2001.

[53] M. Schmachtenberg, C. Bizer, H. Paulheim, Adoption of the Linked Data best practices in different topical domains, In: International Semantic Web Conference (ISWC), pp. 245–260, 2014.

[54] H. Schütze, C. Manning, P. Raghavan, Introduction to information retrieval, Cambridge University Press, 2008.

[55] SparqlImplementations, URL: https://www.w3.org/wiki/SparqlImplementations, 2016.

[56] M. Stankovic, W. Breitfuss, P. Laublet, Discovering relevant topics using DBPedia, In: Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 219–222, 2011.

[57] K. Swearingen, R. Sinha, Beyond algorithms: An HCI perspective on recommender systems, In: ACM SIGIR 2001 Workshop on Recommender Systems, vol. 13, no. 5-6, 2001.

[58] D. Tunkelang, Faceted search, In: Synthesis Lectures on Information Concepts, Retrieval, and Services, pp. 1–80, 2009.

[59] C. Unger, L. Bühmann, J. Lehmann, A. Ngonga Ngomo, D. Gerber, P. Cimiano, Template-based question answering over RDF data, In: Proceedings of the 21st ACM International Conference on World Wide Web, 2012.

[60] C. Varnhagen, M. Gushta, J. Daniels, T. Peters, N. Parmar, D. Law, T. Johnson, How informed is online informed consent?, In: Ethics & Behavior, vol. 15, no. 1, pp. 37–48, 2005.

[61] J. Waitelonis, H. Sack, Towards exploratory video search using Linked Data, In: 11th IEEE International Symposium on Multimedia, pp. 540–545, 2009.

[62] L. Wenige, J. Ruhland, Flexible on-the-fly recommendations from Linked Open Data repositories, In: International Conference on Business Information Systems, pp. 43–54, 2016.

[63] L. Wenige, Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations.

[64] L. Wenige, J. Ruhland, Retrieval by recommendation: using LOD technologies to improve digital library search, In: International Journal on Digital Libraries, 2017.

[65] F. Zarrinkalam, M. Kahani, A multi-criteria hybrid citation recommendation system based on Linked Data, In: 2nd International EConference on Computer and Knowledge Engineering (ICCKE), pp. 283–288, 2012.

[66] C. Ziegler, S. McNee, J. Konstan, G. Lausen, Improving recommendation lists through topic diversification, In: Proceedings of the 14th International Conference on World Wide Web, pp. 22–32, 2005.

[67] L. Zou, R. Huang, H. Wang, J. Yu, W. He, D. Zhao, Natural language question answering over RDF: a graph data driven approach, In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 313–324, 2014.



Table 20. Performance results in TC4 (M = Mean, SD = Standard Deviation, N = No. of participants)

Dimension (Metric) [Max Score]  Domain  Regular (SPARQL)    Cross-Domain        N
                                        M         SD        M         SD
Accuracy (size) [10]            Movie   0.82      2.09      8.66      2.73      50
                                Music   0.89      1.59      9.30      2.03      53
                                Book    2.90      3.98      10.00     0.00      51
Accuracy (mrs) [1]              Movie   67.41     27.62     53.65     14.71     16
                                Music   62.36     31.76     48.32     22.49     20
                                Book    66.42     21.28     55.59     16.84     27
Novelty (nv) [1]                Movie   0.59      0.49      0.41      0.34      16
                                Music   0.72      0.44      0.59      0.40      19
                                Book    0.54      0.41      0.63      0.30      27
Diversity (divU) [5]            Movie   3         -         4         -         16
                                Music   3         -         3         -         20
                                Book    3         -         4         -         27
Diversity (divC) [1]            Movie   (0.64)    (0.35)    (0.99)    (0.02)    (6)
                                Music   (0.90)    (0.13)    (0.98)    (0.03)    (10)
                                Book    (0.84)    (0.12)    (0.87)    (0.1)     (8)
Usefulness [5]                  Movie   3.57      0.83      3.35      1.57      15
                                Music   3.53      1.17      3.11      0.98      20
                                Book    3.41      1.04      3.41      0.94      27

[Figure: Recommendation List Size. x-axis: Domain (Movie, Music, Book); y-axis: No. of items (0.0 to 10.0); Method: Regular, Cross-Domain]

Fig. 10. Mean recommendation list sizes for SPARQL and SKOSRec cross-domain requests (TC4).


Table 21. Jaccard Indices (TC4)

Domain  Mean Jaccard Index (JI)  SD    N
Movie   0.03                     0.07  44
Music   0.01                     0.02  21
Book    0.16                     0.27  33
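For readers unfamiliar with the measure, the following minimal sketch shows how a Jaccard index between two recommendation lists can be computed. It assumes that the lists are compared as unordered item sets; the function name `jaccard_index` and the example identifiers are hypothetical and do not stem from the experiments.

```python
def jaccard_index(list_a, list_b):
    """Jaccard index of two recommendation lists, treated as sets.

    The index is the size of the intersection divided by the size of the
    union. Two empty lists yield 0.0 by the convention chosen here.
    """
    set_a, set_b = set(list_a), set(list_b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


# Made-up identifiers for illustration only.
regular = ["item_1", "item_2", "item_3"]
cross_domain = ["item_3", "item_4", "item_5", "item_6"]
overlap = jaccard_index(regular, cross_domain)  # 1 shared item of 6 distinct items
```

A value close to 0 indicates that two lists share hardly any items.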

of similarity calculation can help to retrieve other relevant items.

5. Future Work & Conclusion

This paper has presented the query-based SKOSRec engine that facilitates new types of recommendation requests for LOD repositories. The evaluations have shown that the SKOSRec system can often generate relevant and useful suggestions when certain query templates are utilized. In many cases, the application of expressive or advanced query patterns (i.e., in the context of constraint-based retrieval or cross-domain requests) helped to significantly improve recall values while still providing the same level of quality in the remaining performance dimensions. Additionally, advanced recommendation requests can be used to generate diversified recommendation lists in case users are not satisfied with the results of regular recommendation requests (i.e., roll-up or cross-domain retrieval vs. regular requests). While not being superior in each domain and test case, it is at least safe to say that the novel recommendation approaches of the SKOSRec engine considerably extend common retrieval methods and provide an alternative search strategy when conventional methods fail to provide useful results. Additionally, the increase in diversity and recall is in line with findings from previous research on LOD-enabled RS [45].

It is a strength of the developed approaches that they facilitate combinations of graph-based and similarity-based retrieval at different stages of the recommendation workflow. This feature extends general search capabilities for semantic networks. Future research is required to investigate whether, in addition to SKOS, further RDF vocabularies can be used for ad-hoc item-to-item similarity computation to increase the range of potential usage scenarios, since not every LOD dataset contains SKOS annotations [53]. The application of additional properties could also boost the recommendation quality of some query patterns (e.g., the query types used in TC3). In this context, it also needs to be determined whether the representation of the user

profile in the SKOSRec query language can be substituted with free-text expressions to enable search-like functionalities. LOD researchers have already developed engines that can process natural language queries over RDF data [59], [34], [67]. It will have to be investigated how the SKOSRec engine profitably fits into this landscape of existing retrieval tools.

Another open research question concerns the aspect of interfaces to other applications. In its standard configuration, the system is not an end-user retrieval engine, but a backend application with which an administrator who has domain knowledge (e.g., of RDF vocabularies and data models of a particular LOD repository and usage scenario) creates the respective query patterns. Although the websites of the online experiments represent possible implementations of suitable end-user applications for the SKOSRec engine, there are still many other possibilities and extensions to design such a user interface (UI). For instance, the selection of LOD items for the profile was made possible by previously storing the data in an index that could then be accessed and searched before issuing a recommendation query. However, this runs against the idea of ad-hoc retrieval. In this context, future research will have to determine whether it is feasible to generate an on-the-fly mapping from items to the respective LOD resources. In the case of a commercial application, the engine needs to match items with the corresponding entries in the product catalog. Even if it is not yet clear which UI components can make the best possible use of the SKOSRec engine's features, the results from the user experiments can give at least a few suggestions.

The following retrieval options should be the default setting in an interface that contains a LOD-enabled RS component to assist users in finding interesting items. Since the regular approach of on-the-fly retrieval achieved good results throughout the domains, the display of simple recommendations generated from previously stated user preferences represents a suitable starting point for retrieval. On-the-fly queries could be refined by providing filter options. Since the evaluations have shown that expressive constraints often lead to increased recall values, the application of such a filter might be the best option for assisted retrieval. As in the preparations for the experiments, appropriate graph-based query patterns for the usage scenario in question would also have to be determined by a domain expert before setting up the application. The same applies to advanced retrieval patterns, which should be available to the end user to refine the query when both the simple and the constraint-based approach have failed to provide relevant items. This suggestion is made because roll-up queries were competitive with regular recommendations in the web-based experiments. Therefore, they represent a viable alternative retrieval strategy. For some domains (e.g., for multimedia retrieval), it is also possible to use an optional selection field to enable the formulation of cross-domain requests. Evaluations for this query type suggest that users might receive interesting results (i.e., diversified and comprehensive recommendation lists) from such a request.

In summary, the contributions of the paper are as follows:

– Development and implementation of novel retrieval strategies in the SKOSRec system, which facilitate personalized requests and leverage the potential of knowledge graphs in order to address the research gap in state-of-the-art recommendation methods with regard to customized queries.

– Realization of web-based user experiments in four usage scenarios with the following outcomes:

  • Successful demonstration that knowledge graph data is applicable in the context of live recommendation scenarios.

  • Creation of suitable query patterns that can be applied to represent individual information needs in different application contexts. The set of query patterns can be extended with the help of the SKOSRec query language to meet the requirements of additional usage scenarios.

  • Proof that the novel retrieval strategies can be successfully applied across various application contexts and recommendation tasks and that they outperform conventional content-based retrieval approaches, especially in the performance dimensions of diversity and recall.

The prototypical implementation and the evaluation of the SKOSRec engine in different usage scenarios have demonstrated that the freely available knowledge sources on the LOD cloud can be successfully applied to advance state-of-the-art methods in IR and recommender systems. The proposed personalized and customized retrieval strategies can leverage the potential of knowledge graphs by better matching information needs of users with the help of semantic search options.

Appendix A. The SKOSRecommender Query Syntax

SKOSRecQuery A SKOSRec query can comprise up to six elements, of which the parts SimProjection and ItemPart are obligatory. Hence, users need to state at least their preferences and how many recommendations they would like to receive. Advanced retrieval patterns can be formulated as well. For instance, similar resource retrieval can be combined with prefiltering (RecWhereClause) and/or postfiltering (SelectPart) of LOD resources. As in SPARQL, the query can start with a prologue that abbreviates IRIs with namespace declarations [21].

SelectPart The SelectPart enables subquerying with recommendation results. The surrounding query is a SELECT query with a slightly simplified WHERE clause (i.e., RecWhereClause) with which postfilter conditions can be formulated. In this section, users have the option to formulate aggregation-based retrieval requests (Aggregation).

Aggregation This query part handles aggregation-based postfiltering. Users specify the IRI (IRIref) resource for which similarity scores of sublevel entities are summarized. Additionally, it is stated how the data should be aggregated (i.e., SUM, MAX or AVG) as well as which variable (Var) represents the aggregation-based recommendations in the postfilter section. The compiler makes sure that this variable is actually contained in the RecWhereClause of the SelectPart.
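Aggregation-based postfiltering can be sketched as follows. The example assumes that each sublevel entity (e.g., a movie) carries one similarity score and is linked to exactly one higher-level resource (e.g., a director). The function name `aggregate_scores`, the dictionaries and all identifiers are hypothetical; only the modes SUM, MAX and AVG are taken from the description above.

```python
from collections import defaultdict

# Aggregation modes named in the query syntax description.
AGGREGATORS = {
    "SUM": sum,
    "MAX": max,
    "AVG": lambda scores: sum(scores) / len(scores),
}


def aggregate_scores(sublevel_scores, parent_of, mode="SUM"):
    """Summarize similarity scores of sublevel entities per parent resource.

    sublevel_scores maps a sublevel entity to its similarity score,
    parent_of maps a sublevel entity to its higher-level resource.
    Returns (parent, aggregated score) pairs, highest score first.
    """
    grouped = defaultdict(list)
    for entity, score in sublevel_scores.items():
        parent = parent_of.get(entity)
        if parent is not None:
            grouped[parent].append(score)
    aggregate = AGGREGATORS[mode]
    ranked = [(parent, aggregate(scores)) for parent, scores in grouped.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Made-up scores for illustration only.
movie_scores = {"movie_1": 0.8, "movie_2": 0.4, "movie_3": 0.5}
directed_by = {"movie_1": "director_a", "movie_2": "director_a", "movie_3": "director_b"}
ranking = aggregate_scores(movie_scores, directed_by, mode="MAX")
```

The choice of mode changes the ranking logic: SUM favors resources with many similar sublevel entities, whereas MAX and AVG are independent of how many sublevel entities a resource has.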

SimProjection This part of a SKOSRec query refersto the list of generated suggestions It definesthe recommendation variable to which all simi-lar LOD resources are mapped In case pre- andpostfilter conditions are set in the other sectionsthe compiler checks whether the variable (Var)appears in these sections as well Otherwise therequired join operations cannot be executed Itis specified how many recommendations shouldbe displayed (Integer) and whether the requestwill be issued against a default repository or as across-repository request (ServiceIntegration)

ServiceIntegration This section indicates that a userintends to receive recommendations from a dif-ferent SPARQL endpoint than the default end-point that is stated in the standard configurationof the SKOSRec engine It specifies the targetSPARQL endpoint (IRIref ) from which a user

L Wenige et al Similarity-based Graph Queries 29

wants to receive recommendations Upon extrac-tion of SKOS annotations from the default sourceendpoint a subsequent request is sent to the spec-ified target endpoint

ItemPart This section of a SKOSRec query repre-sents a single preference in the user profile How-ever recommendation queries can contain a cou-ple of preference statements A preference is ei-ther expressed as a LOD resource (IRIref ) or asa variable which is bound to a preference query(VarPart) In the latter case the resources areobtained by matching the graph pattern in theVarPart with the triple statements in the reposi-tory The ItemPart statement can be additionallysupplemented with a concept-to-concept similar-ity threshold (Sim) that triggers concept expan-sion with the approach of flexible similarity de-tection which is extensively described in one ofour previous works [64]

VarPart The construct initiates preference queryingIndividual likings are expressed as a graph pat-tern in a RecWhereClause The variable (Var) isa placeholder for the LOD resources that will beused to set up the user profile The compiler hasto make sure that the specified variable occurs inthe RecWhereClause as well Otherwise the ex-traction of LOD resources cannot be carried outcorrectly

Sim The Sim part of a SKOSRec query denotes thethreshold of concept-to-concept scores for flexi-ble similarity detection Even though knowledge-based similarity values usually range between 0and 1 the specification allows a broader range ofdigits in the Decimal datatype to enable applica-tion of other similarity metrics in case they areneeded

Relation This section specifies how concept-to-conceptsimilarity scores should be determined Users canstate whether scores should be ldquolargerrdquo ldquolargerthanrdquo or ldquoequal tordquo the threshold value

RecWhereClause This clause can occur in three dif-ferent sections of a SKOSRec query ie either inthe prefilter (ItemPart) the postfilter (SelectPart)or the preference filter section (VarPart) TheRecWhereClause closely resembles the ldquoWhere-Clauserdquo of the SPARQL 11 specification withall its subsequent parts [21] Hence it enablesdifferent combinations of graph pattern match-ing However the RecWhereClause of the SKOS-Rec language allows fewer pattern matching ex-pressions than the ldquoWhereClauserdquo of the regular

SPARQL syntax specification [21] For instanceit is neither permitted to formulate subqueries norto direct filter requests to more than one reposi-tory (see RecGraphPatternNotTriples for furtherexplanations) This modification has been madeto simplify similarity calculation After parsingthis section the compiler makes sure that recom-mendation preference or postfilter variables oc-cur at least once in the RecWhereClause to guar-antee that subsequent steps of the recommenda-tion workflow can be applied on existing LOD re-sources In case the RecWhereClause comprises aRecMinusGraphPattern it also has to be checkedthat the variable is not only contained in theMINUS part of the group

RecGroupGraphPattern A RecGroupGraphPatterncan contain one to many basic graph patternswhich can be differently combined according toRecGroupGraphPatternSub

RecGroupGraphPatternSub This section defines thegraph patterns that are applied to identify suit-able LOD resources User filters can be either ex-pressed as basic graph patterns (TriplesBlock) oras combinations of them (RecGraphPatternNot-Triples) The sequence of different kinds of pat-terns can be flexibly specified

RecGraphPatternNotTriples This query part closelyresembles the similar named ldquoGraphPatternNot-Triplesrdquo of the SPARQL 11 specification [21]The only difference to SPARQL is that the Rec-GraphPatternNotTriples section is a little morerestricted in terms of admissible graph patternsthan the ldquoGraphPatternNotTriplesrdquo section of theregular SPARQL specification For instance itdoes not allow to retrieve LOD resources otherthan from the default graph that is specified in theconfiguration of the SKOSRec engine since filterpatterns have to be applied to datasets that containSKOS annotations Hence they need to be speci-fied before runtime execution to ensure that sim-ilarity calculation can be carried out correctly Ifa user was able to formulate filter conditions thatreferred to different RDF graphs and endpointsaccordingly this would not be possible

RecGroupOrUnionGraphPattern This query sec-tion marks an alternative graph pattern in the styleof the ldquoGroup-OrUnionGraphPatternrdquo of the SPARQL syntaxspecification [21] However it neither allowsSPARQL-like subqueries nor federated queries

30 L Wenige et al Similarity-based Graph Queries

for filtered resources because this would compli-cate similarity calculation

RecOptionalGraphPattern In this part of a SKOS-RecQuery users are enabled to specify additionalgraph patterns which might extend the query so-lution but do not necessarily have to match thedata

RecMinusGraphPattern This section handles exclu-sion of LOD resources for cases when users wantto omit certain triple statements from the solution

References

[1] G. Adomavicius, A. Tuzhilin, Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. In: IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.

[2] G. Adomavicius, A. Tuzhilin, R. Zheng, REQUEST: A query language for customizing recommendations. In: Information Systems Research, vol. 22, no. 1, pp. 99–117, 2011.

[3] C. Aggarwal, Recommender systems, Springer, 2016.

[4] M. Arenas, B. Cuenca Grau, E. Kharlamov, S. Marciuska, D. Zheleznyakov, Faceted search over ontology-enhanced RDF data. In: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 939–948, 2014.

[5] V. Ayala, M. Przyjaciel-Zablocki, T. Hornung, A. Schätzle, G. Lausen, Extending SPARQL for recommendations. In: Semantic Web Information Management, 2004.

[6] J. Beel, C. Breitinger, S. Langer, A. Lommatzsch, B. Gipp, Towards reproducibility in recommender systems research. In: User Modeling and User-adapted Interaction, vol. 26, no. 1, pp. 69–101, 2016.

[7] C. Boutilier, R. Brafman, C. Domshlak, H. Hoos, D. Poole, CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements. In: Journal of Artificial Intelligence Research (JAIR), vol. 21, pp. 135–191, 2004.

[8] P. Castells, S. Vargas, J. Wang, Novelty and diversity metrics for recommender systems: choice, discovery and relevance, 2011.

[9] G. Cheng, W. Ge, Y. Qu, Falcons: searching and browsing entities on the Semantic Web. In: Proceedings of the 17th ACM International Conference on World Wide Web, pp. 1101–1102, 2008.

[10] J. Davies, R. Weeks, QuizRDF: Search technology for the Semantic Web. In: Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004.

[11] DBpedia, URL: https://wiki.dbpedia.org, 2017.

[12] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, Exploiting the web of data in model-based recommender systems. In: Proceedings of the 6th ACM Conference on Recommender Systems (RecSys), pp. 253–256, 2012.

[13] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, M. Zanker, Linked Open Data to support content-based recommender systems. In: Proceedings of the 8th International Conference on Semantic Systems, 2012.

[14] D. Dillman, D. Robert, D. Bowker, Principles for constructing web surveys. In: Joint Meetings of the American Statistical Association, 1998.

[15] DCMI metadata terms, URL: http://dublincore.org/documents/dcmi-terms, 2004.

[16] I. Fernandez-Tobias, I. Cantador, M. Kaminskas, F. Ricci, A generic semantic-based framework for cross-domain recommendation. In: Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pp. 25–32, 2011.

[17] A. Freitas, J. Oliveira, S. O'Riain, E. Curry, J. Da Silva, C. Pereira, Querying Linked Data using semantic relatedness: a vocabulary independent approach. In: Natural Language Processing and Information Systems, pp. 40–51, 2011.

[18] T. Grainger, T. Potter, Y. Seeley, Solr in Action, Cherry Hill: Manning, 2014.

[19] A. Gunawardana, G. Shani, Evaluating recommender systems. In: Recommender Systems Handbook, Springer, pp. 265–308, 2015.

[20] R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Bürgle, H. Düwiger, U. Scheel, Faceted Wikipedia search. In: International Conference on Business Information Systems, pp. 1–11, 2010.

[21] S. Harris, A. Seaborne, E. Prud'hommeaux, SPARQL 1.1 query language. In: W3C Recommendation, 2013.

[22] B. Heitmann, C. Hayes, Using Linked Data to build open, collaborative recommender systems. In: AAAI Spring Symposium: Linked Data Meets Artificial Intelligence, pp. 76–81, 2010.

[23] J. Herlocker, J. Konstan, L. Terveen, J. Riedl, Evaluating collaborative filtering recommender systems. In: ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.

[24] Y. Hu, Z. Wang, W. Wu, J. Guo, M. Zhang, Recommendation for movies and stars using YAGO and IMDB. In: 12th International Asia-Pacific Web Conference (APWEB), pp. 123–129, 2010.

[25] D. Huynh, D. Karger, Parallax and companion: Set-based browsing for the data web. In: Proceedings of the 18th ACM International Conference on World Wide Web, 2009.

[26] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich, Recommender systems: an introduction, Cambridge University Press, 2010.

[27] Y. Kabutoya, R. Sumi, T. Iwata, T. Uchiyama, T. Uchiyama. In: IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 625–630, 2012.

[28] M. Kaminskas, I. Fernandez-Tobias, F. Ricci, I. Cantador, Knowledge-based music retrieval for places of interest. In: Proceedings of the 2nd International ACM Workshop on Music Information Retrieval with User-centered and Multimodal Strategies, pp. 19–24, 2012.

[29] H. Khrouf, R. Troncy, Hybrid event recommendation using Linked Data and user diversity. In: Proceedings of the 7th ACM Conference on Recommender Systems, pp. 185–192, 2010.

[30] P. Kirk, Experimental design, John Wiley & Sons, 1982.

[31] B. Knijnenburg, M. Willemsen, Z. Gantner, H. Soncu, C. Newell, Explaining the user experience of recommender systems. In: User Modeling and User-Adapted Interaction, vol. 22, no. 4-5, pp. 441–504, 2012.

[32] G. Koutrika, B. Bercovitz, H. Garcia-Molina, FlexRecs: expressing and combining flexible recommendations. In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 2009.

[33] M. Lee, W. Kim, Semantic association search and rank method based on spreading activation for the Semantic Web. In: IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1523–1527, 2009.

[34] J. Lehmann, L. Bühmann, Autosparql: Let users query your knowledge base. In: 8th Extended Semantic Web Conference (ESWC), 2011.

[35] N. Marie, F. Gandon, M. Ribiere, F. Rodio, Discovery hub: on-the-fly Linked Data exploratory search. In: Proceedings of the 9th International Conference on Semantic Systems, pp. 17–24, 2013.

[36] S. McNee, J. Riedl, J. Konstan, Being accurate is not enough: how accuracy metrics have hurt recommender systems. In: CHI'06 Extended Abstracts on Human Factors in Computing Systems, 2006.

[37] R. Meymandpour, J. Davis, Recommendations using Linked Data. In: Proceedings of the 5th PhD Workshop on Information and Knowledge, 2012.

[38] A. Miles, S. Bechhofer, SKOS Simple Knowledge Organization System Reference. In: W3C Recommendation, 2009.

[39] B. Mobasher, X. Jin, Y. Zhou, Semantically enhanced collaborative filtering on the web. In: Web Mining: From Web to Semantic Web, pp. 57–76, 2004.

[40] A. Musetti, A. Nuzzolese, F. Draicchio, V. Presutti, E. Blomqvist, A. Gangemi, P. Ciancarini, Aemoo: Exploratory search based on knowledge patterns over the Semantic Web. In: Semantic Web Challenge, 2012.

[41] T. Neumann, G. Weikum, The RDF-3X engine for scalable management of RDF data. In: The VLDB Journal, vol. 19, no. 1, pp. 91–113, 2010.

[42] J. Pérez, M. Arenas, C. Gutierrez, Semantics and Complexity of SPARQL. In: 5th International Semantic Web Conference, pp. 30–43, 2006.

[43] S. Policarpio, S. Brunk, G. Tummarello, Implementation of a SPARQL integrated recommendation engine for Linked Data with hybrid capabilities. In: AImWD, 2012.

[44] P. Pu, L. Chen, R. Hu, A user-centric evaluation framework for recommender systems. In: Proceedings of the 5th ACM Conference on Recommender Systems, 2011.

[45] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, E. Di Sciascio, Sound and music recommendation with knowledge graphs. In: ACM Transactions on Intelligent Systems and Technology (TIST), 8(2), 21 (2017).

[46] E. Oren, R. Delbru, S. Decker, Extending faceted navigation for RDF data. In: 5th International Semantic Web Conference, pp. 559–572, 2006.

[47] G. Pirro, Explaining and suggesting relatedness in knowledge graphs. In: International Semantic Web Conference, pp. 622–639, 2015.

[48] G. Rizzo, R. Troncy, NERD: A framework for evaluating named entity recognition tools in the Web of data. In: 10th International Semantic Web Conference (ISWC'11), Demo Session, Bonn, Germany, 2011.

[49] J. Rosati, T. Di Noia, T. Lukasiewicz, R. De Leone, A. Maurino, Preference queries with Ceteris Paribus semantics for Linked Data. In: OTM Confederated International Conferences, pp. 423–442, 2015.

[50] T. Ruotsalo, K. Haav, A. Stoyanov, S. Roche, E. Fani, R. Deliai, E. Mäkelä, T. Kauppinen, E. Hyvönen, SMARTMUSEUM: A mobile recommender system for the Web of Data. In: Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 50–67, 2013.

[51] I. Ruthven, M. Lalmas, K. van Rijsbergen, Incorporating user search behavior into relevance feedback. In: Journal of the Association for Information Science and Technology, vol. 54, no. 6, pp. 529–549, 2003.

[52] B. Sarwar, G. Karypis, J. Konstan, J. Riedl, Item-based collaborative filtering recommendation algorithms. In: Proceedings of the 10th International Conference on World Wide Web, 2001.

[53] M. Schmachtenberg, C. Bizer, H. Paulheim, Adoption of the Linked Data best practices in different topical domains. In: International Semantic Web Conference (ISWC), pp. 245–260, 2014.

[54] H. Schütze, C. Manning, P. Raghavan, Introduction to information retrieval, Cambridge University Press, 2008.

[55] SparqlImplementations, URL: https://www.w3.org/wiki/SparqlImplementations, 2016.

[56] M. Stankovic, W. Breitfuss, P. Laublet, Discovering relevant topics using DBPedia. In: Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 219–222, 2011.

[57] K. Swearingen, R. Sinha, Beyond algorithms: An HCI perspective on recommender systems. In: ACM SIGIR 2001 Workshop on Recommender Systems, vol. 13, no. 5-6, 2001.

[58] D. Tunkelang, Faceted search. In: Synthesis Lectures on Information Concepts, Retrieval, and Services, pp. 1–80, 2009.

[59] C. Unger, L. Bühmann, J. Lehmann, A. Ngonga Ngomo, D. Gerber, P. Cimiano, Template-based question answering over RDF data. In: Proceedings of the 21st ACM International Conference on World Wide Web, 2012.

[60] C. Varnhagen, M. Gushta, J. Daniels, T. Peters, N. Parmar, D. Law, T. Johnson, How informed is online informed consent? In: Ethics & Behavior, vol. 15, no. 1, pp. 37–48, 2005.

[61] J. Waitelonis, H. Sack, Towards exploratory video search using Linked Data. In: 11th IEEE International Symposium on Multimedia, pp. 540–545, 2009.

[62] L. Wenige, J. Ruhland, Flexible on-the-fly recommendations from Linked Open Data repositories. In: International Conference on Business Information Systems, pp. 43–54, 2016.

[63] L. Wenige, Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations.

[64] L. Wenige, J. Ruhland, Retrieval by recommendation: using LOD technologies to improve digital library search. In: International Journal on Digital Libraries, 2017.

[65] F. Zarrinkalam, M. Kahani, A multi-criteria hybrid citation recommendation system based on Linked Data. In: 2nd International eConference on Computer and Knowledge Engineering (ICCKE), pp. 283–288, 2012.

[66] C. Ziegler, S. McNee, J. Konstan, G. Lausen, Improving recommendation lists through topic diversification. In: Proceedings of the 14th International Conference on World Wide Web, pp. 22–32, 2005.

[67] L. Zou, R. Huang, H. Wang, J. Yu, W. He, D. Zhao, Natural language question answering over RDF: a graph data driven approach. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 313–324, 2014.


Table 21
Jaccard Indices (TC4)

Domain   Mean Jaccard Index (JI)   SD     N
Movie    0.03                      0.07   44
Music    0.01                      0.02   21
Book     0.16                      0.27   33

of similarity calculation can help to retrieve other relevant items.
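For readers unfamiliar with the measure reported in Table 21, the following minimal Python sketch computes a Jaccard index between two result lists. It assumes that the index compares the sets of items returned by two retrieval approaches for the same user profile; the exact lists compared in TC4 are not restated in this section, and the item identifiers below are hypothetical.

```python
def jaccard_index(list_a, list_b):
    """Return |A ∩ B| / |A ∪ B| for two recommendation lists."""
    a, b = set(list_a), set(list_b)
    if not a and not b:
        return 0.0  # convention chosen for this sketch when both lists are empty
    return len(a & b) / len(a | b)


# Hypothetical result lists of two retrieval approaches for one profile.
regular = ["dbr:Film_A", "dbr:Film_B", "dbr:Film_C", "dbr:Film_D"]
advanced = ["dbr:Film_C", "dbr:Film_E", "dbr:Film_F", "dbr:Film_G"]

# One shared item out of seven distinct items.
overlap = jaccard_index(regular, advanced)
```

A low value, as in the table, indicates that the two lists share few items.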

5. Future Work & Conclusion

This paper has presented the query-based SKOSRec engine that facilitates new types of recommendation requests for LOD repositories. The evaluations have shown that the SKOSRec system can often generate relevant and useful suggestions when certain query templates are utilized. In many cases, the application of expressive or advanced query patterns (i.e., in the context of constraint-based retrieval or cross-domain requests) helped to significantly improve recall values while still providing the same level of quality in the remaining performance dimensions. Additionally, advanced recommendation requests can be used to generate diversified recommendation lists in case users are not satisfied with the results of regular recommendation requests (i.e., roll-up or cross-domain retrieval vs. regular requests). While the novel recommendation approaches of the SKOSRec engine are not superior in each domain and test case, it is safe to say that they considerably extend common retrieval methods and at least provide an alternative search strategy when conventional methods fail to provide useful results. Additionally, the increase in diversity and recall is in line with findings from previous research on LOD-enabled RS [45].

It is a strength of the developed approaches that they facilitate combinations of graph-based and similarity-based retrieval at different stages of the recommendation workflow. This feature extends general search capabilities for semantic networks. Future research should investigate whether, in addition to SKOS, further RDF vocabularies can be used for ad-hoc item-to-item similarity computation to increase the range of potential usage scenarios, since not every LOD dataset contains SKOS annotations [53]. The application of additional properties could also boost the recommendation quality of some query patterns (e.g., the query types used in TC3). In this context, it also needs to be determined whether the representation of the user profile in the SKOSRec query language can be substituted with free-text expressions to enable search-like functionalities. LOD researchers have already developed engines that can process natural language queries over RDF data [59], [34], [67]. It will have to be investigated how the SKOSRec engine profitably fits into this landscape of existing retrieval tools.

Another open research question concerns the aspect of interfaces to other applications. In its standard configuration, the system is not an end-user retrieval engine but a backend application with which an administrator who has domain knowledge (e.g., of RDF vocabularies and data models of a particular LOD repository and usage scenario) creates the respective query patterns. Although the websites of the online experiments represent possible implementations of suitable end-user applications for the SKOSRec engine, there are still many other possibilities and extensions to design such a user interface (UI). For instance, the selection of LOD items for the profile was made possible by previously storing the data in an index that could then be accessed and searched before issuing a recommendation query. However, this runs against the idea of ad-hoc retrieval. In this context, future research will have to determine whether it is feasible to generate an on-the-fly mapping from items to the respective LOD resources. In the case of a commercial application, the engine needs to match items with the corresponding entries in the product catalog. Even if it is not yet clear which UI components can make the best possible use of the SKOSRec engine's features, the results from the user experiments can give at least a few suggestions. The following retrieval options should be the default setting in an interface that contains a LOD-enabled RS component to assist users in finding interesting items. Since the regular approach of on-the-fly retrieval achieved good results throughout the domains, the display of simple recommendations generated from previously stated user preferences represents a suitable starting point for retrieval. On-the-fly queries could be refined by providing filter options. Since the evaluations have shown that expressive constraints often lead to increased recall values, the application of such a filter might be the best option for assisted retrieval. As in the preparations for the experiments, appropriate graph-based query patterns for the usage scenario in question would also have to be determined by a domain expert before setting up the application. The same applies to advanced retrieval patterns, which should be available to the end user to refine the query when both the simple and the constraint-based approach have failed to provide relevant items. This suggestion is made because roll-up queries were competitive with regular recommendations in the web-based experiments. Therefore, they represent a viable alternative retrieval strategy. For some domains (e.g., for multimedia retrieval), it is also possible to use an optional selection field to enable the formulation of cross-domain requests. Evaluations for this query type suggest that users might receive interesting results (i.e., diversified and comprehensive recommendation lists) from such a request.

In summary, the contributions of the paper are as follows:

– Development and implementation of novel retrieval strategies in the SKOSRec system, which facilitate personalized requests and leverage the potential of knowledge graphs in order to address the research gap in state-of-the-art recommendation methods with regard to customized queries.

– Realization of web-based user experiments in four usage scenarios with the following outcomes:

  • Successful demonstration that knowledge graph data is applicable in the context of live recommendation scenarios.

  • Creation of suitable query patterns that can be applied to represent individual information needs in different application contexts. The set of query patterns can be extended with the help of the SKOSRec query language to meet the requirements of additional usage scenarios.

  • Proof that the novel retrieval strategies can be successfully applied across various application contexts and recommendation tasks and that they outperform conventional content-based retrieval approaches, especially in the performance dimensions of diversity and recall.

The prototypical implementation and the evaluation of the SKOSRec engine in different usage scenarios have demonstrated that the freely available knowledge sources on the LOD cloud can be successfully applied to advance state-of-the-art methods in IR and recommender systems. The proposed personalized and customized retrieval strategies can leverage the potential of knowledge graphs by better matching information needs of users with the help of semantic search options.

Appendix A. The SKOSRecommender Query Syntax

SKOSRecQuery: A SKOSRec query can comprise up to six elements, of which the parts SimProjection and ItemPart are obligatory. Hence, users need to state at least their preferences and how many recommendations they would like to receive. Advanced retrieval patterns can be formulated as well. For instance, similar resource retrieval can be combined with prefiltering (RecWhereClause) and/or postfiltering (SelectPart) of LOD resources. As in SPARQL, the query can start with a prologue that abbreviates IRIs with namespace declarations [21].

SelectPart: The SelectPart enables subquerying with recommendation results. The surrounding query is a SELECT query with a slightly simplified WHERE clause (i.e., RecWhereClause) with which postfilter conditions can be formulated. In this section, users have the option to formulate aggregation-based retrieval requests (Aggregation).

Aggregation: This query part handles aggregation-based postfiltering. Users specify the resource (IRIref) for which similarity scores of sublevel entities are summarized. Additionally, it is stated how the data should be aggregated (i.e., SUM, MAX or AVG) as well as which variable (Var) represents the aggregation-based recommendations in the postfilter section. The compiler makes sure that this variable is actually contained in the RecWhereClause of the SelectPart.
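The aggregation step described above can be illustrated with a short Python sketch. It is not the SKOSRec implementation: the function name `aggregate_scores`, the `parent_of` mapping and the example data are assumptions made for illustration. Only the three aggregation modes (SUM, MAX, AVG) are taken from the syntax description.

```python
from collections import defaultdict
from statistics import mean

# The three aggregation modes named in the query syntax.
AGGREGATORS = {"SUM": sum, "MAX": max, "AVG": mean}


def aggregate_scores(scored_items, parent_of, mode="SUM"):
    """Roll similarity scores of sublevel items up to their parent resource.

    scored_items: iterable of (item, similarity_score) pairs
    parent_of:    hypothetical mapping item -> parent resource
    mode:         "SUM", "MAX" or "AVG"
    Returns (parent, aggregated_score) pairs, best first.
    """
    aggregate = AGGREGATORS[mode]
    grouped = defaultdict(list)
    for item, score in scored_items:
        parent = parent_of.get(item)
        if parent is not None:  # items without a parent are skipped
            grouped[parent].append(score)
    totals = {parent: aggregate(scores) for parent, scores in grouped.items()}
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical example: movie scores summarized per director.
scored = [("film1", 0.8), ("film2", 0.5), ("film3", 0.6)]
directors = {"film1": "directorA", "film2": "directorA", "film3": "directorB"}

by_sum = aggregate_scores(scored, directors, "SUM")
by_max = aggregate_scores(scored, directors, "MAX")
by_avg = aggregate_scores(scored, directors, "AVG")
```

The choice of mode changes what a high rank means: SUM favors parents with many similar sublevel entities, MAX favors a single strong match, and AVG favors consistently similar entities.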

SimProjection: This part of a SKOSRec query refers to the list of generated suggestions. It defines the recommendation variable to which all similar LOD resources are mapped. In case pre- and postfilter conditions are set in the other sections, the compiler checks whether the variable (Var) appears in these sections as well. Otherwise, the required join operations cannot be executed. It is also specified how many recommendations should be displayed (Integer) and whether the request will be issued against a default repository or as a cross-repository request (ServiceIntegration).

ServiceIntegration: This section indicates that a user intends to receive recommendations from a different SPARQL endpoint than the default endpoint that is stated in the standard configuration of the SKOSRec engine. It specifies the target SPARQL endpoint (IRIref) from which a user wants to receive recommendations. Upon extraction of SKOS annotations from the default source endpoint, a subsequent request is sent to the specified target endpoint.

ItemPart: This section of a SKOSRec query represents a single preference in the user profile. However, recommendation queries can contain several preference statements. A preference is either expressed as a LOD resource (IRIref) or as a variable which is bound to a preference query (VarPart). In the latter case, the resources are obtained by matching the graph pattern in the VarPart with the triple statements in the repository. The ItemPart statement can additionally be supplemented with a concept-to-concept similarity threshold (Sim) that triggers concept expansion with the approach of flexible similarity detection, which is extensively described in one of our previous works [64].

VarPart: The construct initiates preference querying. Individual likings are expressed as a graph pattern in a RecWhereClause. The variable (Var) is a placeholder for the LOD resources that will be used to set up the user profile. The compiler has to make sure that the specified variable occurs in the RecWhereClause as well. Otherwise, the extraction of LOD resources cannot be carried out correctly.

Sim: The Sim part of a SKOSRec query denotes the threshold of concept-to-concept scores for flexible similarity detection. Even though knowledge-based similarity values usually range between 0 and 1, the specification allows a broader range of digits in the Decimal datatype to enable the application of other similarity metrics in case they are needed.

Relation: This section specifies how concept-to-concept similarity scores should be determined. Users can state whether scores should be “larger”, “larger than” or “equal to” the threshold value.
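How a threshold (Sim) and a comparison (Relation) could jointly drive concept expansion can be sketched as follows. This is an illustration under stated assumptions, not the engine's code: the operator symbols in `RELATIONS`, the function `expand_concepts`, the score table and the concept identifiers are all hypothetical, and the actual relation keywords of the SKOSRec language are not reproduced here.

```python
import operator

# Assumed comparison symbols; the real SKOSRec keywords may differ.
RELATIONS = {">": operator.gt, ">=": operator.ge, "=": operator.eq}


def expand_concepts(concept, concept_scores, threshold, relation=">="):
    """Return the concept plus all concepts whose score satisfies the relation.

    concept_scores: hypothetical mapping concept -> {other_concept: score}
    threshold:      concept-to-concept similarity threshold (the Sim value)
    relation:       key into RELATIONS
    """
    compare = RELATIONS[relation]
    expanded = {concept}
    for other, score in concept_scores.get(concept, {}).items():
        if compare(score, threshold):
            expanded.add(other)
    return expanded


# Hypothetical concept-to-concept similarity scores.
scores = {
    "c:Western": {
        "c:Spaghetti_Western": 0.9,
        "c:Revisionist_Western": 0.7,
        "c:Drama": 0.4,
    }
}

inclusive = expand_concepts("c:Western", scores, 0.7, ">=")
strict = expand_concepts("c:Western", scores, 0.7, ">")
```

With a threshold of 0.7, the inclusive comparison keeps both Western subgenres, while the strict comparison keeps only the one scoring above the threshold.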

RecWhereClause: This clause can occur in three different sections of a SKOSRec query, i.e., either in the prefilter (ItemPart), the postfilter (SelectPart) or the preference filter section (VarPart). The RecWhereClause closely resembles the “WhereClause” of the SPARQL 1.1 specification with all its subsequent parts [21]. Hence, it enables different combinations of graph pattern matching. However, the RecWhereClause of the SKOSRec language allows fewer pattern matching expressions than the “WhereClause” of the regular SPARQL syntax specification [21]. For instance, it is neither permitted to formulate subqueries nor to direct filter requests to more than one repository (see RecGraphPatternNotTriples for further explanations). This modification has been made to simplify similarity calculation. After parsing this section, the compiler makes sure that recommendation, preference or postfilter variables occur at least once in the RecWhereClause to guarantee that subsequent steps of the recommendation workflow can be applied to existing LOD resources. In case the RecWhereClause comprises a RecMinusGraphPattern, it also has to be checked that the variable is not only contained in the MINUS part of the group.
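The variable check described above can be sketched in a few lines of Python. The representation of a parsed clause as a list of `(kind, variables)` tuples and the function name `variable_is_bound` are assumptions for illustration; the actual compiler operates on its own parse tree.

```python
def variable_is_bound(var, patterns):
    """Check that `var` occurs in at least one pattern outside a MINUS part.

    patterns: hypothetical parsed clause, a list of (kind, variables) tuples
              where kind is e.g. "TRIPLES", "OPTIONAL", "UNION" or "MINUS"
              and variables is the set of variable names used in that pattern.
    """
    return any(var in variables
               for kind, variables in patterns
               if kind != "MINUS")


# The recommendation variable ?rec appears in a basic graph pattern: accepted.
valid_clause = [("TRIPLES", {"?rec", "?genre"}), ("MINUS", {"?rec"})]

# ?rec appears only inside MINUS, so nothing binds it: rejected.
invalid_clause = [("TRIPLES", {"?genre"}), ("MINUS", {"?rec"})]

accepted = variable_is_bound("?rec", valid_clause)
rejected = variable_is_bound("?rec", invalid_clause)
```

The reason for excluding MINUS is that a pattern in a MINUS part only removes solutions and never contributes bindings, so a variable that occurs only there would leave later join steps without any resources to work on.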

RecGroupGraphPattern: A RecGroupGraphPattern can contain one to many basic graph patterns, which can be combined in different ways according to RecGroupGraphPatternSub.

RecGroupGraphPatternSub: This section defines the graph patterns that are applied to identify suitable LOD resources. User filters can be expressed either as basic graph patterns (TriplesBlock) or as combinations of them (RecGraphPatternNotTriples). The sequence of different kinds of patterns can be flexibly specified.



[59] C Unger L Buumlhmann J Lehmann A Ngonga Ngomo DGerber P Cimiano Template-based question answering overRDF data In Proceedings of the 21st ACM International Con-ference on World Wide Web 2012

[60] C Varnhagen M Gushta J Daniels T Peters N Parmar DLaw T Johnson How informed is online informed consent InEthics amp Behavior vol 15 no 1 pp 37ndash48 2005

[61] J Waitelonis H Sack H Towards exploratory video searchusing Linked Data In 11th IEEE International Symposium onMultimedia pp 540ndash545 2009

[62] L Wenige J Ruhland Flexible on-the-fly recommendationsfrom Linked Open Data repositories In International Confer-ence on Business Information Systems pp 43ndash54 2016

[63] L Wenige Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations

[64] L Wenige J Ruhland Retrieval by recommendation usingLOD technologies to improve digital library search In Interna-tional Journal on Digital Libraries 2017

[65] F Zarrinkalam M Kahani A multi-criteria hybrid citation rec-ommendation system based on Linked Data In 2nd Interna-tional EConference on Computer and Knowledge Engineering(ICCKE) pp 283ndash288 2012

[66] C Ziegler S McNee J Konstan G Lausen Improving rec-ommendation lists through topic diversification In Proceedingsof the 14th International Conference on World Wide Web pp22ndash32 2005

[67] L Zou R Huang H Wang J Yu W He D Zhao Natu-ral language question answering over RDF a graph data drivenapproach In Proceedings of the 2014 ACM SIGMOD Interna-tional Conference on Management of Data pp 313ndash324 2014


approach have failed to provide relevant items. This suggestion is made because roll-up queries were competitive with regular recommendations in the web-based experiments. Therefore, they represent a viable alternative retrieval strategy. For some domains (e.g., for multimedia retrieval), it is also possible to use an optional selection field to enable the formulation of cross-domain requests. Evaluations for this query type suggest that users might receive interesting results (i.e., diversified and comprehensive recommendation lists) from such a request.

In summary, the contributions of the paper are as follows:

– Development and implementation of novel retrieval strategies in the SKOSRec system, which facilitate personalized requests and leverage the potential of knowledge graphs in order to address the research gap in state-of-the-art recommendation methods with regard to customized queries.

– Realization of web-based user experiments in four usage scenarios with the following outcomes:

  • Successful demonstration that knowledge graph data is applicable in the context of live recommendation scenarios.

  • Creation of suitable query patterns that can be applied to represent individual information needs in different application contexts. The set of query patterns can be extended with the help of the SKOSRec query language to meet the requirements of additional usage scenarios.

  • Proof that the novel retrieval strategies can be successfully applied across various application contexts and recommendation tasks and that they outperform conventional content-based retrieval approaches, especially in the performance dimensions of diversity and recall.

The prototypical implementation and the evaluation of the SKOSRec engine in different usage scenarios have demonstrated that the freely available knowledge sources on the LOD cloud can be successfully applied to advance state-of-the-art methods in IR and recommender systems. The proposed personalized and customized retrieval strategies can leverage the potential of knowledge graphs by better matching information needs of users with the help of semantic search options.

Appendix A. The SKOSRecommender Query Syntax

SKOSRecQuery: A SKOSRec query can consist of up to six elements, of which the parts SimProjection and ItemPart are obligatory. Hence, users need to state at least their preferences and how many recommendations they would like to receive. Advanced retrieval patterns can be formulated as well. For instance, similar resource retrieval can be combined with prefiltering (RecWhereClause) and/or postfiltering (SelectPart) of LOD resources. As in SPARQL, the query can start with a prologue that abbreviates IRIs with namespace declarations [21].

SelectPart: The SelectPart enables subquerying with recommendation results. The surrounding query is a SELECT query with a slightly simplified WHERE clause (i.e., RecWhereClause) with which postfilter conditions can be formulated. In this section, users have the option to formulate aggregation-based retrieval requests (Aggregation).

Aggregation: This query part handles aggregation-based postfiltering. Users specify the IRI (IRIref) resource for which similarity scores of sublevel entities are summarized. Additionally, it is stated how the data should be aggregated (i.e., SUM, MAX or AVG) as well as which variable (Var) represents the aggregation-based recommendations in the postfilter section. The compiler makes sure that this variable is actually contained in the RecWhereClause of the SelectPart.
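
The aggregation step described above can be illustrated with a minimal Python sketch. It is not the SKOSRec implementation: the function name `aggregate_scores`, the `parent_of` mapping and the example IRIs are hypothetical and only show how similarity scores of sublevel entities might be rolled up to a parent resource with SUM, MAX or AVG.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical lookup table named after the SUM, MAX and AVG options.
AGGREGATORS = {"SUM": sum, "MAX": max, "AVG": mean}


def aggregate_scores(scored_items, parent_of, how="SUM"):
    """Roll similarity scores of sublevel entities up to their parent resource.

    scored_items: dict mapping a sublevel entity IRI to its similarity score
    parent_of:    dict mapping a sublevel entity IRI to its parent IRI
    how:          one of "SUM", "MAX" or "AVG"
    """
    if how not in AGGREGATORS:
        raise ValueError(f"unsupported aggregation: {how}")
    grouped = defaultdict(list)
    for entity, score in scored_items.items():
        parent = parent_of.get(entity)
        if parent is not None:  # entities without a known parent are skipped
            grouped[parent].append(score)
    aggregate = AGGREGATORS[how]
    ranked = [(parent, aggregate(scores)) for parent, scores in grouped.items()]
    # Highest aggregated score first.
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

With SUM, a parent that has many moderately similar sublevel entities can outrank a parent with a single highly similar one, whereas AVG and MAX remove that size effect; which behavior is preferable depends on the retrieval task.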

SimProjection: This part of a SKOSRec query refers to the list of generated suggestions. It defines the recommendation variable to which all similar LOD resources are mapped. In case pre- and postfilter conditions are set in the other sections, the compiler checks whether the variable (Var) appears in these sections as well. Otherwise, the required join operations cannot be executed. It is specified how many recommendations should be displayed (Integer) and whether the request will be issued against a default repository or as a cross-repository request (ServiceIntegration).

ServiceIntegration: This section indicates that a user intends to receive recommendations from a different SPARQL endpoint than the default endpoint that is stated in the standard configuration of the SKOSRec engine. It specifies the target SPARQL endpoint (IRIref) from which a user wants to receive recommendations. Upon extraction of SKOS annotations from the default source endpoint, a subsequent request is sent to the specified target endpoint.

ItemPart: This section of a SKOSRec query represents a single preference in the user profile. However, recommendation queries can contain several preference statements. A preference is either expressed as a LOD resource (IRIref) or as a variable which is bound to a preference query (VarPart). In the latter case, the resources are obtained by matching the graph pattern in the VarPart with the triple statements in the repository. The ItemPart statement can be additionally supplemented with a concept-to-concept similarity threshold (Sim) that triggers concept expansion with the approach of flexible similarity detection, which is extensively described in one of our previous works [64].

VarPart: The construct initiates preference querying. Individual likings are expressed as a graph pattern in a RecWhereClause. The variable (Var) is a placeholder for the LOD resources that will be used to set up the user profile. The compiler has to make sure that the specified variable occurs in the RecWhereClause as well. Otherwise, the extraction of LOD resources cannot be carried out correctly.

Sim: The Sim part of a SKOSRec query denotes the threshold of concept-to-concept scores for flexible similarity detection. Even though knowledge-based similarity values usually range between 0 and 1, the specification allows a broader range of digits in the Decimal datatype to enable application of other similarity metrics in case they are needed.

Relation: This section specifies how concept-to-concept similarity scores should be determined. Users can state whether scores should be “larger”, “larger than” or “equal to” the threshold value.
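
As a rough illustration of how a Sim threshold and a Relation could interact during concept expansion, consider the following Python sketch. The relation tokens `">"`, `">="` and `"="`, the function name `select_similar_concepts` and the example concept identifiers are assumptions made for this example; they are not taken from the SKOSRec grammar.

```python
import operator

# Hypothetical relation tokens; the actual keywords of the SKOSRec grammar
# are not reproduced here.
RELATIONS = {">": operator.gt, ">=": operator.ge, "=": operator.eq}


def select_similar_concepts(concept_scores, threshold, relation=">="):
    """Return the concepts whose similarity score satisfies the relation.

    concept_scores: dict mapping a concept identifier to its
                    concept-to-concept similarity score
    threshold:      the Sim value (a decimal, not restricted to 0..1)
    relation:       one of the keys of RELATIONS
    """
    if relation not in RELATIONS:
        raise ValueError(f"unknown relation: {relation}")
    compare = RELATIONS[relation]
    return {c for c, score in concept_scores.items() if compare(score, threshold)}
```

Because the threshold is compared as a plain number, the same sketch would also work with similarity metrics whose values fall outside the usual range between 0 and 1.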

RecWhereClause: This clause can occur in three different sections of a SKOSRec query, i.e., either in the prefilter (ItemPart), the postfilter (SelectPart) or the preference filter section (VarPart). The RecWhereClause closely resembles the “WhereClause” of the SPARQL 1.1 specification with all its subsequent parts [21]. Hence, it enables different combinations of graph pattern matching. However, the RecWhereClause of the SKOSRec language allows fewer pattern matching expressions than the “WhereClause” of the regular SPARQL syntax specification [21]. For instance, it is neither permitted to formulate subqueries nor to direct filter requests to more than one repository (see RecGraphPatternNotTriples for further explanations). This modification has been made to simplify similarity calculation. After parsing this section, the compiler makes sure that recommendation, preference or postfilter variables occur at least once in the RecWhereClause to guarantee that subsequent steps of the recommendation workflow can be applied on existing LOD resources. In case the RecWhereClause comprises a RecMinusGraphPattern, it also has to be checked that the variable is not only contained in the MINUS part of the group.
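
The variable check described for the RecWhereClause can be sketched in Python as follows. The `Group` class is a hypothetical, heavily simplified stand-in for a parsed clause, not the data structure of the SKOSRec compiler. The sketch also assumes that an occurrence inside an OPTIONAL or UNION group counts as a valid occurrence, which the text does not state explicitly.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Triple = Tuple[str, str, str]


@dataclass
class Group:
    """Hypothetical, simplified representation of a parsed graph pattern group."""
    triples: List[Triple] = field(default_factory=list)
    optional: List["Group"] = field(default_factory=list)
    union: List["Group"] = field(default_factory=list)
    minus: List["Group"] = field(default_factory=list)


def variables_outside_minus(group):
    """Collect all variables that occur somewhere other than in a MINUS group."""
    found = {term for triple in group.triples
             for term in triple if term.startswith("?")}
    for child in group.optional + group.union:
        found |= variables_outside_minus(child)
    # MINUS groups are deliberately not visited: a variable that appears
    # only there must not satisfy the check.
    return found


def variable_is_usable(var, where_clause):
    """True if var occurs at least once outside the MINUS parts of the clause."""
    return var in variables_outside_minus(where_clause)
```

A clause that mentions the recommendation variable in a regular triple pattern passes the check, while a variable that is only introduced inside a MINUS group is rejected.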

RecGroupGraphPattern: A RecGroupGraphPattern can contain one to many basic graph patterns, which can be differently combined according to RecGroupGraphPatternSub.

RecGroupGraphPatternSub: This section defines the graph patterns that are applied to identify suitable LOD resources. User filters can be either expressed as basic graph patterns (TriplesBlock) or as combinations of them (RecGraphPatternNotTriples). The sequence of different kinds of patterns can be flexibly specified.

RecGraphPatternNotTriples: This query part closely resembles the similarly named “GraphPatternNotTriples” of the SPARQL 1.1 specification [21]. The only difference to SPARQL is that the RecGraphPatternNotTriples section is a little more restricted in terms of admissible graph patterns than the “GraphPatternNotTriples” section of the regular SPARQL specification. For instance, it does not allow retrieving LOD resources other than from the default graph that is specified in the configuration of the SKOSRec engine, since filter patterns have to be applied to datasets that contain SKOS annotations. Hence, they need to be specified before runtime execution to ensure that similarity calculation can be carried out correctly. If a user were able to formulate filter conditions that referred to different RDF graphs and endpoints accordingly, this would not be possible.

RecGroupOrUnionGraphPattern: This query section marks an alternative graph pattern in the style of the “GroupOrUnionGraphPattern” of the SPARQL syntax specification [21]. However, it neither allows SPARQL-like subqueries nor federated queries for filtered resources, because this would complicate similarity calculation.

RecOptionalGraphPattern: In this part of a SKOSRecQuery, users can specify additional graph patterns which might extend the query solution, but do not necessarily have to match the data.

RecMinusGraphPattern: This section handles exclusion of LOD resources for cases when users want to omit certain triple statements from the solution.

References

[1] G. Adomavicius, A. Tuzhilin. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. In: IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.

[2] G. Adomavicius, A. Tuzhilin, R. Zheng. REQUEST: A query language for customizing recommendations. In: Information Systems Research, vol. 22, no. 1, pp. 99–117, 2011.

[3] C. Aggarwal. Recommender systems. Springer, 2016.

[4] M. Arenas, B. Cuenca Grau, E. Kharlamov, S. Marciuska, D. Zheleznyakov. Faceted search over ontology-enhanced RDF data. In: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, pp. 939–948, 2014.

[5] V. Ayala, M. Przyjaciel-Zablocki, T. Hornung, A. Schätzle, G. Lausen. Extending SPARQL for recommendations. In: Semantic Web Information Management, 2004.

[6] J. Beel, C. Breitinger, S. Langer, A. Lommatzsch, B. Gipp. Towards reproducibility in recommender systems research. In: User Modeling and User-adapted Interaction, vol. 26, no. 1, pp. 69–101, 2016.

[7] C. Boutilier, R. Brafman, C. Domshlak, H. Hoos, D. Poole. CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements. In: Journal of Artificial Intelligence Research (JAIR), vol. 21, pp. 135–191, 2004.

[8] P. Castells, S. Vargas, J. Wang. Novelty and diversity metrics for recommender systems: choice, discovery and relevance, 2011.

[9] G. Cheng, W. Ge, Y. Qu. Falcons: searching and browsing entities on the Semantic Web. In: Proceedings of the 17th ACM International Conference on World Wide Web, pp. 1101–1102, 2008.

[10] J. Davies, R. Weeks. QuizRDF: Search technology for the Semantic Web. In: Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004.

[11] DBpedia. URL: https://wiki.dbpedia.org, 2017.

[12] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito. Exploiting the web of data in model-based recommender systems. In: Proceedings of the 6th ACM Conference on Recommender Systems (RecSys), pp. 253–256, 2012.

[13] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, M. Zanker. Linked Open Data to support content-based recommender systems. In: Proceedings of the 8th International Conference on Semantic Systems, 2012.

[14] D. Dillman, D. Robert, D. Bowker. Principles for constructing web surveys. In: Joint Meetings of the American Statistical Association, 1998.

[15] DCMI metadata terms. URL: http://dublincore.org/documents/dcmi-terms, 2004.

[16] I. Fernandez-Tobias, I. Cantador, M. Kaminskas, F. Ricci. A generic semantic-based framework for cross-domain recommendation. In: Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pp. 25–32, 2011.

[17] A. Freitas, J. Oliveira, S. O'Riain, E. Curry, J. Da Silva, C. Pereira. Querying Linked Data using semantic relatedness: a vocabulary independent approach. In: Natural Language Processing and Information Systems, pp. 40–51, 2011.

[18] T. Grainger, T. Potter, Y. Seeley. Solr in Action. Cherry Hill: Manning, 2014.

[19] A. Gunawardana, G. Shani. Evaluating recommender systems. In: Recommender Systems Handbook, Springer, pp. 265–308, 2015.

[20] R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Bürgle, H. Düwiger, U. Scheel. Faceted Wikipedia search. In: International Conference on Business Information Systems, pp. 1–11, 2010.

[21] S. Harris, A. Seaborne, E. Prud'hommeaux. SPARQL 1.1 query language. In: W3C Recommendation, 2013.

[22] B. Heitmann, C. Hayes. Using Linked Data to build open, collaborative recommender systems. In: AAAI Spring Symposium: Linked Data Meets Artificial Intelligence, pp. 76–81, 2010.

[23] J. Herlocker, J. Konstan, L. Terveen, J. Riedl. Evaluating collaborative filtering recommender systems. In: ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.

[24] Y. Hu, Z. Wang, W. Wu, J. Guo, M. Zhang. Recommendation for movies and stars using YAGO and IMDB. In: 12th International Asia-Pacific Web Conference (APWEB), pp. 123–129, 2010.

[25] D. Huynh, D. Karger. Parallax and companion: Set-based browsing for the data web. In: Proceedings of the 18th ACM International Conference on World Wide Web, 2009.

[26] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich. Recommender systems: an introduction. Cambridge University Press, 2010.

[27] Y. Kabutoya, R. Sumi, T. Iwata, T. Uchiyama, T. Uchiyama. In: IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 625–630, 2012.

[28] M. Kaminskas, I. Fernandez-Tobias, F. Ricci, I. Cantador. Knowledge-based music retrieval for places of interest. In: Proceedings of the 2nd International ACM Workshop on Music Information Retrieval with User-centered and Multimodal Strategies, pp. 19–24, 2012.

[29] H. Khrouf, R. Troncy. Hybrid event recommendation using Linked Data and user diversity. In: Proceedings of the 7th ACM Conference on Recommender Systems, pp. 185–192, 2010.

[30] P. Kirk. Experimental design. John Wiley & Sons, 1982.

[31] B. Knijnenburg, M. Willemsen, Z. Gantner, H. Soncu, C. Newell. Explaining the user experience of recommender systems. In: User Modeling and User-Adapted Interaction, vol. 22, no. 4-5, pp. 441–504, 2012.

[32] G. Koutrika, B. Bercovitz, H. Garcia-Molina. FlexRecs: expressing and combining flexible recommendations. In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 2009.


[33] M. Lee, W. Kim. Semantic association search and rank method based on spreading activation for the Semantic Web. In: IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1523–1527, 2009.

[34] J. Lehmann, L. Bühmann. Autosparql: Let users query your knowledge base. In: 8th Extended Semantic Web Conference (ESWC), 2011.

[35] N. Marie, F. Gandon, M. Ribiere, F. Rodio. Discovery hub: on-the-fly Linked Data exploratory search. In: Proceedings of the 9th International Conference on Semantic Systems, pp. 17–24, 2013.

[36] S. McNee, J. Riedl, J. Konstan. Being accurate is not enough: how accuracy metrics have hurt recommender systems. In: CHI'06 Extended Abstracts on Human Factors in Computing Systems, 2006.

[37] R. Meymandpour, J. Davis. Recommendations using Linked Data. In: Proceedings of the 5th PhD Workshop on Information and Knowledge, 2012.

[38] A. Miles, S. Bechhofer. SKOS Simple Knowledge Organization System Reference. In: W3C Recommendation, 2009.

[39] B. Mobasher, X. Jin, Y. Zhou. Semantically enhanced collaborative filtering on the web. In: Web Mining: From Web to Semantic Web, pp. 57–76, 2004.

[40] A. Musetti, A. Nuzzolese, F. Draicchio, V. Presutti, E. Blomqvist, A. Gangemi, P. Ciancarini. Aemoo: Exploratory search based on knowledge patterns over the Semantic Web. In: Semantic Web Challenge, 2012.

[41] T. Neumann, G. Weikum. The RDF-3X engine for scalable management of RDF data. In: The VLDB Journal, vol. 19, no. 1, pp. 91–113, 2010.

[42] J. Pérez, M. Arenas, C. Gutierrez. Semantics and Complexity of SPARQL. In: 5th International Semantic Web Conference, pp. 30–43, 2006.

[43] S. Policarpio, S. Brunk, G. Tummarello. Implementation of a SPARQL integrated recommendation engine for Linked Data with hybrid capabilities. In: AImWD, 2012.

[44] P. Pu, L. Chen, R. Hu. A user-centric evaluation framework for recommender systems. In: Proceedings of the 5th ACM Conference on Recommender Systems, 2011.

[45] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, E. Di Sciascio. Sound and music recommendation with knowledge graphs. In: ACM Transactions on Intelligent Systems and Technology (TIST), 8(2), 21, 2017.

[46] E. Oren, R. Delbru, S. Decker. Extending faceted navigation for RDF data. In: 5th International Semantic Web Conference, pp. 559–572, 2006.

[47] G. Pirro. Explaining and suggesting relatedness in knowledge graphs. In: International Semantic Web Conference, pp. 622–639, 2015.

[48] G. Rizzo, R. Troncy. NERD: A framework for evaluating named entity recognition tools in the Web of data. In: 10th International Semantic Web Conference (ISWC'11), Demo Session, Bonn, Germany, 2011.

[49] J. Rosati, T. Di Noia, T. Lukasiewicz, R. De Leone, A. Maurino. Preference queries with Ceteris Paribus semantics for Linked Data. In: OTM Confederated International Conferences, pp. 423–442, 2015.

[50] T. Ruotsalo, K. Haav, A. Stoyanov, S. Roche, E. Fani, R. Deliai, E. Mäkelä, T. Kauppinen, E. Hyvönen. SMARTMUSEUM: A mobile recommender system for the Web of Data. In: Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 50–67, 2013.

[51] I. Ruthven, M. Lalmas, K. van Rijsbergen. Incorporating user search behavior into relevance feedback. In: Journal of the Association for Information Science and Technology, vol. 54, no. 6, pp. 529–549, 2003.

[52] B. Sarwar, G. Karypis, J. Konstan, J. Riedl. Item-based collaborative filtering recommendation algorithms. In: Proceedings of the 10th International Conference on World Wide Web, 2001.

[53] M. Schmachtenberg, C. Bizer, H. Paulheim. Adoption of the Linked Data best practices in different topical domains. In: International Semantic Web Conference (ISWC), pp. 245–260, 2014.

[54] H. Schütze, C. Manning, P. Raghavan. Introduction to information retrieval. Cambridge University Press, 2008.

[55] SparqlImplementations. URL: https://www.w3.org/wiki/SparqlImplementations, 2016.

[56] M. Stankovic, W. Breitfuss, P. Laublet. Discovering relevant topics using DBPedia. In: Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 219–222, 2011.

[57] K. Swearingen, R. Sinha. Beyond algorithms: An HCI perspective on recommender systems. In: ACM SIGIR 2001 Workshop on Recommender Systems, vol. 13, no. 5-6, 2001.

[58] D. Tunkelang. Faceted search. In: Synthesis Lectures on Information Concepts, Retrieval, and Services, pp. 1–80, 2009.

[59] C. Unger, L. Bühmann, J. Lehmann, A. Ngonga Ngomo, D. Gerber, P. Cimiano. Template-based question answering over RDF data. In: Proceedings of the 21st ACM International Conference on World Wide Web, 2012.

[60] C. Varnhagen, M. Gushta, J. Daniels, T. Peters, N. Parmar, D. Law, T. Johnson. How informed is online informed consent? In: Ethics & Behavior, vol. 15, no. 1, pp. 37–48, 2005.

[61] J. Waitelonis, H. Sack. Towards exploratory video search using Linked Data. In: 11th IEEE International Symposium on Multimedia, pp. 540–545, 2009.

[62] L. Wenige, J. Ruhland. Flexible on-the-fly recommendations from Linked Open Data repositories. In: International Conference on Business Information Systems, pp. 43–54, 2016.

[63] L. Wenige. Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations.

[64] L. Wenige, J. Ruhland. Retrieval by recommendation: using LOD technologies to improve digital library search. In: International Journal on Digital Libraries, 2017.

[65] F. Zarrinkalam, M. Kahani. A multi-criteria hybrid citation recommendation system based on Linked Data. In: 2nd International EConference on Computer and Knowledge Engineering (ICCKE), pp. 283–288, 2012.

[66] C. Ziegler, S. McNee, J. Konstan, G. Lausen. Improving recommendation lists through topic diversification. In: Proceedings of the 14th International Conference on World Wide Web, pp. 22–32, 2005.

[67] L. Zou, R. Huang, H. Wang, J. Yu, W. He, D. Zhao. Natural language question answering over RDF: a graph data driven approach. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 313–324, 2014.

  • Introduction
  • Related Work
  • The SKOS Recommender
    • System Overview
    • On-the-Fly Recommendations
    • Preference Querying
    • Prefiltering
    • Postfiltering amp Combinations
      • Evaluation of the SKOS Recommender
        • Experimental Setup
        • Results
          • Future Work amp Conclusion
          • Appendix A The SKOSRecommender Query Syntax
          • References
Page 29: Semantic Web 0 (0) 1 IOS Press Similarity-based Knowledge … · Semantic Web 0 (0) 1 1 IOS Press Similarity-based Knowledge Graph Queries for Recommendation Retrieval Lisa Wenige*,

L Wenige et al Similarity-based Graph Queries 29

wants to receive recommendations Upon extrac-tion of SKOS annotations from the default sourceendpoint a subsequent request is sent to the spec-ified target endpoint

ItemPart This section of a SKOSRec query repre-sents a single preference in the user profile How-ever recommendation queries can contain a cou-ple of preference statements A preference is ei-ther expressed as a LOD resource (IRIref ) or asa variable which is bound to a preference query(VarPart) In the latter case the resources areobtained by matching the graph pattern in theVarPart with the triple statements in the reposi-tory The ItemPart statement can be additionallysupplemented with a concept-to-concept similar-ity threshold (Sim) that triggers concept expan-sion with the approach of flexible similarity de-tection which is extensively described in one ofour previous works [64]

VarPart The construct initiates preference queryingIndividual likings are expressed as a graph pat-tern in a RecWhereClause The variable (Var) isa placeholder for the LOD resources that will beused to set up the user profile The compiler hasto make sure that the specified variable occurs inthe RecWhereClause as well Otherwise the ex-traction of LOD resources cannot be carried outcorrectly

Sim The Sim part of a SKOSRec query denotes thethreshold of concept-to-concept scores for flexi-ble similarity detection Even though knowledge-based similarity values usually range between 0and 1 the specification allows a broader range ofdigits in the Decimal datatype to enable applica-tion of other similarity metrics in case they areneeded

Relation This section specifies how concept-to-conceptsimilarity scores should be determined Users canstate whether scores should be ldquolargerrdquo ldquolargerthanrdquo or ldquoequal tordquo the threshold value

RecWhereClause This clause can occur in three dif-ferent sections of a SKOSRec query ie either inthe prefilter (ItemPart) the postfilter (SelectPart)or the preference filter section (VarPart) TheRecWhereClause closely resembles the ldquoWhere-Clauserdquo of the SPARQL 11 specification withall its subsequent parts [21] Hence it enablesdifferent combinations of graph pattern match-ing However the RecWhereClause of the SKOS-Rec language allows fewer pattern matching ex-pressions than the ldquoWhereClauserdquo of the regular

SPARQL syntax specification [21] For instanceit is neither permitted to formulate subqueries norto direct filter requests to more than one reposi-tory (see RecGraphPatternNotTriples for furtherexplanations) This modification has been madeto simplify similarity calculation After parsingthis section the compiler makes sure that recom-mendation preference or postfilter variables oc-cur at least once in the RecWhereClause to guar-antee that subsequent steps of the recommenda-tion workflow can be applied on existing LOD re-sources In case the RecWhereClause comprises aRecMinusGraphPattern it also has to be checkedthat the variable is not only contained in theMINUS part of the group

RecGroupGraphPattern A RecGroupGraphPatterncan contain one to many basic graph patternswhich can be differently combined according toRecGroupGraphPatternSub

RecGroupGraphPatternSub This section defines thegraph patterns that are applied to identify suit-able LOD resources User filters can be either ex-pressed as basic graph patterns (TriplesBlock) oras combinations of them (RecGraphPatternNot-Triples) The sequence of different kinds of pat-terns can be flexibly specified

RecGraphPatternNotTriples This query part closelyresembles the similar named ldquoGraphPatternNot-Triplesrdquo of the SPARQL 11 specification [21]The only difference to SPARQL is that the Rec-GraphPatternNotTriples section is a little morerestricted in terms of admissible graph patternsthan the ldquoGraphPatternNotTriplesrdquo section of theregular SPARQL specification For instance itdoes not allow to retrieve LOD resources otherthan from the default graph that is specified in theconfiguration of the SKOSRec engine since filterpatterns have to be applied to datasets that containSKOS annotations Hence they need to be speci-fied before runtime execution to ensure that sim-ilarity calculation can be carried out correctly Ifa user was able to formulate filter conditions thatreferred to different RDF graphs and endpointsaccordingly this would not be possible

RecGroupOrUnionGraphPattern This query sec-tion marks an alternative graph pattern in the styleof the ldquoGroup-OrUnionGraphPatternrdquo of the SPARQL syntaxspecification [21] However it neither allowsSPARQL-like subqueries nor federated queries

30 L Wenige et al Similarity-based Graph Queries

for filtered resources because this would compli-cate similarity calculation

RecOptionalGraphPattern In this part of a SKOS-RecQuery users are enabled to specify additionalgraph patterns which might extend the query so-lution but do not necessarily have to match thedata

RecMinusGraphPattern This section handles exclu-sion of LOD resources for cases when users wantto omit certain triple statements from the solution

References

[1] G Adomavicius A Tuzhilin Toward the next generation ofrecommender systems A survey of the state-of-the-art and pos-sible extensions In IEEE Transactions on Knowledge and DataEngineering vol 17 no 6 pp 734ndash749 2005

[2] G Adomavicius A Tuzhilin R Zheng REQUEST A querylanguage for customizing recommendations In InformationSystems Research vol 22 no 1 pp 99ndash117 2011

[3] C Aggarwal Recommender systems Springer 2016[4] M Arenas B Cuenca Grau E Kharlamov S Marciuska D

Zheleznyakov Faceted search over ontology-enhanced RDFdata In Proceedings of the 23rd ACM International Confer-ence on Information and Knowledge Management pp 939ndash9482014

[5] V Ayala M Przyjaciel-Zablocki T Hornung A Schaumltzle GLausen Extending SPARQL for recommendations In Seman-tic Web Information Management 2004

[6] J Beel C Breitinger S Langer A Lommatzsch B GippTowards reproducibility in recommender systems research InUser Modeling and User-adapted Interaction vol 26 no 1 pp69ndash101 2016

[7] C. Boutilier, R. Brafman, C. Domshlak, H. Hoos, D. Poole. CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements. In: Journal of Artificial Intelligence Research (JAIR), vol. 21, pp. 135–191, 2004.

[8] P. Castells, S. Vargas, J. Wang. Novelty and diversity metrics for recommender systems: choice, discovery and relevance. 2011.

[9] G. Cheng, W. Ge, Y. Qu. Falcons: searching and browsing entities on the Semantic Web. In: Proceedings of the 17th ACM International Conference on World Wide Web, pp. 1101–1102, 2008.

[10] J. Davies, R. Weeks. QuizRDF: Search technology for the Semantic Web. In: Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004.

[11] DBpedia. URL: https://wiki.dbpedia.org, 2017.

[12] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito. Exploiting the web of data in model-based recommender systems. In: Proceedings of the 6th ACM Conference on Recommender Systems (RecSys), pp. 253–256, 2012.

[13] T. Di Noia, R. Mirizzi, V. Ostuni, D. Romito, M. Zanker. Linked Open Data to support content-based recommender systems. In: Proceedings of the 8th International Conference on Semantic Systems, 2012.

[14] D. Dillman, D. Robert, D. Bowker. Principles for constructing web surveys. In: Joint Meetings of the American Statistical Association, 1998.

[15] DCMI metadata terms. URL: http://dublincore.org/documents/dcmi-terms, 2004.

[16] I. Fernandez-Tobias, I. Cantador, M. Kaminskas, F. Ricci. A generic semantic-based framework for cross-domain recommendation. In: Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pp. 25–32, 2011.

[17] A. Freitas, J. Oliveira, S. O'Riain, E. Curry, J. Da Silva, C. Pereira. Querying Linked Data using semantic relatedness: a vocabulary independent approach. In: Natural Language Processing and Information Systems, pp. 40–51, 2011.

[18] T. Grainger, T. Potter, Y. Seeley. Solr in action. Cherry Hill: Manning, 2014.

[19] A. Gunawardana, G. Shani. Evaluating recommender systems. In: Recommender Systems Handbook, Springer, pp. 265–308, 2015.

[20] R. Hahn, C. Bizer, C. Sahnwaldt, C. Herta, S. Robinson, M. Bürgle, H. Düwiger, U. Scheel. Faceted Wikipedia search. In: International Conference on Business Information Systems, pp. 1–11, 2010.

[21] S. Harris, A. Seaborne, E. Prud'hommeaux. SPARQL 1.1 query language. In: W3C recommendation, 2013.

[22] B. Heitmann, C. Hayes. Using Linked Data to build open, collaborative recommender systems. In: AAAI Spring Symposium: Linked Data Meets Artificial Intelligence, pp. 76–81, 2010.

[23] J. Herlocker, J. Konstan, L. Terveen, J. Riedl. Evaluating collaborative filtering recommender systems. In: ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 5–53, 2004.

[24] Y. Hu, Z. Wang, W. Wu, J. Guo, M. Zhang. Recommendation for movies and stars using YAGO and IMDB. In: 12th International Asia-Pacific Web Conference (APWEB), pp. 123–129, 2010.

[25] D. Huynh, D. Karger. Parallax and companion: Set-based browsing for the data web. In: Proceedings of the 18th ACM International Conference on World Wide Web, 2009.

[26] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich. Recommender systems: an introduction. Cambridge University Press, 2010.

[27] Y. Kabutoya, R. Sumi, T. Iwata, T. Uchiyama, T. Uchiyama. In: IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), pp. 625–630, 2012.

[28] M. Kaminskas, I. Fernandez-Tobias, F. Ricci, I. Cantador. Knowledge-based music retrieval for places of interest. In: Proceedings of the 2nd International ACM Workshop on Music Information Retrieval with User-centered and Multimodal Strategies, pp. 19–24, 2012.

[29] H. Khrouf, R. Troncy. Hybrid event recommendation using Linked Data and user diversity. In: Proceedings of the 7th ACM Conference on Recommender Systems, pp. 185–192, 2010.

[30] P. Kirk. Experimental design. John Wiley & Sons, 1982.

[31] B. Knijnenburg, M. Willemsen, Z. Gantner, H. Soncu, C. Newell. Explaining the user experience of recommender systems. In: User Modeling and User-Adapted Interaction, vol. 22, no. 4-5, pp. 441–504, 2012.

[32] G. Koutrika, B. Bercovitz, H. Garcia-Molina. FlexRecs: expressing and combining flexible recommendations. In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 2009.

[33] M. Lee, W. Kim. Semantic association search and rank method based on spreading activation for the Semantic Web. In: IEEE International Conference on Industrial Engineering and Engineering Management, pp. 1523–1527, 2009.

[34] J. Lehmann, L. Bühmann. Autosparql: Let users query your knowledge base. In: 8th Extended Semantic Web Conference (ESWC), 2011.

[35] N. Marie, F. Gandon, M. Ribiere, F. Rodio. Discovery hub: on-the-fly Linked Data exploratory search. In: Proceedings of the 9th International Conference on Semantic Systems, pp. 17–24, 2013.

[36] S. McNee, J. Riedl, J. Konstan. Being accurate is not enough: how accuracy metrics have hurt recommender systems. In: CHI'06 Extended Abstracts on Human Factors in Computing Systems, 2006.

[37] R. Meymandpour, J. Davis. Recommendations using Linked Data. In: Proceedings of the 5th PhD Workshop on Information and Knowledge, 2012.

[38] A. Miles, S. Bechhofer. SKOS Simple Knowledge Organization System Reference. In: W3C Recommendation, 2009.

[39] B. Mobasher, X. Jin, Y. Zhou. Semantically enhanced collaborative filtering on the web. In: Web Mining: From Web to Semantic Web, pp. 57–76, 2004.

[40] A. Musetti, A. Nuzzolese, F. Draicchio, V. Presutti, E. Blomqvist, A. Gangemi, P. Ciancarini. Aemoo: Exploratory search based on knowledge patterns over the Semantic Web. In: Semantic Web Challenge, 2012.

[41] T. Neumann, G. Weikum. The RDF-3X engine for scalable management of RDF data. In: The VLDB Journal, vol. 19, no. 1, pp. 91–113, 2010.

[42] J. Pérez, M. Arenas, C. Gutierrez. Semantics and Complexity of SPARQL. In: 5th International Semantic Web Conference, pp. 30–43, 2006.

[43] S. Policarpio, S. Brunk, G. Tummarello. Implementation of a SPARQL integrated recommendation engine for Linked Data with hybrid capabilities. In: AImWD, 2012.

[44] P. Pu, L. Chen, R. Hu. A user-centric evaluation framework for recommender systems. In: Proceedings of the 5th ACM Conference on Recommender Systems, 2011.

[45] S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, E. Di Sciascio. Sound and music recommendation with knowledge graphs. In: ACM Transactions on Intelligent Systems and Technology (TIST), vol. 8, no. 2, art. 21, 2017.

[46] E. Oren, R. Delbru, S. Decker. Extending faceted navigation for RDF data. In: 5th International Semantic Web Conference, pp. 559–572, 2006.

[47] G. Pirro. Explaining and suggesting relatedness in knowledge graphs. In: International Semantic Web Conference, pp. 622–639, 2015.

[48] G. Rizzo, R. Troncy. NERD: A framework for evaluating named entity recognition tools in the Web of data. In: 10th International Semantic Web Conference (ISWC'11), Demo Session, Bonn, Germany, 2011.

[49] J. Rosati, T. Di Noia, T. Lukasiewicz, R. De Leone, A. Maurino. Preference queries with Ceteris Paribus semantics for Linked Data. In: OTM Confederated International Conferences, pp. 423–442, 2015.

[50] T. Ruotsalo, K. Haav, A. Stoyanov, S. Roche, E. Fani, R. Deliai, E. Mäkelä, T. Kauppinen, E. Hyvönen. SMARTMUSEUM: A mobile recommender system for the Web of Data. In: Web Semantics: Science, Services and Agents on the World Wide Web, vol. 20, pp. 50–67, 2013.

[51] I. Ruthven, M. Lalmas, K. van Rijsbergen. Incorporating user search behavior into relevance feedback. In: Journal of the Association for Information Science and Technology, vol. 54, no. 6, pp. 529–549, 2003.

[52] B. Sarwar, G. Karypis, J. Konstan, J. Riedl. Item-based collaborative filtering recommendation algorithms. In: Proceedings of the 10th International Conference on World Wide Web, 2001.

[53] M. Schmachtenberg, C. Bizer, H. Paulheim. Adoption of the Linked Data best practices in different topical domains. In: International Semantic Web Conference (ISWC), pp. 245–260, 2014.

[54] H. Schütze, C. Manning, P. Raghavan. Introduction to information retrieval. Cambridge University Press, 2008.

[55] SparqlImplementations. URL: https://www.w3.org/wiki/SparqlImplementations, 2016.

[56] M. Stankovic, W. Breitfuss, P. Laublet. Discovering relevant topics using DBPedia. In: Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, pp. 219–222, 2011.

[57] K. Swearingen, R. Sinha. Beyond algorithms: An HCI perspective on recommender systems. In: ACM SIGIR 2001 Workshop on Recommender Systems, vol. 13, no. 5-6, 2001.

[58] D. Tunkelang. Faceted search. In: Synthesis Lectures on Information Concepts, Retrieval, and Services, pp. 1–80, 2009.

[59] C. Unger, L. Bühmann, J. Lehmann, A. Ngonga Ngomo, D. Gerber, P. Cimiano. Template-based question answering over RDF data. In: Proceedings of the 21st ACM International Conference on World Wide Web, 2012.

[60] C. Varnhagen, M. Gushta, J. Daniels, T. Peters, N. Parmar, D. Law, T. Johnson. How informed is online informed consent? In: Ethics & Behavior, vol. 15, no. 1, pp. 37–48, 2005.

[61] J. Waitelonis, H. Sack. Towards exploratory video search using Linked Data. In: 11th IEEE International Symposium on Multimedia, pp. 540–545, 2009.

[62] L. Wenige, J. Ruhland. Flexible on-the-fly recommendations from Linked Open Data repositories. In: International Conference on Business Information Systems, pp. 43–54, 2016.

[63] L. Wenige. Knowledge Organization Systems for Linked Data-enabled on-the-fly recommendations.

[64] L. Wenige, J. Ruhland. Retrieval by recommendation: using LOD technologies to improve digital library search. In: International Journal on Digital Libraries, 2017.

[65] F. Zarrinkalam, M. Kahani. A multi-criteria hybrid citation recommendation system based on Linked Data. In: 2nd International EConference on Computer and Knowledge Engineering (ICCKE), pp. 283–288, 2012.

[66] C. Ziegler, S. McNee, J. Konstan, G. Lausen. Improving recommendation lists through topic diversification. In: Proceedings of the 14th International Conference on World Wide Web, pp. 22–32, 2005.

[67] L. Zou, R. Huang, H. Wang, J. Yu, W. He, D. Zhao. Natural language question answering over RDF: a graph data driven approach. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 313–324, 2014.
