
Authority Control of People and Organizations on the Semantic Web

Jussi Kurki and Eero Hyvönen

Semantic Computing Research Group (SeCo)
Helsinki University of Technology (TKK) and University of Helsinki

http://www.seco.tkk.fi/
firstname.lastname@tkk.fi

Abstract. Authors and documents with identical titles are common in the digital library environment. In order to manage identities correctly, authority control is used by library and information scientists for disambiguating and cross-referencing entity names. We argue that the benefits of traditional authority control can be enhanced by using techniques and technologies of the Semantic Web, leading to simpler management of multiple languages, better linkability of resources, simpler reuse of authority registries in applications, and less work in indexing. To demonstrate our propositions, we have created a prototype of an ontology server and service called ONKI People that is used in two ways: First, it is a centralized authority service providing human end-users with efficient and easy-to-use authority finding and disambiguation services based on faceted semantic search and visualizations. The services are also available online as AJAX and Web Services APIs for machines to use. Second, the underlying RDF triple store can be used as a content resource in applications such as semantic cultural heritage portals. The paper discusses and demonstrates both use cases in a real life setting.

1 Towards Authority Control on the Semantic Web

Authority control includes the processes of maintaining author, title, and subject headings for bibliographic material (person, group or organization) in a library catalog [1, 2]. The basic problems addressed here are: 1) How to encode names referring to the same entity in a systematic way, so that the resources can later be found by searching (e.g., names can be transliterated differently in different languages)? 2) How to guarantee that different entities are not encoded by a similar name, which would lead to confusion in information retrieval (e.g., different instances of “John Smith” encoded with the same string)?

Current methodology [1] relies on rigid syntax and rules for presenting the information. The basic sets of rules define the forms of the names (e.g. use the form Surname, First name for persons’ names), rules for changing the records when the names change, and so on. The main goal is to enable efficient search and retrieval based on author, title and subject names, which are often ambiguous, encoded in varying ways, and subject to change.

Traditional authority control works well in small, homogeneous environments and datasets, but as a downside, it requires much expertise and manual work from the indexers. These problems are emphasized when managing and interlinking large databases, such as authority records of different libraries in different countries based on different languages. Such tasks are becoming more and more common on the Web. At the same time, people with less and less experience are indexing content at, e.g., various Web 2.0 sites. To cope with these trends, new kinds of approaches and tools for authority control are needed on the Web. As a solution approach, we propose Semantic Web technologies¹.

A major idea of the Semantic Web is to identify entities, called resources, using Uniform Resource Identifiers² (URI), whose uniqueness can be guaranteed using the Domain Name System (DNS) of the Web. The various linguistic representations of an identity are represented as literal properties attached to the URI using the Resource Description Framework (RDF)³ for describing resources. By transforming the authority data about actors (persons, groups of people, organizations etc.) into RDF-based formats of the Semantic Web, this important data becomes part of the ever growing Web of Data⁴. Such content can be linked with related contents, transformed and enriched with simple tools and reasoning rules, and queried using standard protocols such as the SPARQL Query Language for RDF⁵. On the Semantic Web, authority records are used in two ways: First, they are used in the traditional way for authority control, e.g., for finding unique identifiers for identities during indexing. Second, the authority RDF files can be exploited as a reusable content repository for applications, such as semantic portals for cultural heritage [3].

In the following, we first analyze problems with the traditional authority control approach. As a solution approach, an authority record ontology is presented based on globally unique URIs and RDF. Two use cases of the RDF-based authority records are then demonstrated: First, we present ONKI People, a prototype of a national ontology service of authorities that is a part of the Finnish FinnONTO content infrastructure [4] and the National Ontology Service Library ONKI [5]. Second, we show how the ontological authority data has been reused in the semantic cultural heritage portal CultureSampo⁶ [6] for publishing collections of museums, libraries, and archives.

2 Authority Control and Its Problems

Authority control has traditionally two main objectives [1]:

1. Find a work (e.g. a book or an article) whose author, title or subject is known.

2. List all works by a given author (or subject or another attribute like genre).

These objectives are called the finding and collocating objectives, respectively. More recent work emphasizes the user’s goals. Functional Requirements for Bibliographic Records⁷ (FRBR) identifies the following user tasks: 1) Find an entity (this is similar to the finding and collocating objectives). 2) Identify an entity (confirm that the entity found corresponds to the entity sought). 3) Select an entity (select from various manifestations of an entity, e.g. a CD, DVD or book). 4) Obtain an entity. 5) Navigate (through related material).

¹ http://www.w3.org/2001/sw/
² http://www.w3.org/Addressing/
³ http://www.w3.org/RDF/
⁴ http://linkeddata.org/
⁵ http://www.w3.org/TR/rdf-sparql-query/
⁶ http://www.kulttuurisampo.fi/

A typical solution to meet the requirements is to build an authorized record for each document and actor. The record contains titles (and possibly their sources) and cross references. An example of an authority record is shown in Table 1, taken from a requirements document by the Functional Requirements and Numbering of Authority Records (FRANAR)⁸ working group.

Authorized heading:
    Mertz, Barbara

Information note/see also references:
    Barbara Mertz also writes under the pseudonyms Barbara Michaels and Elizabeth Peters.
    For works written under those pseudonyms, search also under:
    >> Michaels, Barbara, 1927-
    >> Peters, Elizabeth

See also reference tracings:
    << Michaels, Barbara, 1927-
    << Peters, Elizabeth

Table 1. Format for authority record by the FRANAR Working Group.

The record is identified by the authorized heading. The format of the heading is strictly defined as it glues the authorized record with the actor’s actual works, such as books or articles. The additional information on the record helps the user to track related records and sources. It can also be used to disambiguate authors with similar or identical names. Automatic tools for creating authority records include clustering [7] and other name matching algorithms such as [8, 9], but even with these methods, human interaction is often required.

Authority records are often praised for their high quality. Maintenance costs are traditionally regarded as the most significant drawback. Emerging problems include reusing records between different libraries, museums, and archives for aggregating contents. We believe that many of these problems originate from the archaic syntax used for representing the records (e.g. the MARC formats⁹), and from the lack of a common, shared vocabulary and repositories for authority records.

⁷ http://www.frbr.org/
⁸ http://archive.ifla.org/VII/d4/wg-franar.htm
⁹ http://www.loc.gov/marc/

Still, one of the major problems is that an authority record is a list of literal values, where the record is identified by a selected special name. Selecting the name and its form in a systematic way in a global, distributed, multilingual, temporal environment is often a tricky problem. Furthermore, there is the problem of matching the selected authority name(s) with the actual name(s) used in the library databases, where different conventions may proliferate even within a single collection by different catalogers. In many cases, authority files are intended for human usage and are difficult for machines to interpret, which is a central task on the Semantic Web.

In summary, problems of traditional authority records include the following:

1. Maintaining authority records is costly, since it requires a lot of expertise and manual work.

2. Aggregating content is difficult, since different naming conventions are in use in different authority files and library databases.

3. Records evolve in time, e.g., a person may change her name, which leads to different annotations at different times.

4. Records may use complicated syntax and metadata formats that make it difficult to make contents mutually interoperable [10].

5. Records based on literal expressions do not link uniquely or straightforwardly with Web resources.

6. Efforts to build records are difficult to share, at least on the level of the Web.

A recent approach to overcome some of the problems with authority files is the Virtual International Authority File¹⁰ (VIAF). It attempts to aggregate the authority files of the Library of Congress, the Deutsche Nationalbibliothek, and the Bibliothèque nationale de France under one service. The work is controlled by a central authority and is based on MARC.

3 Actor Ontology

The model for our Actor Ontology is based on existing vocabularies for describing persons, especially the FOAF¹¹, Relationship¹² and BIO¹³ vocabularies [11]. These vocabularies are not especially designed for authority files. However, they were chosen in order to make the RDF authority files semantically interoperable with dozens of projects and tools on the Web. If the vocabularies lack some properties that are essential, they can be added and the format can be grown as needed. In this way, extended FOAF records are downward compatible with core FOAF records: extra information does not create problems. This approach, widely used on the Semantic Web, is more flexible than traditional database application schemas, where the fields are decided and fixed once and for all, and after that everybody has to live with them, for better or worse.

¹⁰ http://www.oclc.org/research/projects/viaf/
¹¹ http://xmlns.com/foaf/spec/
¹² http://vocab.org/relationship/
¹³ http://vocab.org/bio/0.1/

The FOAF vocabulary defines 12 classes and about 50 properties. We use four classes from FOAF: Agent and its subclasses Person, Group and Organization. Properties we use include firstName, surname and knows (knows being the essence of FOAF in building the who-knows-who index). The Relationship vocabulary defines sub-properties for foaf:knows; we use e.g. childOf, friendOf and siblingOf. The BIO vocabulary defines biographical events, such as Birth, Death and Marriage, which have a place and a time. All events extend the generic Event class, thus making the model flexible and extendable. Finally, we have defined our own classes and properties to cover information presented in ULAN. The classes we have defined include nationalities (European, French, Japanese,...) and roles (Artist, Painter, Collector,...). The properties include teacherOf, patronOf, masterOf etc. The key classes used in the actor ontology are summarized in Table 2 and the properties in Table 3.

Vocabulary  Class         Description
FOAF        Agent         Super-class for all FOAF actors
FOAF        Person        One individual
FOAF        Group         Group of persons or other groups
FOAF        Organization  Organization, e.g. a company
BIO         Event         Super-class for biographical events
BIO         Birth         Class presenting birth event
BIO         Death         Class presenting death event
ACTOR       Artist        Class presenting the role artist
ACTOR       Painter       Class presenting the role painter (sub-class of Artist)
ACTOR       X             Class presenting the role X
ACTOR       European      Class presenting nationality European
ACTOR       Scandinavian  Class presenting nationality Scandinavian (sub-class of European)
ACTOR       Finnish       Class presenting nationality Finnish (sub-class of Scandinavian)
ACTOR       Y             Class presenting the nationality Y

Table 2. Classes used in actor ontology.

Vocabulary    Property     Description
FOAF          name         Name of person, group or organization
FOAF          firstName    First name
FOAF          surname      Surname
FOAF          knows        Generic “knows” relationship
RELATIONSHIP  childOf      Sub-property of foaf:knows
RELATIONSHIP  friendOf     Sub-property of foaf:knows
RELATIONSHIP  spouseOf     Sub-property of foaf:knows
RELATIONSHIP  siblingOf    Sub-property of foaf:knows
ACTOR         studentOf    Sub-property of foaf:knows
ACTOR         patronOf     Sub-property of foaf:knows
ACTOR         assistantOf  Sub-property of foaf:knows
BIO           place        Place of bio:Event
BIO           date         Date of bio:Event

Table 3. Key properties used in actor ontology.

The Semantic Web is built around the RDF data model. It is a simple way to present data as triples of the form

(subject, predicate, object),

e.g. (“Picasso”, “roleIs”, “Artist”). The resulting semantic net (graph), where each triple represents an arc between two nodes in the graph, can be grown easily and without limit by adding new triples telling e.g. more about Picasso, about the profession Artist, and about the meaning of the predicate property roleIs. (Properties, i.e., arcs, are at the same time nodes in the network and can be attached with metadata, too.) All content based on RDF (or other semantic web formats based on it) shares the domain independent logical semantics underlying the system in use, which makes it possible for the machine to aggregate and interpret the information in different applications. Another key to enabling the aggregation of the Web of Data is to use shared, domain specific vocabularies and resources, such as authority identifiers, in the content description. By using logic and by giving things global identifiers (URIs) taken from shared vocabularies (ontologies), semantic web repositories can be aggregated in an interoperable way over domain borders, and at the same time new information can be inferred. Figure 1 shows an example of the semantic network of resources around a person. Here the person with ID toimo:p12 has the name “Jussi Kurki”, was born in 1982 in the Place “Helsinki”, works with a person whose name is “John Smith”, and so on.
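The triple model can be illustrated with a few lines of code. The following is only a minimal sketch, not part of the system described here: it stores triples as plain tuples and follows arcs from one node to the next. The ID toimo:p12 and the literal values come from the Figure 1 description; all other IDs, and the properties bio:event, rdfs:label and actor:worksWith, are assumptions made for the illustration.

```python
from collections import defaultdict

# Illustrative triples loosely following the Figure 1 description.
# toimo:p12 and the literal values come from the text; every other ID,
# and the properties bio:event, rdfs:label and actor:worksWith, are
# assumptions made for this sketch.
TRIPLES = [
    ("toimo:p12", "rdf:type", "foaf:Person"),
    ("toimo:p12", "foaf:name", "Jussi Kurki"),
    ("toimo:p12", "bio:event", "toimo:birth1"),
    ("toimo:birth1", "rdf:type", "bio:Birth"),
    ("toimo:birth1", "bio:date", "1982"),
    ("toimo:birth1", "bio:place", "toimo:place1"),
    ("toimo:place1", "rdfs:label", "Helsinki"),
    ("toimo:p12", "actor:worksWith", "toimo:p13"),
    ("toimo:p13", "foaf:name", "John Smith"),
]


def build_index(triples):
    """Index triples by subject so that outgoing arcs can be followed."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[subject].append((predicate, obj))
    return index


def objects(index, subject, predicate):
    """Return all objects reachable from `subject` over `predicate`."""
    return [o for p, o in index.get(subject, []) if p == predicate]


index = build_index(TRIPLES)
# Follow two arcs: person -> colleague -> colleague's name.
colleague = objects(index, "toimo:p12", "actor:worksWith")[0]
print(objects(index, colleague, "foaf:name"))  # ['John Smith']
```

Growing the graph amounts to appending further tuples to the list; no schema change is needed, which is the flexibility the RDF model is meant to provide.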

To test the idea of publishing and using authority files on the Semantic Web in practice, we decided to use the Union List of Artist Names (ULAN)¹⁴ as a starting point. ULAN consists of over 120,000 individuals and corporate bodies of art historical significance, with over 300,000 names. In addition, the dataset includes comprehensive information about relationships between actors. ULAN was transformed into FOAF/RDF format using XSL transformations. Figure 2 shows as an example the first lines of the ULAN record of “Gallen-Kallela, Akseli”, a Finnish artist, with his ID and alternative names.

3.1 URI Identifiers

URIs are used for identifying things on the Semantic Web. However, in FOAF, global identifiers are not used. A reason for this is that there are no global repositories available for distributing and managing identifiers for contemporary ordinary persons. Instead, a person or group is identified by a set of unique properties, such as the personal email address. In this way it is possible to identify two resources as the same even if they have different identifiers. The process of merging data from different sources is called “smushing”. Depending on the dataset, different smushing strategies are needed. For example, historical persons do not have personal email addresses, but the year of birth, year of death, name, profession, nationality or some combination of these can be utilized in disambiguation.

¹⁴ http://www.getty.edu/research/conducting_research/vocabularies/ulan/

Fig. 1. Semantic web connecting persons to places, concepts, other persons etc.

Fig. 2. Different names of Finnish artist Gallen-Kallela displayed on ULAN web site.
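One possible smushing strategy for historical persons can be sketched as follows. This is an assumed, deliberately naive strategy for illustration only: records from different sources are merged when their name tokens (ignoring order and case), year of birth and year of death all agree. The record IDs and the two “John Smith” entries are invented sample data; the years for Gallen-Kallela are those given in his ULAN heading.

```python
from collections import defaultdict


def smush_key(record):
    """Build a comparison key from name tokens and life years (assumed strategy)."""
    # "Gallen-Kallela, Akseli" and "Akseli Gallen-Kallela" yield the same tokens.
    tokens = record["name"].casefold().replace(",", " ").split()
    return (tuple(sorted(tokens)), record["birth"], record["death"])


def smush(records):
    """Group record IDs that are judged to describe the same person."""
    groups = defaultdict(list)
    for record in records:
        groups[smush_key(record)].append(record["id"])
    return [sorted(ids) for ids in groups.values()]


# Hypothetical records from two sources, "a" and "b".
RECORDS = [
    {"id": "a:1", "name": "Gallen-Kallela, Akseli", "birth": 1865, "death": 1931},
    {"id": "b:7", "name": "Akseli Gallen-Kallela", "birth": 1865, "death": 1931},
    {"id": "b:9", "name": "John Smith", "birth": 1800, "death": 1870},
    {"id": "a:4", "name": "Smith, John", "birth": 1812, "death": 1890},
]

clusters = smush(RECORDS)
print(clusters)
```

The two Gallen-Kallela records end up in one cluster, while the two John Smiths stay apart because their life years differ. A production strategy would also need to tolerate spelling variants and missing dates, which this sketch does not attempt.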

In our Actor Ontology we are using identifiers (URIs) in the same way as in ULAN. It seems reasonable to try to identify persons by identifiers when this is possible. Using IDs is easier and less error-prone than using indirect reasoning. However, using URIs and smushing are not mutually exclusive strategies. If some entity does not have a URI, or information about the entity is divided into two records, smushing is one way to try to aggregate the records and IDs in use.

We have listed below desirable requirements for the actor ontology URI identifiers:

1. URIs are persistent and do not change e.g. when a person’s name changes.

2. URIs are not dependent on the language in use, which makes them multilingual.

3. URIs are not dependent on the data described, making them universally applicable.

4. URIs are globally unique, making disambiguation easy.

5. URIs can be generated uniquely in a decentralized manner.

The last requirement conforms to the notion of a Universally Unique Identifier¹⁵ (UUID), which is guaranteed, for practical purposes, to be unique. An example of a random unique URI is as follows:

http://seco.tkk.fi/onto/toimo/p05fe15c7da2fdecd387985f92af6c5484e930fd1
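Decentralized generation of such identifiers can be sketched with a standard UUID library. The snippet below is an assumption-laden illustration rather than the scheme actually used: it reuses the namespace of the example URI and appends a random (version 4) UUID in hexadecimal form. Note that the local name in the example above is 40 hexadecimal digits long, which is longer than a UUID, so the real generation method evidently differs.

```python
import uuid

# Namespace copied from the example URI; the "p" prefix mirrors it as well.
BASE = "http://seco.tkk.fi/onto/toimo/"


def mint_actor_uri(base=BASE):
    """Mint a new actor URI from a random UUID, without any central registry."""
    return f"{base}p{uuid.uuid4().hex}"


first = mint_actor_uri()
second = mint_actor_uri()
print(first)
print(first != second)
```

Because the identifier carries no name, language or descriptive data, it also satisfies the first three requirements: it has no reason to change when the description of the actor changes.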

URIs are used at indexing time for storing references to actors in an unambiguous and persistent way. During information retrieval (IR), IDs are needed for semantically disambiguating end-user queries. When querying e.g. books authored by “John Smith”, the system can for example show the end-user a list of known potential John Smiths to select from, and then use the corresponding ID for precise IR.

In order to facilitate such services, a centralized ontology ID service, used in a distributed manner by application systems, is needed. In addition, a possibility for generating new unique IDs in a decentralized way, and smushing them together later on, is necessary.

In the following we present a centralized ontology service called ONKI People. It addresses the question of resolving the URIs for entities expressed by a name. ONKI People is a centralized repository of persons and organizations that offers services for searching as well as disambiguating people and organizations.

4 ONKI People

The key features of ONKI People are a multifaceted search component and a graph visualizer component.

¹⁵ http://www.ietf.org/rfc/rfc4122.txt

4.1 Faceted Search

Search starts when the user types one or more keywords into the search box and hits enter (Figure 3). The search covers all different forms of names as well as nationalities, roles and other properties related to actors. The user can refine the search by selecting categories from the facets on the left hand side.

Search is multilingual and enhanced by the underlying ontology of nationalities and roles. The user can type ’German’ or ’Deutsche’ to receive the same results, assuming that the ontology of nationalities is multilingual. Typing ’European’ would likewise yield a search on German, French, Italian, etc. actors.
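The ontology-based expansion of a keyword can be sketched as below. This is an illustrative toy, not the ONKI People implementation (which is backed by Lucene): a keyword is first resolved to a nationality class through its labels in any language, and the class is then expanded to all of its sub-classes. The class hierarchy follows Table 2 and the text; the label dictionary and the language codes are assumptions.

```python
# Sub-class links (child -> parent), following Table 2 and the text.
SUBCLASS_OF = {
    "German": "European",
    "French": "European",
    "Italian": "European",
    "Scandinavian": "European",
    "Finnish": "Scandinavian",
}

# Hypothetical multilingual labels for the nationality classes.
LABELS = {
    "European": {"en": "European", "fi": "eurooppalainen"},
    "German": {"en": "German", "de": "Deutsche"},
    "French": {"en": "French"},
    "Italian": {"en": "Italian"},
    "Scandinavian": {"en": "Scandinavian"},
    "Finnish": {"en": "Finnish", "fi": "suomalainen"},
}


def resolve(keyword):
    """Map a keyword in any language to a class name, or None if unknown."""
    needle = keyword.casefold()
    for cls, labels in LABELS.items():
        if needle in (label.casefold() for label in labels.values()):
            return cls
    return None


def expand(keyword):
    """Return the matched class together with all of its sub-classes."""
    root = resolve(keyword)
    if root is None:
        return set()
    result = {root}
    changed = True
    while changed:  # repeat until no further sub-class is found
        changed = False
        for child, parent in SUBCLASS_OF.items():
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result


print(expand("Deutsche"))
print(sorted(expand("European")))
```

A search for 'European' is thus rewritten into a search over six nationality classes, while 'German' and 'Deutsche' resolve to the same single class.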

Because all forms of names are indexed, the user may type ’Ruys Pablo’ or ’Picasso’ to find the famous artist. Different scripts are also supported, since all data is encoded using Unicode. Query results, on the other hand, are localized based on the language in use. In the result list, a localized summarising heading is shown for each actor. The headings are of the same form as in ULAN. An example is given below with an English and a Finnish heading:

Gallen-Kallela, Akseli (Finnish painter and graphic artist, 1865-1931)

Gallen-Kallela, Akseli (suomalainen taiteilija, 1865-1931)
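Indexing all name forms can be illustrated with a small inverted index. The sketch below is a simplified stand-in for the Lucene index mentioned in Section 4.5 and makes several assumptions: the actor IDs are invented, the name forms are limited to the ones mentioned in the text (in an assumed "Surname, First name" form), and a query matches an actor when every query token occurs in some name form of that actor.

```python
import re
from collections import defaultdict

# Hypothetical actor IDs with the name forms mentioned in the text.
NAME_FORMS = {
    "ex:picasso": ["Picasso, Pablo", "Ruys, Pablo"],
    "ex:gallen-kallela": ["Gallen-Kallela, Akseli"],
}


def tokenize(text):
    """Split text into case-folded word tokens (Unicode aware)."""
    return re.findall(r"\w+", text.casefold())


def build_name_index(name_forms):
    """Map every token of every name form to the actors that carry it."""
    index = defaultdict(set)
    for actor, names in name_forms.items():
        for name in names:
            for token in tokenize(name):
                index[token].add(actor)
    return index


def search(index, query):
    """Return the actors whose name forms contain all query tokens."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    hits = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        hits &= index.get(token, set())
    return hits


name_index = build_name_index(NAME_FORMS)
print(search(name_index, "Ruys Pablo"))  # {'ex:picasso'}
print(search(name_index, "Picasso"))     # {'ex:picasso'}
```

Since the index returns actor IDs rather than name strings, the heading shown to the user can then be chosen by language, independently of which name form matched.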

Fig. 3. ONKI People showing the search results for keyword ”napoleon”.

4.2 Social Graph Visualization

If the user clicks on an actor in the result list (see previous section), the social network of that actor, i.e., a circle of related actors, is displayed. For example, Figure 4 depicts the social network of Napoleon I, emperor of France. The user can further click on any neighbouring node in the graph to see the social network of that actor. The graphs are rendered as SVG¹⁶ images. Nodes are positioned by a simple algorithm which places direct contacts around the actor, friends of friends on the second level, and so on.
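A level-based placement of this kind might look as follows. The exact algorithm is not specified in the text, so this is only one plausible reading: a breadth-first traversal assigns each actor its distance from the selected actor, and the actors on each level are spread evenly on a circle whose radius grows with the level. The sample graph, the depth limit and the spacing value are all hypothetical.

```python
import math
from collections import deque


def levels(graph, center, max_depth=2):
    """Breadth-first distances from `center`, cut off at `max_depth`."""
    depth = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not expand beyond the outermost ring
        for neighbour in graph.get(node, ()):
            if neighbour not in depth:
                depth[neighbour] = depth[node] + 1
                queue.append(neighbour)
    return depth


def layout(graph, center, max_depth=2, spacing=100.0):
    """Place level-d nodes evenly on a circle of radius d * spacing."""
    by_level = {}
    for node, d in levels(graph, center, max_depth).items():
        by_level.setdefault(d, []).append(node)
    positions = {}
    for d, nodes in by_level.items():
        for i, node in enumerate(sorted(nodes)):
            angle = 2 * math.pi * i / len(nodes)
            radius = d * spacing
            positions[node] = (radius * math.cos(angle), radius * math.sin(angle))
    return positions


# Hypothetical "knows" graph as an adjacency list.
GRAPH = {
    "actor:a": ["actor:b", "actor:c"],
    "actor:b": ["actor:a", "actor:d"],
    "actor:c": ["actor:a"],
    "actor:d": ["actor:b", "actor:e"],
    "actor:e": ["actor:d"],
}

positions = layout(GRAPH, "actor:a")
print(positions["actor:a"])  # (0.0, 0.0)
```

The resulting coordinates could be written directly into SVG circle and line elements; actors beyond the depth limit (here actor:e) are simply left out of the picture.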

Fig. 4. Displaying the social circle of Napoleon I in ONKI People.

The network can be used to disambiguate actors with similar names. Browsing the social connections is also interesting to the user in itself.

4.3 ONKI Selector

ONKI Selector¹⁷ is a ready-to-use service for creating “mash-up” applications cost-efficiently. The selector on the actor ontology is shown in Figure 5. The selector can be integrated into an existing web-based indexing system with one or two lines of JavaScript code. ONKI Selector can be seen as a specialized input element. It returns either URIs or labels (or both) directly into a desired field. It works in a cross-domain manner, though this is invisible to the user. The end-user does not need to be concerned with whether a local input element or a cross-domain selector widget is in use. Using the selector, one can add authority control to one’s own system with minimal effort.

¹⁶ http://www.w3.org/Graphics/SVG/About
¹⁷ http://www.yso.fi/onkiselector/

Fig. 5. ONKI Selector on actor ontology.

4.4 Generic Machine Interfaces

Besides the human oriented web interface and the ONKI Selector interface, ONKI People can be published through various machine interfaces. These include a cross-domain AJAX JavaScript interface using DWR¹⁸ and a Web Service interface using CXF¹⁹.

4.5 Key Points of the Implementation

ONKI People was implemented in Java on top of the Spring framework²⁰. The application follows the Model-View-Controller (MVC) pattern, where display logic is separated from the data model. JSP²¹ and XSLT²² were used as the view layer. The faceted search engine is backed by Lucene²³ indexing. In the visualizer component, SVG graphs are rendered directly to an HTTP response to avoid the need for caching and disk operations. Other optimizations include compression of HTTP packets for faster page load times.

¹⁸ http://directwebremoting.org/
¹⁹ http://cxf.apache.org/
²⁰ http://springframework.org/
²¹ http://java.sun.com/products/jsp/
²² http://www.w3.org/TR/xslt
²³ http://lucene.apache.org/

5 Applications

We have tested our data in the semantic cultural heritage portal CultureSampo²⁴ [6]. Authority data has two key functions in the portal. First, it acts as a supporting element for search and resource linking. Second, the data in itself can be mined using a method we call relational search. In the following, we first introduce the idea of relational search. In the subsequent section, we show how authority data can be utilized by using simple rule-based recommendations.

5.1 Relational Search

Semantic association identification has been studied in national security applications [12]. We have adapted this notion to the cultural heritage domain. The idea is to make it possible for the end-user to formulate queries such as “How is X related to Y” by selecting the end-point resources. The result is a set of semantic connection paths between X and Y. The data used in CultureSampo is the same that is used in ONKI People, i.e., the RDF-transformed ULAN dataset. An example of relational search is shown in Figure 6, showing the social connections between Akseli Gallen-Kallela, the Finnish artist, and Napoleon I.

Fig. 6. Finding social paths between actors.

The search is implemented as an uninformed breadth-first search. As a generic graph search, any relations, not just social connections, could be searched. The graph is built directly from the RDF graph. RDF resources (i.e. actor entities that have a URI) are nodes and properties (e.g. friendOf) are edges.

²⁴ http://www.kulttuurisampo.fi

The implementation is in Java. The graph is stored as a memory based adjacency list. To minimize memory consumption, graph nodes have only a minimal set of fields: a URI and a list of children. At this point, all relationships are basically reduced to “knows” and all data is reduced to URIs. Serialized on disk, the whole graph takes about 10 MB.
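The search described above can be sketched in a few lines. The following is an illustrative Python re-statement, not the authors' Java code: the graph is a dictionary mapping each URI to its list of children, all relations are treated as a generic “knows” edge, and an uninformed breadth-first search returns one shortest connection path. The sample URIs are hypothetical.

```python
from collections import deque


def find_path(adjacency, start, goal):
    """Return one shortest path from start to goal, or None if unconnected."""
    if start == goal:
        return [start]
    parent = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour in parent:
                continue
            parent[neighbour] = node
            if neighbour == goal:
                # Walk the parent links back to the start node.
                path = [goal]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


# Hypothetical adjacency list: URI -> list of children.
GRAPH = {
    "actor:a": ["actor:b", "actor:c"],
    "actor:b": ["actor:a", "actor:d"],
    "actor:c": ["actor:a"],
    "actor:d": ["actor:b", "actor:e"],
    "actor:e": ["actor:d"],
}

print(find_path(GRAPH, "actor:c", "actor:e"))
# ['actor:c', 'actor:a', 'actor:b', 'actor:d', 'actor:e']
```

Breadth-first order guarantees that the first path found is a shortest one in number of steps. Returning a set of alternative paths, as the portal does, would require continuing the search after the first hit.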

Even the longest paths (12 steps) can be found in less than 0.5 seconds. This is explained partly by the structure of the ULAN data. The graph has a strongly connected component of about 12,000 actors containing major artists, such as Picasso and Donatello. At the same time, thousands of others, especially contemporary artists, do not have many connections in the underlying RDF graph.

5.2 Portal Recommendations

CultureSampo builds recommendations using SPARQL queries²⁵. A page about the artist Akseli Gallen-Kallela, based on ULAN, is shown in Figure 7. In addition to the data about the artist, paintings and drawings created by Gallen-Kallela are automatically shown, based on various museum collections (i.e. works whose creator is annotated by the artist URI). Recommendations shown in the example are dynamically built by the following rule:

    SELECT ?item WHERE {
      ?item kulsa-schema:creator $this
    }

The parameter $this is replaced by an actor URI. The return value ?item is a list of related item URIs. Recommendations can be made more complex by adding further rules. For example, the following rule would return all works of an artist’s pupils.

    SELECT ?item WHERE {
      ?student actor:studentOf $this .
      ?item kulsa-schema:creator ?student
    }

Recommendations are dynamically calculated and easily configurable. Rebooting the server is the only requirement for new rules to become visible. Though designed for recommendations, the SPARQL endpoint of CultureSampo could be used for other purposes as well.
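How a rule might be instantiated for a given actor can be sketched as follows. The portal's actual substitution mechanism is not described, so this is an assumption: the two rules above are kept as string templates and $this is filled in with Python's standard string.Template, whose placeholder syntax happens to coincide with the rule notation. The rule names, the angle-bracket wrapping and the character check are illustrative choices, and the prefixes kulsa-schema: and actor: are assumed to be declared by the endpoint.

```python
from string import Template

# The two recommendation rules from the text, keyed by hypothetical names.
RULES = {
    "works_by_artist": Template(
        "SELECT ?item WHERE {\n"
        "  ?item kulsa-schema:creator $this .\n"
        "}"
    ),
    "works_by_pupils": Template(
        "SELECT ?item WHERE {\n"
        "  ?student actor:studentOf $this .\n"
        "  ?item kulsa-schema:creator ?student .\n"
        "}"
    ),
}


def instantiate(rule_name, actor_uri):
    """Fill $this with the actor URI, written as a SPARQL IRI reference."""
    # Reject characters that could break out of the <...> IRI syntax.
    if not actor_uri or any(ch in actor_uri for ch in ' <>"\n\t'):
        raise ValueError(f"not a usable URI: {actor_uri!r}")
    return RULES[rule_name].substitute(this=f"<{actor_uri}>")


query = instantiate("works_by_pupils", "http://example.org/actor/1")
print(query)
```

Adding a recommendation then amounts to adding one more entry to the rule table, which matches the claim that rules are easily configurable.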

6 Discussion

This paper argued that Semantic Web technologies are useful in authority control, especially when dealing with heterogeneous, multilingual contents created by non-professional indexers in a distributed manner, an ever more typical situation on the Web. In contrast to traditional authority control and database search, indexing and IR on the Semantic Web is based on URIs, a semantic RDF network connecting them, shared ontologies, and logical reasoning based on globally shared semantic interpretations and standards. Benefits of the approach were discussed and demonstrated in two practical use cases and implementations: an ontology service and a semantic portal for cultural heritage.

²⁵ http://www.w3.org/TR/rdf-sparql-query/

Fig. 7. Kulttuurisampo displays the record of Finnish artist Akseli Gallen-Kallela and on the same page recommends works by him.

ONKI People is a unique authority ontology service for actors based on the Semantic Web approach. The solution builds, however, on various pieces of related work, some of which are listed below. The faceted (view-based) search paradigm used in ONKI People has been discussed e.g. in [13, 14]. The idea in ONKI People to support mash-up usage of ontologies as services in legacy systems is based on [4, 15, 5]. In [16], another system is presented using an integrable autocompletion widget. Relational search (association identification) is discussed e.g. in [12].

Acknowledgements This work is part of the National Semantic Web Ontology project in Finland²⁶ (FinnONTO, 2003-2010), funded mainly by the National Technology and Innovation Agency (Tekes) and a consortium of 38 organizations. The work is also partly funded by the FP7 EU Project SmartMuseum²⁷.

²⁶ http://www.seco.tkk.fi/projects/finnonto/
²⁷ http://www.smartmuseum.eu

References

1. Taylor, A.: Introduction to Cataloging and Classification. Library and Information Science Text Series. Libraries Unlimited (2006)

2. Tillett, B.: Authority control: State of the art and new perspectives. In: International Conference on Authority Control, Haworth Press, Binghamton, NY (2004)

3. Hyvönen, E.: Semantic portals for cultural heritage. In Staab, S., Studer, R., eds.: Handbook on Ontologies (2nd Edition), Springer-Verlag (2009)

4. Hyvönen, E., Viljanen, K., Tuominen, J., Seppälä, K.: Building a national semantic web ontology and ontology service infrastructure - the FinnONTO approach. In: Proceedings of the ESWC 2008, Tenerife, Spain, Springer-Verlag (2008)

5. Viljanen, K., Tuominen, J., Hyvönen, E.: Ontology libraries for production use: The Finnish ontology library service ONKI. In: Proceedings of the ESWC 2009, Heraklion, Greece, Springer-Verlag (2009)

6. Hyvönen, E., Mäkelä, E., Kauppinen, T., Alm, O., Kurki, J., Ruotsalo, T., Seppälä, K., Takala, J., Puputti, K., Kuittinen, H., Viljanen, K., Tuominen, J., Palonen, T., Frosterus, M., Sinkkilä, R., Paakkarinen, P., Laitio, J., Nyberg, K.: CultureSampo - Finnish culture on the Semantic Web 2.0. Thematic perspectives for the end-user. In: Museums and the Web 2009 Proceedings, Archives & Museum Informatics, Toronto (2009) http://www.archimuse.com/mw2009/papers/hyvonen/hyvonen.html

7. French, J., Powell, A., Schulman, E.: Using clustering strategies for creating authority files. Journal of the American Society for Information Science 51(8) (Jun 2000) 774–786

8. Galvez, C., Moya-Anegon, F.: Approximate personal name-matching through finite-state graphs. Journal of the American Society for Information Science and Technology 58(13) (2007) 1960–1976

9. Borgman, C., Siegfried, S.: Getty’s Synoname and its cousins: A survey of applications of personal name-matching algorithms. Journal of the American Society for Information Science and Technology 43(7) (1992) 459–476

10. Baca, M., ed.: Introduction to metadata. The Getty Research Institute, Los Angeles (2008)

11. Aleman-Meza, B., Bojars, U., Boley, H., Breslin, J., Mochol, M., Nixon, L., Polleres, A., Zhdanova, A.: Combining RDF vocabularies for expert finding. In Franconi, E., Kifer, M., May, W., eds.: ESWC. Volume 4519 of Lecture Notes in Computer Science, Springer (2007) 235–250

12. Sheth, A., Aleman-Meza, B., Arpinar, F.S., Ramakrishnan, C., Bertram, C., Warke, Y., Anyanwu, K., Arpinar, I.B., Kochut, K., Halaschek, C., Avant, D.: Semantic association identification and knowledge discovery for national security applications. Journal of Database Management 16 (2005) 33–53

13. Pollitt, A.S.: The key role of classification and indexing in view-based searching. Technical report, University of Huddersfield, UK (1998) http://www.ifla.org/IV/ifla63/63polst.pdf

14. Hearst, M., Elliott, A., English, J., Sinha, R., Swearingen, K., Lee, K.P.: Finding the flow in web site search. CACM 45(9) (2002) 42–49

15. Tuominen, J., Frosterus, M., Viljanen, K., Hyvönen, E.: ONKI SKOS server for publishing and utilizing SKOS vocabularies and ontologies as services. In: Proceedings of the ESWC 2009, Heraklion, Greece, Springer-Verlag (2009)

16. Hildebrand, M., van Ossenbruggen, J., Amin, A., Aroyo, L., Wielemaker, J., Hardman, L.: The design space of a configurable autocompletion component. Technical Report INS-E0708, Centrum voor Wiskunde en Informatica, Amsterdam (2007)