LOD Application Exemplar - A case study: LODAC Museum

Hideaki Takeda, Fumi Kato / National Institute of Informatics
takeda@nii.ac.jp

2012 International Asian Summer School in Linked Data (IASLOD 2012), August 13-17, 2012, KAIST, Daejeon, Korea



Aim of this talk

• How to plan, design, and implement LOD?
• Learn from the case

LODAC Project

• Open Social Semantic Web Platform for Academic Resources
  – Providing platforms for Linked Open Data
  – Practicing data accumulation and publishing
• Areas of interest
  – Museum information
  – Geographical information, especially geographical names
  – Local information
  – Taxonomic information on species
  – …

http://lod.ac/

Linked Open Data Initiative

http://linkedopendata.jp/

• Non Profit Organization
  – (Under application for approval)
• Academia + IT people + local people
• Aim: facilitate LOD activities among local people

Museum data as LOD

• The state of the art of museum information in Japan (nearly 6,000 museums in Japan)
  – Distributed
    • Self maintained
    • Isolated
  – Opaque
    • Self designed
    • Messy
• Aggregating and associating museum information
  – LODAC-Museum

LODAC Museum – Main work

• Gathering of data
  – Thesaurus, museum collections, etc.
• Standardization of data
  – Representing data from different sources in a single, uniform form
• Integration of data
  – Identifying data
  – Associating data that describe the same thing
• Consuming of data

LODAC Museum Architecture

[Figure: the four stages of LODAC Museum: Gathering of data, Standardization of data, Integration of data, Consuming of data]

Gathering data

• No museums publish data as LOD!
• We use data published as Web pages
  – Scrape and translate data
  – License is not clear
    • It is a serious problem
    • In principle, we need permission from every site
    • We got permission from some data publishers, but not all


Dataset

Type | No. | Data source
Art work (lodac:Work) | ca. 80,000 | Catalog of the collections of 3 National Art Museum (25,180), National Museum of Western Art (4,373), Tokushima Pref. Art Museum (18,482) … over 100 museums; Database for National Treasure & Important Cultural Property of National Designated (915); The Japanese Art Thesaurus (266)
Specimen (lodac:Speciment) | ca. 1,690,000 | (100+ Museum collections) Science Net (National Science Museum)
Person (foaf:Person) | ca. 8,800 | The Japanese Art Thesaurus
Facilities (incl. Museum) | ca. 200,000 | The Japanese Art Thesaurus; Cultural Heritage Online; GIS data, National and Regional Planning Bureau

Extracting collection data from museum websites

[Figure: museum collection web pages, from which Property / Value pairs are extracted]
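The extraction step can be sketched roughly as below. This is only an illustration, not the LODAC scraper: it assumes an item page that lists each property and its value as a two-column HTML table row, and both the sample markup and the `PropertyValueParser` class are hypothetical.

```python
from html.parser import HTMLParser


class PropertyValueParser(HTMLParser):
    """Collect (property, value) pairs from two-column table rows."""

    def __init__(self):
        super().__init__()
        self.pairs = {}
        self._cells = []       # text of the cells in the current row
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells = []
        elif tag in ("th", "td"):
            self._in_cell = True
            self._cells.append("")

    def handle_data(self, data):
        if self._in_cell:
            self._cells[-1] += data

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self._in_cell = False
        elif tag == "tr" and len(self._cells) == 2:
            prop, value = (cell.strip() for cell in self._cells)
            if prop:
                self.pairs[prop] = value


# Hypothetical markup standing in for one museum's item page.
SAMPLE_HTML = """
<table>
  <tr><th>Title</th><td>Example work</td></tr>
  <tr><th>Creator</th><td>Example artist</td></tr>
  <tr><th>Material</th><td>Color on silk</td></tr>
</table>
"""

parser = PropertyValueParser()
parser.feed(SAMPLE_HTML)
record = parser.pairs
print(record)
```

Real museum sites differ in layout, so in practice each source would likely need its own extraction rules before the data reach the standardization step.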

Standardization of data

Raw data are re-organized into common metadata.

Raw Data → Re-organized Metadata (e.g. dc:title, crm:P45_consists_of, skos:prefLabel, lodac:era, ....)

Current organizing policies
• Use existing metadata
• Define own metadata

Namespaces

Prefix | Metadata Name
crm | CIDOC-CRM
dc11 | Dublin Core 1.1
dc | DCMI Terms
skos | Simple Knowledge Organization System
rdfs | Resource Description Framework Schema
foaf | Friend of a Friend
rda2 | Resource Description and Access
lodac | LODAC Project

Metadata schema for works

lodac:Work | Property
Genre | lodac:genre
Type of cultural assets | lodac:culturalAssets
Creator | dc:creator / dc11:creator
Nationality | crm:P7_took_place_at
Title | dc:title / skos:prefLabel
Title Pronunciation (yomi) | dc:title @ja-hrkt / skos:altLabel
Title in English | dc:title @en / skos:altLabel
Inscription | crm:P62I_is_depicted_by
Seal | crm:P65_shows_visual_item
No. of parts | crm:P57_has_number_of_parts
Collection | dc:isPartOf
Created year | dc:created
Estimated starting year | lodac:estimatedStartYear
Material | dc:medium / crm:P45_consists_of
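As a rough sketch of how such a schema can drive standardization, the snippet below maps a scraped record onto a few of the properties in the table and prints Turtle-style triples. The subset of fields, the `to_triples` helper, the sample record, and the `example.org` subject URI are assumptions made for illustration; every value is emitted as a plain literal, which is a simplification.

```python
# Raw field name -> property (first option where the table lists two).
FIELD_TO_PROPERTY = {
    "Title": "dc:title",
    "Title Pronunciation (yomi)": "dc:title",
    "Title in English": "dc:title",
    "Created year": "dc:created",
    "Material": "dc:medium",
}

# Language tags for the fields that carry one in the table.
FIELD_LANG = {
    "Title Pronunciation (yomi)": "ja-hrkt",
    "Title in English": "en",
}


def escape(text):
    """Escape backslashes and double quotes for a quoted literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def to_triples(subject, record):
    """Return one triple line per field that has a known mapping."""
    triples = []
    for field, value in record.items():
        prop = FIELD_TO_PROPERTY.get(field)
        if prop is None:
            continue  # unmapped fields are skipped in this sketch
        literal = '"%s"' % escape(value)
        if field in FIELD_LANG:
            literal += "@" + FIELD_LANG[field]
        triples.append("<%s> %s %s ." % (subject, prop, literal))
    return triples


# Hypothetical record and subject URI.
sample = {"Title": "Example work", "Title in English": "Example work", "Owner": "x"}
for line in to_triples("http://example.org/work/1", sample):
    print(line)
```

Keeping the mapping in a table-like dictionary means a new source only needs its field names translated, not new conversion code.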

Integrating Data

• How to integrate data from different sources
  – Sharing of responsibility
    • Each source is responsible for its data
      – Identifying IDs for data and managing data with the IDs
    • LODAC is only responsible for integration
      – Assigning original IDs and associating other IDs to them

[Figure: an ID-resource (creator's information) linked by dc:references to Ref-resources (creator's references)]
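The division of responsibility can be sketched as a small registry that mints one ID-resource per entity and attaches each source's record to it with dc:references. This is a hypothetical illustration: the `Integrator` class, the matching key, and the `example.org` URIs are assumptions, and how LODAC actually decides that two records denote the same entity is not shown here.

```python
from collections import defaultdict

ID_BASE = "http://example.org/id/"  # placeholder base for minted ID-resources


class Integrator:
    def __init__(self):
        self._next_id = 1
        self._id_by_key = {}
        self.references = defaultdict(list)  # ID-resource URI -> source URIs

    def register(self, key, source_uri):
        """Attach a source record to the ID-resource for `key`, minting one if needed."""
        if key not in self._id_by_key:
            self._id_by_key[key] = ID_BASE + str(self._next_id)
            self._next_id += 1
        id_uri = self._id_by_key[key]
        if source_uri not in self.references[id_uri]:
            self.references[id_uri].append(source_uri)
        return id_uri

    def triples(self):
        """Yield (ID-resource, dc:references, source record) triples."""
        for id_uri, sources in self.references.items():
            for source_uri in sources:
                yield (id_uri, "dc:references", source_uri)


integrator = Integrator()
a = integrator.register("creator:example artist", "http://example.org/sourceA/artist/12")
b = integrator.register("creator:example artist", "http://example.org/sourceB/person/7")
for triple in integrator.triples():
    print(triple)
```

The source records stay untouched under their own identifiers; the integrated layer only holds the minted ID and the links.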

Integrating Data

[Figure: Data from Source A and Data from Source B (raw data for entities) are connected to the integrated data (minimum data to identify entities) by dc:references. In each layer, Work, Museum, and Creator nodes are related by dc:creator and crm:P55_has_current_location.]

Integration of Person Data

• Matching of Creators
  – Base: List of Artists from Thesaurus of Japanese Art
  – Target: Creators of collection in museums + DBpedia
  – Method: String match of names
  – Results: Links from artist nodes to work nodes are added

[Figure: LODAC data holding basic information for creators, with links to works and links to DBpedia]
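String matching of names could look like the sketch below. The normalization rule (NFKC, whitespace removal, case folding) and the helper names are assumptions chosen for illustration, since the slides only say "string match". The IDs 359, 20029, and 20128 appear in the published RDF example; the creator strings attached to them here are made up.

```python
import unicodedata


def normalize_name(name):
    """NFKC-normalize, drop all whitespace, and casefold."""
    text = unicodedata.normalize("NFKC", name)
    return "".join(text.split()).casefold()


def match_creators(base_artists, work_creators):
    """Link artists to works whose creator string matches after normalization.

    base_artists: {artist_id: name}; work_creators: {work_id: creator string}.
    Returns {artist_id: [work_id, ...]}.
    """
    index = {}
    for artist_id, name in base_artists.items():
        index.setdefault(normalize_name(name), artist_id)
    links = {}
    for work_id, creator in work_creators.items():
        artist_id = index.get(normalize_name(creator))
        if artist_id is not None:
            links.setdefault(artist_id, []).append(work_id)
    return links


base = {"359": "下村観山"}
works = {
    "20029": "下村 観山",        # ASCII space between family and given name
    "20128": "下村\u3000観山",   # ideographic space
    "99999": "Unknown",
}
links = match_creators(base, works)
print(links)
```

Exact matching like this cannot tell apart different people who share a name, so its results would normally need review.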

Integrating Data

Integrate Item | Source A (Amount of Data) | Source B (Amount of Data) | Integration Data
Facilities | Japanese Art Thesaurus (648) | Cultural Heritage Online (915) | 77
Title of important cultural properties | Japanese Art Thesaurus (Art work) (3,800) | DB for National Treasure (Art work) (10,115) | 74
Creator information and Work Title | Japanese Art Thesaurus (Creator) (1,332) | All of art work (Work title string) (61,861) | 15,020
Creator name | Japanese Art Thesaurus (Creator) (1,332) | All of art work title (using creator name) (61,861) | 615

Publishing data as RDF

• ID-resource URI (own address): http://lod.ac/id/359
• Ref-resource URI: http://lod.ac/ref/359
• External link: DBpedia Japanese
• Links to her/his work URIs

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:lodac="http://lod.ac/ns/lodac#"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <foaf:Person rdf:about="http://lod.ac/id/359">
    <lodac:creates rdf:resource="http://lod.ac/id/20029"/>
    <lodac:creates rdf:resource="http://lod.ac/id/20128"/>
    <lodac:creates rdf:resource="http://lod.ac/id/20755"/>
    <lodac:creates rdf:resource="http://lod.ac/id/24768"/>
    <lodac:creates rdf:resource="http://lod.ac/id/26732"/>
    ……
    <dc:references rdf:resource="http://ja.dbpedia.org/resource/下村観山"/>
    <dc:references rdf:resource="http://lod.ac/ref/359"/>
    <rdfs:label xml:lang="ja">下村観山</rdfs:label>
    <skos:prefLabel xml:lang="ja">下村観山</skos:prefLabel>
    <foaf:name xml:lang="ja">下村観山</foaf:name>
  </foaf:Person>
</rdf:RDF>
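To show how a consumer might read such a description, here is a minimal sketch that pulls the work links and references out of a shortened copy of the RDF/XML with Python's standard XML parser. It treats the document as plain XML, which only works for this particular serialization; a proper RDF library would be the more robust choice. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "lodac": "http://lod.ac/ns/lodac#",
    "dc": "http://purl.org/dc/terms/",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

# Shortened copy of the published description (only some properties kept).
RDF_XML = """<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:lodac="http://lod.ac/ns/lodac#"
         xmlns:dc="http://purl.org/dc/terms/">
  <foaf:Person rdf:about="http://lod.ac/id/359">
    <lodac:creates rdf:resource="http://lod.ac/id/20029"/>
    <lodac:creates rdf:resource="http://lod.ac/id/20128"/>
    <dc:references rdf:resource="http://lod.ac/ref/359"/>
    <foaf:name xml:lang="ja">下村観山</foaf:name>
  </foaf:Person>
</rdf:RDF>
"""

# Parse from bytes so the encoding declaration is honoured.
root = ET.fromstring(RDF_XML.encode("utf-8"))
person = root.find("foaf:Person", NS)

person_uri = person.get(RDF_ABOUT)
works = [e.get(RDF_RESOURCE) for e in person.findall("lodac:creates", NS)]
refs = [e.get(RDF_RESOURCE) for e in person.findall("dc:references", NS)]
name = person.findtext("foaf:name", namespaces=NS)

print(person_uri, name, works, refs)
```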

LODAC Museum Architecture

[Figure: the four stages of LODAC Museum: Gathering of data, Standardization of data, Integration of data, Consuming of data]

LODAC Applications

• Photo BURARI Pro
• Yokohama Art Spot
• Go2Museum
• http://lod.ac/apps


Photo BURARI Pro

Photo App with SPARQL

(C)ATR-Promotions,Inc

Photo BURARI Pro

• SPARQL Endpoints
  – DBpedia
  – LinkedGeoData
  – LODAC
• Other data source
  – Sinsai.info
• Using JSON Result
  – JSON Framework for Objective-C

(C)ATR-Promotions,Inc

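An example in Objective-C

// SPARQL query: resources whose referenced location lies inside a bounding box.
// NW_lat, NW_long, SE_lat and SE_long are placeholders for the box corners.
NSString *sparql =
    @"PREFIX dct: <http://purl.org/dc/terms/> "
    @"PREFIX omgeo: <http://www.ontotext.com/owlim/geo#> "
    @"PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> "
    @"PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> "
    @"SELECT DISTINCT ?link ?title ?lat ?long WHERE { "
    @"?link dct:references ?ref. "
    @"?ref rdfs:label ?title. "
    @"?ref geo:lat ?lat. "
    @"?ref geo:long ?long. "
    @"?ref omgeo:within(NW_lat NW_long SE_lat SE_long). "
    @"} LIMIT 30";

// Percent-encode the query so that it can be placed in a URL.
NSString *query = (NSString *)CFURLCreateStringByAddingPercentEscapes(
    kCFAllocatorDefault, (CFStringRef)sparql, NULL,
    CFSTR(";,/?:@=+$#"), kCFStringEncodingUTF8);

// The slide does not show the endpoint; `endpoint` is a placeholder to be
// replaced with the URL of the SPARQL endpoint being queried.
NSString *endpoint = @"http://example.org/sparql";
NSURL *url = [NSURL URLWithString:
    [NSString stringWithFormat:@"%@?query=%@", endpoint, query]];

// Ask for SPARQL results in JSON.
NSMutableURLRequest *req = [NSMutableURLRequest requestWithURL:url];
[req setValue:@"application/sparql-results+json" forHTTPHeaderField:@"Accept"];

// Send the request synchronously and decode the response body as UTF-8.
NSURLResponse *resp;
NSError *err;
NSData *data = [NSURLConnection sendSynchronousRequest:req
                                     returningResponse:&resp
                                                 error:&err];
NSString *result = [[NSString alloc] initWithData:data
                                         encoding:NSUTF8StringEncoding];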
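The same request can be sketched in Python with only the standard library. The endpoint URL is a placeholder, `build_request` is a hypothetical helper, and the query is a shortened version of the one above without the bounding-box constraint; nothing is sent over the network unless the final commented line is enabled.

```python
import urllib.parse
import urllib.request

ENDPOINT = "http://example.org/sparql"  # placeholder, not a real endpoint

QUERY = """PREFIX dct: <http://purl.org/dc/terms/>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?link ?title ?lat ?long WHERE {
  ?link dct:references ?ref .
  ?ref rdfs:label ?title ;
       geo:lat ?lat ;
       geo:long ?long .
} LIMIT 30"""


def build_request(endpoint, query):
    """Build a GET request that asks for SPARQL results as JSON."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


request = build_request(ENDPOINT, QUERY)
print(request.full_url)

# To actually run the query against a real endpoint:
# body = urllib.request.urlopen(request).read().decode("utf-8")
```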

Yokohama Art Spot

– Application using museum and local data
– Data related to art in Yokohama
  • Collections
  • Events
  • Q&A

http://lod.ac/apps/yas/

LODAC Museum × Yokohama Art LOD × PinQA

System Architecture

[Figure: Yokohama Art Spot draws on three data sources: LODAC Museum, Yokohama Art LOD, and PinQA. The entities shown are Work, Institution, Artist, Event, Question, Answer, and User. The sources are accessed with SPARQL, and results are returned as JSON or XML.]

‣ Python + SPARQLWrapper
‣ Geolocation

PREFIX ical: <http://www.w3.org/2002/12/cal/icaltzd#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX event: <http://lod.ac/ns/event#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX lodacid: <http://lod.ac/id/>
PREFIX omgeo: <http://www.ontotext.com/owlim/geo#>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX vcard: <http://www.w3.org/2006/vcard/ns#>

SELECT distinct ?event ?lat ?long ?title ?location_name ?location ?fee ?dtstart ?dtend
WHERE {
  ?event a event:Event ;
         rdfs:label ?title ;
         event:fee ?fee ;
         ical:location ?location ;
         ical:dtstart ?dtstart ;
         ical:dtend ?dtend .
  ?location rdfs:label ?location_name ;
            dc:references ?locRef .
  ?locRef omgeo:within(%(NE_lat)s %(NE_long)s %(SW_lat)s %(SW_long)s) ;
          vcard:postal-code ?postalcode ;
          geo:lat ?lat ;
          geo:long ?long .
  FILTER ((?dtstart > "%(dtstart)s"^^xsd:dateTime && ?dtstart < "%(dtend)s"^^xsd:dateTime) ||
          (?dtend > "%(dtstart)s"^^xsd:dateTime && ?dtend < "%(dtend)s"^^xsd:dateTime) ||
          (?dtstart < "%(dtstart)s"^^xsd:dateTime && ?dtend > "%(dtend)s"^^xsd:dateTime))
}
ORDER BY (omgeo:distance(?lat, ?long, %(C_lat)s, %(C_long)s))
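The `%(name)s` placeholders in this query are Python string-formatting fields, so the application presumably fills them from a dictionary before sending the query. The sketch below shows that step on a short fragment. The `query_params` helper, the square bounding box, the coordinates, and the dates are assumptions for illustration only.

```python
from datetime import datetime, timedelta

# Fragment of the query above that uses the same placeholder names.
FRAGMENT = (
    "?locRef omgeo:within(%(NE_lat)s %(NE_long)s %(SW_lat)s %(SW_long)s) .\n"
    'FILTER (?dtstart > "%(dtstart)s"^^xsd:dateTime && '
    '?dtstart < "%(dtend)s"^^xsd:dateTime)'
)


def query_params(center_lat, center_long, half_span_deg, start, days):
    """Build the placeholder values: a square box around a centre point
    and a time window of `days` days starting at `start`."""
    end = start + timedelta(days=days)
    fmt = "%Y-%m-%dT%H:%M:%S"
    return {
        "NE_lat": round(center_lat + half_span_deg, 6),
        "NE_long": round(center_long + half_span_deg, 6),
        "SW_lat": round(center_lat - half_span_deg, 6),
        "SW_long": round(center_long - half_span_deg, 6),
        "C_lat": center_lat,
        "C_long": center_long,
        "dtstart": start.strftime(fmt),
        "dtend": end.strftime(fmt),
    }


# Illustrative values: a point in the Yokohama area and a five-day window.
params = query_params(35.45, 139.64, 0.05, datetime(2012, 8, 13), days=5)
filled = FRAGMENT % params
print(filled)
```

Because the values are pasted directly into the query text, they should come from code that produces numbers and timestamps, as here, rather than from unchecked user input.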

PREFIX ical: <http://www.w3.org/2002/12/cal/icaltzd#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX event: <http://lod.ac/ns/event#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX lodacid: <http://lod.ac/id/>
PREFIX dc11: <http://purl.org/dc/elements/1.1/>

SELECT *
WHERE {
  ?link a event:Event ;
        rdfs:label ?title ;
        event:fee ?fee ;
        ical:categories ?cat ;
        ical:location %(museum_id)s ;
        ical:dtstart ?dtstart ;
        ical:dtend ?dtend .
  ?cat dc11:title ?category .
  OPTIONAL {
    ?link event:Credit ?crd .
    ?crd dc11:description ?credit .
  }
}

PREFIX dc: <http://purl.org/dc/terms/>
PREFIX dc11: <http://purl.org/dc/elements/1.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX lodac: <http://lod.ac/ns/lodac#>
PREFIX lodacid: <http://lod.ac/id/>

SELECT ?link ?title ?creator ?created ?genre ?material ?size
WHERE {
  %(museum_id)s lodac:isProviderOf ?link .
  ?link rdfs:label ?title ;
        dc:references ?workRef .
  ?workRef lodac:genre %(genre)s ;
           dc11:creator ?creator ;
           dc:medium ?material ;
           dc:extent ?size .
  OPTIONAL { ?workRef dc:created ?created . }
}
LIMIT 100
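Queries like these come back in the SPARQL JSON results format when JSON is requested. The sketch below flattens such a response into one dictionary per row. The sample response is invented for illustration, and `rows` is a hypothetical helper; variables left unbound by an OPTIONAL pattern are simply absent from a binding, so they are filled with None.

```python
import json

# Invented sample in the SPARQL 1.1 Query Results JSON format.
SAMPLE = """{
  "head": {"vars": ["link", "title", "created"]},
  "results": {"bindings": [
    {"link": {"type": "uri", "value": "http://lod.ac/id/20029"},
     "title": {"type": "literal", "xml:lang": "ja", "value": "Example title"},
     "created": {"type": "literal", "value": "1900"}},
    {"link": {"type": "uri", "value": "http://lod.ac/id/20128"},
     "title": {"type": "literal", "xml:lang": "ja", "value": "Another title"}}
  ]}
}"""


def rows(result_json):
    """Return one {variable: value-or-None} dict per result row."""
    doc = json.loads(result_json)
    variables = doc["head"]["vars"]
    return [
        {var: (binding[var]["value"] if var in binding else None)
         for var in variables}
        for binding in doc["results"]["bindings"]
    ]


table = rows(SAMPLE)
for row in table:
    print(row)
```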


Go2Museum

http://160.193.95.58/~ueda/go2museum/

[Screenshots: Go2Museum on iPhone and Android]

[Figure: Go2Museum data sources. Museum data from various web sites are collected in LODAC Museum. The other services shown are LODAC Location, NDL Search, CiNii, Yahoo! Location, and Google Web/Map/Route, connected by Link and Search arrows.]

Twitter: @go2museum

• “Today’s museum”
• Recommendation based on the latitude and longitude of tweets
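A location-based recommendation of this kind can be sketched as a nearest-neighbour lookup over museum coordinates. The slides do not say how @go2museum ranks museums, so the great-circle (haversine) distance used here is an assumption, and the museum list and tweet position are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def nearest(museums, lat, lon, limit=1):
    """Return the `limit` museums closest to the given position."""
    ranked = sorted(
        museums,
        key=lambda m: haversine_km(lat, lon, m["lat"], m["long"]),
    )
    return ranked[:limit]


# Hypothetical museums and tweet position.
MUSEUMS = [
    {"name": "Museum A", "lat": 35.45, "long": 139.63},
    {"name": "Museum B", "lat": 35.68, "long": 139.76},
    {"name": "Museum C", "lat": 34.69, "long": 135.50},
]
recommended = nearest(MUSEUMS, 35.44, 139.64, limit=2)
print([m["name"] for m in recommended])
```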

Summary

• A life cycle of data is described
  – Scraping, standardizing, integrating, and publishing
• Important issues
  – Recognizing data
  – Designing schema
    • Good for data
    • Good for RDF Store and SPARQL
  – Developing applications
    • More people can be involved
    • Next cycle of data

• Please submit papers
• Meet at Nara