20130622 okfn hackathon t2
OKFN Korea
Hackathon Day
2013. 06. 22.
Toward Open Data World
What is linked data, Open data?
Refine
Modelling
Access
TripleStorage
other topics
image: Leo Oosterloo @ flickr.com
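Seoul City Data Enrichment
Goals
Ontology design or mapping to add detail to Seoul city data
Structure, semantics, and linking: model Seoul city data (unstructured data) with an ontology and link it to external data
English version: provide Seoul city data that non-Korean-speaking users can use
Scope
About 40 Seoul city datasets
Cultural heritage: domestic cultural properties collected by the Cultural Heritage Administration (national treasures, treasures, designated cultural properties, intangible cultural properties, etc.)
Methodology: model the data by reusing existing RDF vocabularies
1) Data selection: select the datasets to model from the Seoul Open Data Plaza
2) Dataset field review: review how each dataset field maps to the DBpedia ontology (classes, properties)
• DBpedia ontology: covers concepts about things and the Wikipedia infobox fields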
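Seoul City Data Enrichment
For example, when modelling a 'museum':
• Select the infobox template for museums from Wikipedia
• Select the vocabulary that DBpedia maps to the museum infobox
• Map the vocabulary to the dataset fields
• Decide whether to model the unmapped fields (including classes and properties): a modelling tool must be chosen
• Apply the URI scheme (needs separate design)
• Complete the ontology schema design
3) Data cleaning
• Clean the data with Google Refine
• Work to do before adding the data to Refine
• Location data: convert or add location values in the source data (Seoul city)
• English names: convert and map the Korean names (manual work required)
• Work to do in Refine
– Add Korean and English Wikipedia URLs
– Add DBpedia and Freebase URLs: add them using Refine reconciliation
– Build the RDF conversion mapping skeleton
– Export RDF and Excel
4) Data upload (RDF or Excel)
Data store selection
Jena, 4Store, …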
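The mapping and export steps above can be sketched in a few lines of Python. This is only an illustration, not the workflow the team used: the column names, the `http://example.org/seoul/museum/` URI scheme, the `FIELD_MAP` table, and the sample row are all assumptions made up for the example. The vocabulary terms (`dbo:Museum`, `rdfs:label`, `geo:lat`, `geo:long`, `owl:sameAs`) are existing terms, reused as the methodology suggests.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAMEAS = "http://www.w3.org/2002/07/owl#sameAs"
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
DBO_MUSEUM = "http://dbpedia.org/ontology/Museum"

# Hypothetical URI scheme; the slides say the real one needs separate design.
BASE = "http://example.org/seoul/museum/"

# Hypothetical mapping: dataset column -> (predicate, kind).
# kind is a language tag, "uri" for links, or None for a plain literal.
FIELD_MAP = {
    "name_ko": (RDFS_LABEL, "ko"),
    "name_en": (RDFS_LABEL, "en"),
    "lat": (GEO + "lat", None),
    "long": (GEO + "long", None),
    "dbpedia": (OWL_SAMEAS, "uri"),
}


def literal(value, lang=None):
    """Format an N-Triples literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"' + (f"@{lang}" if lang else "")


def row_to_ntriples(row):
    """Turn one cleaned dataset row into N-Triples lines."""
    subject = f"<{BASE}{row['id']}>"
    lines = [f"{subject} <{RDF_TYPE}> <{DBO_MUSEUM}> ."]
    for column, (predicate, kind) in FIELD_MAP.items():
        value = row.get(column)
        if not value:
            continue  # empty or missing field: nothing to assert
        obj = f"<{value}>" if kind == "uri" else literal(value, kind)
        lines.append(f"{subject} <{predicate}> {obj} .")
    return lines


# Illustrative row only; it is not taken from the Seoul datasets.
sample_row = {
    "id": "1",
    "name_ko": "국립중앙박물관",
    "name_en": "National Museum of Korea",
    "dbpedia": "http://dbpedia.org/resource/National_Museum_of_Korea",
}
sample_lines = row_to_ntriples(sample_row)
```

Fields that have no entry in `FIELD_MAP` are simply not exported, which mirrors the decision point in the list: unmapped fields need an explicit modelling decision before they appear in the RDF.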
Contents
1 Modeling Issues
2 Management Issues
Modelling – RDF
Subject Predicate Object
some school has a name/label some literal
Modelling – RDF
Subject Predicate Object
http://education.data.gov.uk/id/school/401874 has a name/label "Cardiff High School"
Modelling – RDF
Subject Predicate Object
http://education.data.gov.uk/id/school/401874 http://www.w3.org/2000/01/rdf-schema#label "Cardiff High School"
Modelling – RDF
Subject Predicate Object
school:401874 rdfs:label "Cardiff High School"
where
school: = http://education.data.gov.uk/id/school/
rdfs: = http://www.w3.org/2000/01/rdf-schema#
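The prefix shorthand is plain string substitution. A minimal sketch, using only the two prefixes defined on the slide (the function name `expand` is made up for this example):

```python
PREFIXES = {
    "school": "http://education.data.gov.uk/id/school/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def expand(name, prefixes=PREFIXES):
    """Expand a prefixed name such as 'rdfs:label' into a full URI."""
    prefix, sep, local = name.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {name!r}")
    return prefixes[prefix] + local


subject = expand("school:401874")
predicate = expand("rdfs:label")
```

Expanding both names gives back the two full URIs shown on the earlier slides.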
Modelling – RDF
Subject Predicate Object
school:401874 rdfs:label "Cardiff High School"
school:401874 ont:districtAdministrative la:00PT
la:00PT rdfs:label "Cardiff"
Modelling – RDF
Subject Predicate Object
school:401874 rdfs:label "Cardiff High School"
school:401874 ont:districtAdministrative la:00PT
la:00PT rdfs:label "Cardiff"
[Diagram: the same three triples drawn as a graph. school:401874 has an rdfs:label edge to "Cardiff High School" and an ont:districtAdministrative edge to la:00PT, which has an rdfs:label edge to "Cardiff".]
Modelling – RDF
Subject Predicate Object
school:401874 rdfs:label "Cardiff High School"
school:401874 ont:districtAdministrative la:00PT
la:00PT rdfs:label "Cardiff"
la:00PT rdfs:label "Caerdydd"@cy
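The four triples on this slide can be held as plain tuples, with literals carrying an optional language tag. This is a rough sketch for illustration; the `Literal` type and the helper names are assumptions of the example, not part of any RDF library.

```python
from typing import NamedTuple, Optional


class Literal(NamedTuple):
    """A literal value with an optional language tag (e.g. 'cy' for Welsh)."""
    value: str
    lang: Optional[str] = None


triples = [
    ("school:401874", "rdfs:label", Literal("Cardiff High School")),
    ("school:401874", "ont:districtAdministrative", "la:00PT"),
    ("la:00PT", "rdfs:label", Literal("Cardiff")),
    ("la:00PT", "rdfs:label", Literal("Caerdydd", "cy")),
]


def labels(subject, lang=None):
    """Return the rdfs:label values of a subject for one language tag."""
    return [
        o.value
        for s, p, o in triples
        if s == subject and p == "rdfs:label"
        and isinstance(o, Literal) and o.lang == lang
    ]


def district_labels(school, lang=None):
    """Follow ont:districtAdministrative, then read the district's labels."""
    found = []
    for s, p, o in triples:
        if s == school and p == "ont:districtAdministrative":
            found.extend(labels(o, lang))
    return found


welsh = district_labels("school:401874", "cy")
untagged = district_labels("school:401874")
```

Because the object of one triple (`la:00PT`) is the subject of others, a query can walk from the school to its district and pick the label in the language it wants.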
Modelling – vocabularies
Logical modelling
modelling the domain, not a particular data structure
what exists
what is asserted? what can you deduce from that?
not about constraints as such
monotonic, open world
controlled vocabulary
taxonomy
thesaurus
ontology
Ontology
Modelling – vocabularies
unfamiliar terminology but related to information architecture and conceptual modelling
domain-driven design
... and yes knowledge representation
RDF Schema is…
Elements of:
Vocabulary (defining terms)
• I define a relationship called “prescribed dose.”
Schema (defining types)
• “prescribed dose” relates “treatments” to “dosages”
Taxonomy (defining hierarchies)
• Any “doctor” is a “medical professional”
Modelling – RDFS
RDF vocabulary description language
classes, types and type hierarchy
ont:School rdf:type rdfs:Class
ont:School rdfs:label "School"
ont:WelshEstablishment rdf:type rdfs:Class
ont:WelshEstablishment rdfs:subClassOf ont:School
school:401874 rdf:type ont:WelshEstablishment
school:401874 rdf:type ont:School (inferred)
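The type-hierarchy inference can be sketched as a small fixpoint loop. This assumes, as the diagram appears to show, that ont:WelshEstablishment is a subclass of ont:School; the function name is made up, and the loop implements only the one RDFS rule needed here (an instance of a class is also an instance of its superclasses), not a full RDFS reasoner.

```python
asserted = [
    ("ont:School", "rdf:type", "rdfs:Class"),
    ("ont:WelshEstablishment", "rdf:type", "rdfs:Class"),
    # Assumed direction of the subclass edge in the diagram.
    ("ont:WelshEstablishment", "rdfs:subClassOf", "ont:School"),
    ("school:401874", "rdf:type", "ont:WelshEstablishment"),
]


def infer_types(triples):
    """Return the rdf:type triples implied by rdfs:subClassOf."""
    known = set(triples)
    while True:
        subclass = [(s, o) for s, p, o in known if p == "rdfs:subClassOf"]
        new = {
            (s, "rdf:type", parent)
            for s, p, o in known if p == "rdf:type"
            for child, parent in subclass if child == o
        } - known
        if not new:
            break  # fixpoint reached: no rule adds anything further
        known |= new
    return known - set(triples)


inferred = infer_types(asserted)
```

Running the loop until nothing new appears means longer subclass chains are handled too, one level per pass.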
Modelling – RDFS
RDF vocabulary description language
properties, property hierarchy
ont:headOf rdf:type rdf:Property
ont:headOf rdfs:subPropertyOf ont:staffAt
person:JoeBloggs ont:headOf school:401874
person:JoeBloggs ont:staffAt school:401874 (inferred)
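The property hierarchy works the same way as the class hierarchy: whatever is stated with a sub-property also holds for its super-property. A minimal sketch of that single rule, assuming ont:headOf is the sub-property of ont:staffAt (the function name is hypothetical):

```python
asserted = [
    ("ont:headOf", "rdf:type", "rdf:Property"),
    ("ont:headOf", "rdfs:subPropertyOf", "ont:staffAt"),
    ("person:JoeBloggs", "ont:headOf", "school:401874"),
]


def infer_superproperties(triples):
    """Return triples implied by rdfs:subPropertyOf."""
    known = set(triples)
    while True:
        subprop = [(s, o) for s, p, o in known if p == "rdfs:subPropertyOf"]
        new = {
            (s, parent, o)
            for s, p, o in known
            for child, parent in subprop if child == p
        } - known
        if not new:
            break  # nothing more to derive
        known |= new
    return known - set(triples)


inferred = infer_superproperties(asserted)
```

A query for everyone who is staff at the school would now find the head teacher as well, without that fact ever being stored.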
Modelling – RDFS
RDF vocabulary description language
class/property relations
domain
range
Already have power to do some vocabulary mapping
declare classes or properties from different vocabularies to be equivalent:
A rdfs:subClassOf B
B rdfs:subClassOf A
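Both ideas on this slide can be shown with a short sketch. The domain and range declarations below, and the second vocabulary's `ex:School` class, are assumptions invented for the example; the slide itself does not give them. The helpers implement only the two RDFS domain/range rules and the mutual-subclass test, nothing more.

```python
data = [
    # Hypothetical declarations: the slide names domain and range
    # but does not say which classes apply to ont:staffAt.
    ("ont:staffAt", "rdfs:domain", "ont:Person"),
    ("ont:staffAt", "rdfs:range", "ont:School"),
    ("person:JoeBloggs", "ont:staffAt", "school:401874"),
    # Two vocabularies declared equivalent by mutual subclassing.
    ("ont:School", "rdfs:subClassOf", "ex:School"),
    ("ex:School", "rdfs:subClassOf", "ont:School"),
]


def apply_domain_range(triples):
    """Type subjects by a property's domain and objects by its range."""
    inferred = set()
    for prop, rel, cls in triples:
        if rel not in ("rdfs:domain", "rdfs:range"):
            continue
        for s, p, o in triples:
            if p == prop:
                node = s if rel == "rdfs:domain" else o
                inferred.add((node, "rdf:type", cls))
    return inferred - set(triples)


def equivalent_classes(triples, a, b):
    """True when A subClassOf B and B subClassOf A are both asserted."""
    facts = set(triples)
    return (a, "rdfs:subClassOf", b) in facts and (b, "rdfs:subClassOf", a) in facts


typed = apply_domain_range(data)
same = equivalent_classes(data, "ont:School", "ex:School")
```

Note that domain and range here produce new type facts rather than rejecting data, which is the "not about constraints as such" point made earlier in the deck.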
WOL OWL is…
Web Ontology Language
OWL is…
Elements of ontology
Same/different identity
• “author” and “auteur” are the same relation
• two resources with the same “ISBN” are the same “book”
More expressive type definitions
• A “cycle” is a “vehicle” with at least one “wheel”
• A “bicycle” is a “cycle” with exactly two “wheels”
More expressive relation definitions
• “sibling” is a symmetric predicate
• the value of the “favorite dwarf” relation must be one of “happy”, “sleepy”, “sneezy”, “grumpy”, “dopey”, “bashful”, “doc”
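Two of these examples, the shared ISBN and the symmetric sibling relation, can be imitated by hand. This is a rough approximation for illustration, not an OWL reasoner: the `ex:` names and the placeholder ISBN strings are invented, and the functions are hypothetical helpers.

```python
from itertools import combinations

data = [
    ("ex:book1", "ex:isbn", "ISBN-X"),  # placeholder values, not real ISBNs
    ("ex:book2", "ex:isbn", "ISBN-X"),
    ("ex:book3", "ex:isbn", "ISBN-Y"),
    ("ex:alice", "ex:sibling", "ex:bob"),
]


def same_by_key(triples, key):
    """Resources sharing a value for `key` are stated to be owl:sameAs."""
    by_value = {}
    for s, p, o in triples:
        if p == key:
            by_value.setdefault(o, set()).add(s)
    return {
        (a, "owl:sameAs", b)
        for subjects in by_value.values()
        for a, b in combinations(sorted(subjects), 2)
    }


def symmetric_closure(triples, prop):
    """For a symmetric property, add the reverse of every statement."""
    return {(o, prop, s) for s, p, o in triples if p == prop} - set(triples)


identities = same_by_key(data, "ex:isbn")
reversed_siblings = symmetric_closure(data, "ex:sibling")
```

In OWL itself these behaviours come from declaring the property (for instance as inverse functional or symmetric) and letting a reasoner draw the conclusions.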
What can we do with OWL?
Answer questions of
Consistency
• Are there any contradictions in this model?
Classification
• What are all the inferred types of this resource?
Satisfiability
• Are there any classes in this ontology that cannot possibly have any members?
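The consistency question can be illustrated in one narrow case: an individual typed with two classes that are declared disjoint. The sketch below checks only that case, with invented `ex:` names; real OWL consistency checking covers far more and needs a reasoner.

```python
data = [
    ("ex:Person", "owl:disjointWith", "ex:School"),
    ("ex:thing1", "rdf:type", "ex:Person"),
    ("ex:thing1", "rdf:type", "ex:School"),
    ("ex:thing2", "rdf:type", "ex:Person"),
]


def find_contradictions(triples):
    """List (individual, classA, classB) where A and B are disjoint."""
    disjoint = [(a, b) for a, p, b in triples if p == "owl:disjointWith"]
    types = {}
    for s, p, o in triples:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)
    return sorted(
        (s, a, b)
        for s, classes in types.items()
        for a, b in disjoint
        if a in classes and b in classes
    )


problems = find_contradictions(data)
```

An empty result here only means this one kind of contradiction was not found.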
Building Useful Ontologies
Developing and maintaining quality ontologies is very challenging
Users need tools and services, e.g., to help check if ontology is:
Meaningful: all named classes can have instances
Correct: captures intuitions of domain experts
Minimally redundant: no unintended synonyms
Banana split, Banana sundae
http://www.aber.ac.uk/compsci/public/media/presentations/OUCL-seminar.ppt
Modelling – OWL
richer modelling and semantics
axioms on properties: transitive, symmetric, inverseOf, ..., functional, inverse functional, equivalent property
axioms on classes: intersection, union, disjoint, equivalent
restrictions on classes: some values from, all values from, cardinality, has value, one of, keys
axioms on individuals: same as, different from, all different
imports
Modelling – OWL
supports much richer modelling
consistency checking of model
consistency checking of data
some surprises if used to schema languages
open world, no unique name assumption
can extend to closed world checking
inference
classification
inferred relationships
Modelling
Spectrum of goals and styles
Lightweight vocabularies
simple modelling
just enough agreement to get useful work done
removing boundaries to enable information to be found and connected
global consistency not possible
a little semantics goes a long way
Rich ontological models
rich domain models
need expressivity
consistency is critical
make complex inferences you can rely on, across data you trust
knowledge is power
Modelling
Ontology reuse
invest in complete ontology for a domain
rich but general model, may be modular inside
strong "ontological commitment"
e.g. medical ontologies
reuse small, common vocabularies
FOAF, SIOC, Dublin Core, Org ...
pick and choose classes and properties you need
fill in a few missing links for your domain
generic reusable vocabularies
Data cube vocabulary
Reusable, public ontologies
Measurement Units Ontology
The Event Ontology
FOAF
Schema.org
schema.org is one of a number of microdata vocabularies
it is a shared collection of microdata schemas for use by webmasters
includes a type hierarchy, like an RDFS schema
starts with top-level Thing and DataType types
properties are inherited by descendant types
microdata properties
annotate an item with text-valued properties using the “itemprop” attribute
<div itemscope>
<p>My name is <span itemprop="name">Daniel</span>.</p>
</div>
<div itemscope>
<p>Flavors in my favorite ice cream:</p>
<ul>
<li itemprop="flavor">Lemon sorbet</li>
<li itemprop="flavor">Apricot sorbet</li>
</ul>
</div>
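To see what a consumer gets out of this markup, the two snippets can be read back with Python's standard `html.parser`. This is a simplified sketch, not a conforming microdata parser: the class name is made up, it assumes every element is explicitly closed, and it ignores nested items and non-text property values.

```python
from html.parser import HTMLParser

HTML = """
<div itemscope>
<p>My name is <span itemprop="name">Daniel</span>.</p>
</div>
<div itemscope>
<p>Flavors in my favorite ice cream:</p>
<ul>
<li itemprop="flavor">Lemon sorbet</li>
<li itemprop="flavor">Apricot sorbet</li>
</ul>
</div>
"""

_SCOPE = object()  # marker for an element that opened a new item


class MicrodataSketch(HTMLParser):
    """Collect text-valued itemprop values, one dict per itemscope."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._open = []  # one entry per open element
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs:
            self.items.append({})
            self._open.append(_SCOPE)
        elif "itemprop" in attrs and self.items:
            self._open.append(attrs["itemprop"])
            self._text = []  # start collecting this property's text
        else:
            self._open.append(None)

    def handle_data(self, data):
        self._text.append(data)

    def handle_endtag(self, tag):
        if not self._open:
            return
        kind = self._open.pop()
        if isinstance(kind, str):  # closing an itemprop element
            value = "".join(self._text).strip()
            self.items[-1].setdefault(kind, []).append(value)


parser = MicrodataSketch()
parser.feed(HTML)
items = parser.items
```

Each `itemscope` becomes one item, and a repeated `itemprop` such as `flavor` simply yields several values for the same property.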
Why should you use schema.org?
Yahoo
Bing
Top types
Schema.rdfs.org
maintains schema.org ↔ RDF mappings
there are mappings for BIBO, DBpedia, Dublin Core, FOAF, GoodRelations, SIOC, and WordNet
also provides examples, tutorials, and data dumps
Triple Store
Triple Store & RDB
http://blog.gniewoslaw.pl/2012/11/relational-databases-vs-triple-stores/
Storage Solutions for RDF Data
Triple Table (Basic Idea)
Store all RDF triples in a single table
Create indexes on combinations of S, P, and O
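The triple-table idea can be tried out with SQLite from Python's standard library. This is a toy sketch under stated assumptions: the table and index names are invented, literals are stored as quoted text in the same column as URIs, and the choice of three orderings (SPO via the primary key, plus POS and OSP) is one common option rather than what any particular triple store does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One table for every triple; the primary key doubles as an (s, p, o) index.
conn.execute(
    "CREATE TABLE triples ("
    " s TEXT NOT NULL, p TEXT NOT NULL, o TEXT NOT NULL,"
    " PRIMARY KEY (s, p, o))"
)
# Extra indexes so lookups by predicate or by object avoid a full scan.
conn.execute("CREATE INDEX idx_pos ON triples (p, o, s)")
conn.execute("CREATE INDEX idx_osp ON triples (o, s, p)")

conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("school:401874", "rdfs:label", '"Cardiff High School"'),
        ("school:401874", "ont:districtAdministrative", "la:00PT"),
        ("la:00PT", "rdfs:label", '"Cardiff"'),
        ("la:00PT", "rdfs:label", '"Caerdydd"@cy'),
    ],
)


def match(s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    clauses, params = [], []
    for column, value in (("s", s), ("p", p), ("o", o)):
        if value is not None:
            clauses.append(f"{column} = ?")
            params.append(value)
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    sql = "SELECT s, p, o FROM triples" + where + " ORDER BY s, p, o"
    return conn.execute(sql, params).fetchall()


about_school = match(s="school:401874")
named_cardiff = match(p="rdfs:label", o='"Cardiff"')
```

Every query is a pattern over the same three columns, so the indexes decide which patterns are cheap.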
The Internet Map
http://internet-map.net/
credits
These slides are partially based on “Linked data and its role in the semantic web” by Dave Reynolds, Epimorphics Ltd.