![Page 1: Linked data at globo.com](https://reader034.vdocuments.site/reader034/viewer/2022051323/547baf49b4795972098b4eed/html5/thumbnails/1.jpg)
Linked Data at globo.com
Semantic Team
Tatiana Al-Chueyr and Rodrigo D. A. Senra
{tatiana.martins, rodrigo.senra}@corp.globo.com
![Page 2: Linked data at globo.com](https://reader034.vdocuments.site/reader034/viewer/2022051323/547baf49b4795972098b4eed/html5/thumbnails/2.jpg)
Semantic Team
Andréia Bustamante
Ícaro Medeiros
Tatiana Al-Chueyr
Rodrigo Senra
![Page 3: Linked data at globo.com](https://reader034.vdocuments.site/reader034/viewer/2022051323/547baf49b4795972098b4eed/html5/thumbnails/3.jpg)
Contributors
Franklin Amorim
João Carlos Mendes Luís
Alberto Beloni
André Nicodemus
![Page 4: Linked data at globo.com](https://reader034.vdocuments.site/reader034/viewer/2022051323/547baf49b4795972098b4eed/html5/thumbnails/4.jpg)
BROADCAST MOVIES PAY TV INTERNET
EVENTS MUSIC
PUBLISHING
NEW VENTURES NEWSPAPER RADIO NETWORK
![Page 5: Linked data at globo.com](https://reader034.vdocuments.site/reader034/viewer/2022051323/547baf49b4795972098b4eed/html5/thumbnails/5.jpg)
Motivation
Soccer player
Cross-link content from different web products
![Page 6: Linked data at globo.com](https://reader034.vdocuments.site/reader034/viewer/2022051323/547baf49b4795972098b4eed/html5/thumbnails/6.jpg)
Motivation
Politician
Cross-link content from different web products
![Page 7: Linked data at globo.com](https://reader034.vdocuments.site/reader034/viewer/2022051323/547baf49b4795972098b4eed/html5/thumbnails/7.jpg)
Motivation
Celebrity
Cross-link content from different web products
![Page 8: Linked data at globo.com](https://reader034.vdocuments.site/reader034/viewer/2022051323/547baf49b4795972098b4eed/html5/thumbnails/8.jpg)
Isabella Nardoni was killed on March 29, 2008,
in the North Zone of São Paulo (Photo: Reproduction)
Isabella de Oliveira Nardoni, aged 5, was killed on the night of March 29, 2008. Forensic experts concluded that the girl was thrown from the sixth floor of the building where her father, Alexandre Nardoni, her stepmother, Anna Carolina Jatobá, and the couple's two young children lived, in Vila Isolina Mazzei, in the North Zone of São Paulo.
Isabella's grave becomes a place of visitation in SP; the Nardoni couple is in prison.
Isabella Nardoni case
Juliana Cardilli, G1 SP
RDF
FOAF
GEO
Dublin Core
SKOS
Motivation
Semantic markup in web pages
![Page 9: Linked data at globo.com](https://reader034.vdocuments.site/reader034/viewer/2022051323/547baf49b4795972098b4eed/html5/thumbnails/9.jpg)
Motivation
Recommend annotations to the information producer
![Page 10: Linked data at globo.com](https://reader034.vdocuments.site/reader034/viewer/2022051323/547baf49b4795972098b4eed/html5/thumbnails/10.jpg)
Motivation
Suggest related content to the information consumer
![Page 11: Linked data at globo.com](https://reader034.vdocuments.site/reader034/viewer/2022051323/547baf49b4795972098b4eed/html5/thumbnails/11.jpg)
Motivation
Suggest related content to the information consumer
![Page 12: Linked data at globo.com](https://reader034.vdocuments.site/reader034/viewer/2022051323/547baf49b4795972098b4eed/html5/thumbnails/12.jpg)
Motivation
Suggest related content to the information consumer
![Page 13: Linked data at globo.com](https://reader034.vdocuments.site/reader034/viewer/2022051323/547baf49b4795972098b4eed/html5/thumbnails/13.jpg)
Outcomes
● Flexible ways to organize content
● Ease of finding related issues
● Explicit relations derived from annotated content
● Up-to-date topic pages with little editorial effort
● Linking content across different web products
● Seamless navigation leading to flow state
![Page 14: Linked data at globo.com](https://reader034.vdocuments.site/reader034/viewer/2022051323/547baf49b4795972098b4eed/html5/thumbnails/14.jpg)
Status Quo
Used by the main web products of Globo.com,
linking, among others:
○ 18,485 organizations
○ 82,386 people
○ 9,129 places
○ 1,000,000+ annotated news items
from August 2010 to May 2013
![Page 15: Linked data at globo.com](https://reader034.vdocuments.site/reader034/viewer/2022051323/547baf49b4795972098b4eed/html5/thumbnails/15.jpg)
Legacy Architecture
CDA
CMA
triple store
search engine
ontology
![Page 16: Linked data at globo.com](https://reader034.vdocuments.site/reader034/viewer/2022051323/547baf49b4795972098b4eed/html5/thumbnails/16.jpg)
Legacy Architecture
[Diagram: multiple CMA and CDA applications; triple store; search engine; ontology]
![Page 17: Linked data at globo.com](https://reader034.vdocuments.site/reader034/viewer/2022051323/547baf49b4795972098b4eed/html5/thumbnails/17.jpg)
Problems
Poor data management
○ direct (unmanaged) access to the triple store
○ difficulty sharing data (distributed DBs)
○ need to re-sync the triple store and the search engine index
○ scalability of the triple store
○ high entropy in distributed ontology engineering
![Page 18: Linked data at globo.com](https://reader034.vdocuments.site/reader034/viewer/2022051323/547baf49b4795972098b4eed/html5/thumbnails/18.jpg)
Problems
![Page 19: Linked data at globo.com](https://reader034.vdocuments.site/reader034/viewer/2022051323/547baf49b4795972098b4eed/html5/thumbnails/19.jpg)
Ontology Engineering
Product-driven (past)
G1 (news), GE (sports), EGO (gossip), TVG (tv)
Domain-driven (current)
Layers: Upper, Base
Domains: Person, Organization, Place, Music, Politics, Programme, Education, Sports
![Page 20: Linked data at globo.com](https://reader034.vdocuments.site/reader034/viewer/2022051323/547baf49b4795972098b4eed/html5/thumbnails/20.jpg)
Possible Solution
Upper Ontology
![Page 21: Linked data at globo.com](https://reader034.vdocuments.site/reader034/viewer/2022051323/547baf49b4795972098b4eed/html5/thumbnails/21.jpg)
Problems
Semantic as a library
○ many different versions in production
○ programming language dependent
○ steep learning curve for RDF/OWL/SPARQL
![Page 22: Linked data at globo.com](https://reader034.vdocuments.site/reader034/viewer/2022051323/547baf49b4795972098b4eed/html5/thumbnails/22.jpg)
Next Step
Create an open semantic data management platform
● Scalable
● Mobile and Web friendly
● Interconnect Globo's data with external data sources
● Automate content extraction (including NER)
![Page 23: Linked data at globo.com](https://reader034.vdocuments.site/reader034/viewer/2022051323/547baf49b4795972098b4eed/html5/thumbnails/23.jpg)
Brainiak
Linked data RESTful API
![Page 24: Linked data at globo.com](https://reader034.vdocuments.site/reader034/viewer/2022051323/547baf49b4795972098b4eed/html5/thumbnails/24.jpg)
Legacy Architecture
[Diagram: multiple CMA and CDA applications; triple store; search engine; ontology]
![Page 25: Linked data at globo.com](https://reader034.vdocuments.site/reader034/viewer/2022051323/547baf49b4795972098b4eed/html5/thumbnails/25.jpg)
Under Development
[Diagram: CMA and CDA applications; Brainiak API; triple store; search engine]
![Page 26: Linked data at globo.com](https://reader034.vdocuments.site/reader034/viewer/2022051323/547baf49b4795972098b4eed/html5/thumbnails/26.jpg)
Requirements
● Indirect usage of SPARQL
● Programming language independent
● Data management with quality
● Finer-grained authorization and authentication
● Isolate applications from triplestore
● Improve triplestore performance
![Page 27: Linked data at globo.com](https://reader034.vdocuments.site/reader034/viewer/2022051323/547baf49b4795972098b4eed/html5/thumbnails/27.jpg)
SPARQL query
    DEFINE input:inference <http://data.globo.com/ruleset>
    SELECT ?uri ?label
    FROM <http://data.globo.com/sports/>
    WHERE {
      ?uri a <http://data.globo.com/sports/Team> ;
           rdfs:label ?label .
    }
    LIMIT 10 OFFSET 0
task: list all sports teams
![Page 28: Linked data at globo.com](https://reader034.vdocuments.site/reader034/viewer/2022051323/547baf49b4795972098b4eed/html5/thumbnails/28.jpg)
Brainiak query
GET /sports/Team
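To make the contrast with the SPARQL slide concrete, the sketch below shows one way a service could expand a `/<context>/<Class>` path into a listing query of the same shape. This is not Brainiak's actual implementation: the function names `build_list_query` and `query_for_path`, the `base_uri` default, and the paging parameters are assumptions made for this example.

```python
def build_list_query(context, class_name, limit=10, offset=0,
                     base_uri="http://data.globo.com/"):
    """Build a SPARQL listing query for one class (hypothetical helper)."""
    graph_uri = f"{base_uri}{context}/"        # graph named after the context
    class_uri = f"{graph_uri}{class_name}"     # class URI lives inside that graph
    return (
        f"DEFINE input:inference <{base_uri}ruleset>\n"
        "SELECT ?uri ?label\n"
        f"FROM <{graph_uri}>\n"
        "WHERE {\n"
        f"  ?uri a <{class_uri}> ;\n"
        "       rdfs:label ?label .\n"
        "}\n"
        f"LIMIT {int(limit)} OFFSET {int(offset)}"
    )


def query_for_path(path, **paging):
    """Split a path such as '/sports/Team' and build the matching query."""
    context, class_name = path.strip("/").split("/")
    return build_list_query(context, class_name, **paging)


print(query_for_path("/sports/Team"))
```

The point of the design is that the client only needs to know the path; the graph URI, the class URI, and the query syntax stay on the server side.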
![Page 29: Linked data at globo.com](https://reader034.vdocuments.site/reader034/viewer/2022051323/547baf49b4795972098b4eed/html5/thumbnails/29.jpg)
SPARQL response
![Page 30: Linked data at globo.com](https://reader034.vdocuments.site/reader034/viewer/2022051323/547baf49b4795972098b4eed/html5/thumbnails/30.jpg)
Brainiak response
![Page 31: Linked data at globo.com](https://reader034.vdocuments.site/reader034/viewer/2022051323/547baf49b4795972098b4eed/html5/thumbnails/31.jpg)
Brainiak concepts
● Instance
● Collection (set of instances from a given Class)
● Schema (the Class definition)
● Context
![Page 32: Linked data at globo.com](https://reader034.vdocuments.site/reader034/viewer/2022051323/547baf49b4795972098b4eed/html5/thumbnails/32.jpg)
Instance
![Page 33: Linked data at globo.com](https://reader034.vdocuments.site/reader034/viewer/2022051323/547baf49b4795972098b4eed/html5/thumbnails/33.jpg)
Collection
![Page 34: Linked data at globo.com](https://reader034.vdocuments.site/reader034/viewer/2022051323/547baf49b4795972098b4eed/html5/thumbnails/34.jpg)
Schema
![Page 35: Linked data at globo.com](https://reader034.vdocuments.site/reader034/viewer/2022051323/547baf49b4795972098b4eed/html5/thumbnails/35.jpg)
Context
![Page 36: Linked data at globo.com](https://reader034.vdocuments.site/reader034/viewer/2022051323/547baf49b4795972098b4eed/html5/thumbnails/36.jpg)
Real example
[Diagram: context place; classes Country, State, City; instances Brazil, Japan]
![Page 37: Linked data at globo.com](https://reader034.vdocuments.site/reader034/viewer/2022051323/547baf49b4795972098b4eed/html5/thumbnails/37.jpg)
Real example
GET /place
GET /place/Country
GET /place/Country/_schema
GET /place/Country/Brazil
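The four requests in this example line up with the four Brainiak concepts (context, collection, schema, instance). A minimal sketch of that correspondence follows, assuming the concept can be read off the path depth and the `_schema` suffix; the function `classify_path` and the returned dictionary keys are hypothetical and not part of Brainiak.

```python
def classify_path(path):
    """Map a resource path to the concept it addresses (illustrative only)."""
    parts = [p for p in path.split("/") if p]
    if len(parts) == 1:
        return {"concept": "context", "context": parts[0]}
    if len(parts) == 2:
        return {"concept": "collection", "context": parts[0], "class": parts[1]}
    if len(parts) == 3 and parts[2] == "_schema":
        return {"concept": "schema", "context": parts[0], "class": parts[1]}
    if len(parts) == 3:
        return {"concept": "instance", "context": parts[0],
                "class": parts[1], "instance": parts[2]}
    raise ValueError(f"unsupported path: {path!r}")


for example in ("/place", "/place/Country",
                "/place/Country/_schema", "/place/Country/Brazil"):
    print(example, "->", classify_path(example)["concept"])
```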
![Page 38: Linked data at globo.com](https://reader034.vdocuments.site/reader034/viewer/2022051323/547baf49b4795972098b4eed/html5/thumbnails/38.jpg)
URI Conventions
resource URL → /place/Country/Brazil
context (graph) → http://semantica.globo.com/place/
class → http://semantica.globo.com/place/Country
instance → http://semantica.globo.com/place/Country/Brazil
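The convention amounts to prefixing the path segments with a base URI. A short sketch, assuming the base `http://semantica.globo.com/` shown on the slide; the constant `BASE_URI` and the function `uris_for` are names chosen for this example only.

```python
BASE_URI = "http://semantica.globo.com/"  # base taken from the slide


def uris_for(resource_url):
    """Derive graph, class and instance URIs from /<context>/<Class>/<id>."""
    context, class_name, instance_id = resource_url.strip("/").split("/")
    graph_uri = f"{BASE_URI}{context}/"
    class_uri = f"{graph_uri}{class_name}"
    instance_uri = f"{class_uri}/{instance_id}"
    return {"graph": graph_uri, "class": class_uri, "instance": instance_uri}


print(uris_for("/place/Country/Brazil"))
```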
![Page 39: Linked data at globo.com](https://reader034.vdocuments.site/reader034/viewer/2022051323/547baf49b4795972098b4eed/html5/thumbnails/39.jpg)
Legacy URIs
/place/River?graph_uri=http://dbpedia.org/resource/classes#&class_uri=dbpedia:River
Convention
context (graph) → http://semantica.globo.com/place/
class → http://semantica.globo.com/place/River
Overridden
context (graph) → http://dbpedia.org/resource/classes#
class → http://dbpedia.org/ontology/River
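The override can be read as "use the convention unless a query parameter says otherwise". The sketch below illustrates that rule with the standard library only. The prefix table `PREFIXES`, the helper names, and the fallback logic are assumptions for illustration; only the parameter names `graph_uri` and `class_uri` and the resulting URIs come from the slide. Note that a literal `#` inside a query string would be parsed as the start of the URL fragment, so the example percent-encodes the parameters with `urlencode`.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URI = "http://semantica.globo.com/"
PREFIXES = {"dbpedia": "http://dbpedia.org/ontology/"}  # assumed prefix table


def expand(value):
    """Expand 'prefix:local' using PREFIXES; leave full URIs untouched."""
    prefix, sep, local = value.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return value


def resolve(url):
    """Return (graph_uri, class_uri), letting query parameters override the convention."""
    parts = urlsplit(url)
    context, class_name = parts.path.strip("/").split("/")
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    graph_uri = params.get("graph_uri", f"{BASE_URI}{context}/")
    class_uri = expand(params.get("class_uri", f"{BASE_URI}{context}/{class_name}"))
    return graph_uri, class_uri


override = urlencode({"graph_uri": "http://dbpedia.org/resource/classes#",
                      "class_uri": "dbpedia:River"})
print(resolve("/place/River"))
print(resolve("/place/River?" + override))
```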
![Page 40: Linked data at globo.com](https://reader034.vdocuments.site/reader034/viewer/2022051323/547baf49b4795972098b4eed/html5/thumbnails/40.jpg)
Hypermedia
● Flexibility and programmatic adaptation
● Semantic affordances
● Client has to understand what is consumed
● "Hypermedia APIs are not fully baked yet"
![Page 41: Linked data at globo.com](https://reader034.vdocuments.site/reader034/viewer/2022051323/547baf49b4795972098b4eed/html5/thumbnails/41.jpg)
Brainiak hypermedia graph
[Diagram: resources /, context, collection, schema, and instance, connected by the link relations self, instances, item, inCollection, describedBy, create, replace, and delete]
![Page 42: Linked data at globo.com](https://reader034.vdocuments.site/reader034/viewer/2022051323/547baf49b4795972098b4eed/html5/thumbnails/42.jpg)
Services
● List Contexts
● List Collections
● Get a Schema
● List Prefixes
● Status of Services
Instances
● Create
● Retrieve
● Delete
● Edit
● List
![Page 43: Linked data at globo.com](https://reader034.vdocuments.site/reader034/viewer/2022051323/547baf49b4795972098b4eed/html5/thumbnails/43.jpg)
Features
● JSON-Schema
● JSON-LD
● REST
● Python + Tornado
OPTIONS GET PUT POST DELETE
![Page 44: Linked data at globo.com](https://reader034.vdocuments.site/reader034/viewer/2022051323/547baf49b4795972098b4eed/html5/thumbnails/44.jpg)
Brainiak query
GET /sports/Team
![Page 45: Linked data at globo.com](https://reader034.vdocuments.site/reader034/viewer/2022051323/547baf49b4795972098b4eed/html5/thumbnails/45.jpg)
Brainiak response
![Page 46: Linked data at globo.com](https://reader034.vdocuments.site/reader034/viewer/2022051323/547baf49b4795972098b4eed/html5/thumbnails/46.jpg)
Brainiak response
![Page 47: Linked data at globo.com](https://reader034.vdocuments.site/reader034/viewer/2022051323/547baf49b4795972098b4eed/html5/thumbnails/47.jpg)
Brainiak response
![Page 48: Linked data at globo.com](https://reader034.vdocuments.site/reader034/viewer/2022051323/547baf49b4795972098b4eed/html5/thumbnails/48.jpg)
Brainiak response
![Page 49: Linked data at globo.com](https://reader034.vdocuments.site/reader034/viewer/2022051323/547baf49b4795972098b4eed/html5/thumbnails/49.jpg)
SPARQL query
    SELECT DISTINCT ?class
    WHERE {
      <http://data.globo.com/place/City> rdfs:subClassOf ?class
        OPTION (TRANSITIVE, t_distinct, t_step('step_no') as ?n, t_min (0)) .
      ?class a owl:Class .
    }
task: retrieve all superclasses of a class
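What the transitive option computes can be pictured as a graph walk over `rdfs:subClassOf` edges. The sketch below does that walk over an in-memory dictionary. The class names are shortened forms of URIs that appear in this deck, but the parent links in `SUBCLASS_OF` are assumed for illustration, and the function `superclasses` is not part of Brainiak or Virtuoso. The start class is kept in the result, mirroring what `t_min (0)` is expected to do.

```python
# Assumed excerpt of rdfs:subClassOf edges (child -> list of parents).
SUBCLASS_OF = {
    "place:City": ["place:GeopoliticalDivision"],
    "place:GeopoliticalDivision": ["place:Place"],
    "place:Place": ["upper:Entity"],
}


def superclasses(cls, subclass_of):
    """Return cls plus every class reachable through subclass_of, without duplicates."""
    seen = [cls]      # zero steps allowed: the start class is part of the result
    queue = [cls]
    while queue:
        current = queue.pop(0)
        for parent in subclass_of.get(current, []):
            if parent not in seen:   # plays the role of DISTINCT
                seen.append(parent)
                queue.append(parent)
    return seen


print(superclasses("place:City", SUBCLASS_OF))
```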
![Page 50: Linked data at globo.com](https://reader034.vdocuments.site/reader034/viewer/2022051323/547baf49b4795972098b4eed/html5/thumbnails/50.jpg)
SPARQL query
    SELECT DISTINCT ?predicate ?predicate_graph ?predicate_comment ?type ?range
                    ?title ?range_graph ?range_label ?super_property
    WHERE {
      {
        GRAPH ?predicate_graph { ?predicate rdfs:domain ?domain_class } .
      } UNION {
        GRAPH ?predicate_graph { ?predicate rdfs:domain ?blank } .
        ?blank a owl:Class .
        ?blank owl:unionOf ?enumeration .
        OPTIONAL { ?enumeration rdf:rest ?list_node OPTION(TRANSITIVE, t_min (0)) } .
        OPTIONAL { ?list_node rdf:first ?domain_class } .
      }
      FILTER (?domain_class IN (<http://data.globo.com/place/City>,
                                <http://data.globo.com/place/GeopoliticalDivision>,
                                <http://data.globo.com/place/Place>,
                                <http://data.globo.com/upper/Object>,
                                <http://data.globo.com/upper/Substance>,
                                <http://data.globo.com/upper/ConcreteEntity>,
                                <http://data.globo.com/upper/Entity>))
      { ?predicate rdfs:range ?range . }
      UNION {
        ?predicate rdfs:range ?blank .
        ?blank a owl:Class .
        ?blank owl:unionOf ?enumeration .
        OPTIONAL { ?enumeration rdf:rest ?list_node OPTION(TRANSITIVE, t_min (0)) } .
        OPTIONAL { ?list_node rdf:first ?range } .
      }
      FILTER (!isBlank(?range))
      ?predicate rdfs:label ?title .
      ?predicate rdf:type ?type .
      OPTIONAL { ?predicate rdfs:subPropertyOf ?super_property } .
      FILTER (?type IN (owl:ObjectProperty, owl:DatatypeProperty)) .
      FILTER(langMatches(lang(?title), "en") OR langMatches(lang(?title), "")) .
      OPTIONAL { ?predicate rdfs:comment ?predicate_comment }
      FILTER(langMatches(lang(?predicate_comment), "en") OR langMatches(lang(?predicate_comment), "")) .
      OPTIONAL {
        GRAPH ?range_graph {
          ?range rdfs:label ?range_label .
          FILTER(langMatches(lang(?range_label), "en") OR langMatches(lang(?range_label), "")) .
        }
      }
    }
task: retrieve all properties of a group of classes
![Page 51: Linked data at globo.com](https://reader034.vdocuments.site/reader034/viewer/2022051323/547baf49b4795972098b4eed/html5/thumbnails/51.jpg)
SPARQL query
    SELECT DISTINCT ?predicate ?min ?max ?range ?enumerated_value ?enumerated_value_label
    WHERE {
      <http://data.globo.com/place/City> rdfs:subClassOf ?s
        OPTION (TRANSITIVE, t_distinct, t_step('step_no') as ?n, t_min (0)) .
      ?s owl:onProperty ?predicate .
      OPTIONAL { ?s owl:minQualifiedCardinality ?min } .
      OPTIONAL { ?s owl:maxQualifiedCardinality ?max } .
      OPTIONAL {
        { ?s owl:onClass ?range }
        UNION { ?s owl:onDataRange ?range }
        UNION { ?s owl:allValuesFrom ?range }
        OPTIONAL { ?range owl:oneOf ?enumeration } .
        OPTIONAL { ?enumeration rdf:rest ?list_node OPTION(TRANSITIVE, t_min (0)) } .
        OPTIONAL { ?list_node rdf:first ?enumerated_value } .
        OPTIONAL { ?enumerated_value rdfs:label ?enumerated_value_label . } .
      }
    }
task: retrieve the cardinalities of all properties of a certain class
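Since JSON-Schema is listed among Brainiak's features, one plausible use of these cardinality rows is to fold them into schema constraints. The sketch below shows one possible mapping, not Brainiak's actual output format: the function `cardinalities_to_schema`, the choice of `minItems`/`maxItems`/`required`, and the sample rows (including the predicate names) are all hypothetical.

```python
def cardinalities_to_schema(rows):
    """Fold (predicate, min, max) rows into a JSON-Schema-like dict (one possible mapping)."""
    properties, required = {}, []
    for row in rows:
        name = row["predicate"]
        prop = properties.setdefault(name, {"type": "array"})
        if row.get("min") is not None:
            prop["minItems"] = int(row["min"])
            # A minimum of at least one value makes the property mandatory.
            if int(row["min"]) >= 1 and name not in required:
                required.append(name)
        if row.get("max") is not None:
            prop["maxItems"] = int(row["max"])
    return {"type": "object", "properties": properties, "required": required}


# Hypothetical result rows; real rows would come from the query above.
sample_rows = [
    {"predicate": "upper:name", "min": "1", "max": "1"},
    {"predicate": "place:partOfState", "min": None, "max": "1"},
]
print(cardinalities_to_schema(sample_rows))
```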
![Page 52: Linked data at globo.com](https://reader034.vdocuments.site/reader034/viewer/2022051323/547baf49b4795972098b4eed/html5/thumbnails/52.jpg)
Brainiak query
GET /place/City/_schema
![Page 53: Linked data at globo.com](https://reader034.vdocuments.site/reader034/viewer/2022051323/547baf49b4795972098b4eed/html5/thumbnails/53.jpg)
Next steps
● SEO (automatic schema.org)
● Improved annotator (DBpedia Spotlight)
● Richer content relationships (inference)
● Link to open data (e.g. DBpedia, dados.gov.br)
![Page 54: Linked data at globo.com](https://reader034.vdocuments.site/reader034/viewer/2022051323/547baf49b4795972098b4eed/html5/thumbnails/54.jpg)
Stay tuned
@brainiak_api
... will soon be released as an open source project!