MASWS
Natural Language and the Semantic Web
Kate Byrne
School of Informatics
14 February 2011
Populating the Semantic Web
• why and how?
• text to RDF
Organising the Triples
• schema design
• open issues and problems
References
Populating the Semantic Web why and how?
The Semantic Web needs data
• Semantic Web first proposed in 1999
• Why hasn’t it taken off the way WWW did?
[Figure: cycle linking “not enough data”, “queries aren’t useful” and “I’ll delay adding my data”]
How to get data
[Figure: How do we populate the Semantic Web? Create new data, or convert existing data (structured, semi-structured and unstructured)]
Semi-structured content is everywhere
Why convert unstructured text?
• Extracting RDF triples from free text is not easy
• So why is it worth doing?
  • there was knowledge before WWW (!)
  • high quality material, often historical
  • big digitisation push in 1990s (and again in 2011?), but still challenging to search text effectively
  • natural language is our preferred way of communicating
• Can the web learn to talk in natural language...?
Populating the Semantic Web txt2rdf
Natural Language Processing
• Extraction of semantic content from natural language text
• Systems now available – Open Calais, Powerset, AlchemyAPI
• To explain the process: part of my PhD work
“Kate likes chocolate.”
Think of RDF triples as declarative sentences: RDF nodes = nouns; RDF arcs = verbs
:kate :likes :chocolate .
prefix : <http://www.ltg.ed.ac.uk/>
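The noun/verb reading of a triple can be sketched in a few lines of Python. This is only a toy, assuming a three-word subject-verb-object sentence; the helper names `local_name`, `sentence_to_triple` and `to_turtle` are made up for illustration, and only the namespace comes from the slide.

```python
# Namespace as shown on the slide; everything else here is a hypothetical helper.
PREFIX = "http://www.ltg.ed.ac.uk/"

def local_name(word: str) -> str:
    """Lower-case a word and drop punctuation so it can follow the ':' prefix."""
    return "".join(ch for ch in word.lower() if ch.isalnum())

def sentence_to_triple(sentence: str) -> tuple:
    """Read a three-word 'noun verb noun' sentence as subject, predicate, object."""
    words = sentence.split()
    if len(words) != 3:
        raise ValueError("this toy only handles three-word sentences")
    subject, verb, obj = (local_name(w) for w in words)
    return subject, verb, obj

def to_turtle(triple: tuple) -> str:
    """Serialise one triple as a Turtle statement under the default prefix."""
    s, p, o = triple
    return f"@prefix : <{PREFIX}> .\n:{s} :{p} :{o} ."

print(to_turtle(sentence_to_triple("Kate likes chocolate.")))
```

Real sentences are rarely this tidy, which is why the rest of the process needs proper NLP rather than word splitting.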
Converting text to RDF
[Figure: fancy NLP processing and RDFisation]
Example – RCAHMS text
Evidence of a quartz knapping site was found within the confines of the stone circle, and in conjunction with several structures within the inner ring, strongly suggests a domestic site.
Besides the quartz implements and corresponding waste, several other artifacts of local origin occurred including a split pebble axe of greenstone with Shetland Early Bronze Age affinities. B Beveridge, 1972.
Field survey and excavation, as a response to continual wind and marine erosion, was carried out at the Sands of Breckon between 1982 and 1983.
HP50NW 11.00 was recorded as a stone settings surrounded by occupational debris (Site 22). Excavation revealed midden deposits of an early Iron Age date and a surface scatter of artefacts of mixed dates. The stone settings were tentatively interpreted as the basal stones of long cists.
Historic Scotland Archive Project (SW) 2002.
site 20
The txt2rdf process in a nutshell
• First find the important nouns, or “named entities”:
Example
Mr. Peter Moar found a small hoard of 5 polished stone knives on the same patch in May 1946.
• Then look for relationships between them:
Mr. Peter Moar found a small hoard of 5 polished stone knives on the
same patch in May 1946.
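A rough sketch of the two steps, using hand-written regular expressions in place of a trained NER model. The class labels, patterns and function names are assumptions for illustration, not the tag set or code behind the slides. Every pair of entities found becomes a candidate for the relation step.

```python
import re
from itertools import combinations

SENTENCE = ("Mr. Peter Moar found a small hoard of 5 polished stone knives "
            "on the same patch in May 1946.")

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")

# Hand-written patterns standing in for a trained NER model (assumption).
PATTERNS = [
    ("PERSON", r"Mr\. [A-Z][a-z]+ [A-Z][a-z]+"),
    ("EVENT", r"\bfound\b"),
    ("ARTEFACT", r"polished stone knives"),
    ("DATE", rf"(?:{MONTHS}) \d{{4}}"),
]

def find_entities(text):
    """Return (start offset, class, surface string) for each match, in text order."""
    hits = []
    for label, pattern in PATTERNS:
        for match in re.finditer(pattern, text):
            hits.append((match.start(), label, match.group()))
    return sorted(hits)

def candidate_pairs(entities):
    """Every unordered pair of entities is a candidate for relation classification."""
    labelled = [(label, surface) for _, label, surface in entities]
    return list(combinations(labelled, 2))

entities = find_entities(SENTENCE)
for _, label, surface in entities:
    print(f"{label:9} {surface}")
print(len(candidate_pairs(entities)), "candidate pairs")
```

In a trained system a classifier would then decide, for each candidate pair, whether a relation holds and of what kind.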
My txt2rdf pipeline
[Figure: pipeline from text documents to a graph of triples, in four stages]
• Pre-processing: sentence and para split; tokenise; POS tag; multi-word tokens and features
• Named Entity Recognition: trained NER model; list of NEs and classes
• Relation Extraction: set of NE pairs and features; trained RE model; list of relations and classes
• RDF translation: remove unwanted relations; generate triples; attach site ids
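One way to picture the pipeline is as a chain of functions, each adding to a shared record. The sketch below is an assumption-laden toy, not the actual system: a lookup table replaces the trained NER model, a fixed rule replaces the trained RE model, and the `:atSite` property and all function names are hypothetical.

```python
from functools import reduce

# Stand-in for the trained NER model: a tiny hand-made lookup (assumption).
GAZETTEER = {"Moar": "PERSON", "knives": "ARTEFACT"}

def tokenise(doc):
    """Pre-processing: whitespace tokens only (no POS tags or multi-word tokens)."""
    return {**doc, "tokens": doc["text"].rstrip(".").split()}

def recognise_entities(doc):
    """NER stage: keep tokens that appear in the lookup, with their class."""
    found = [(tok, GAZETTEER[tok]) for tok in doc["tokens"] if tok in GAZETTEER]
    return {**doc, "entities": found}

def extract_relations(doc):
    """RE stage: a fixed rule (not a trained model) pairing PERSONs with ARTEFACTs."""
    people = [t for t, c in doc["entities"] if c == "PERSON"]
    things = [t for t, c in doc["entities"] if c == "ARTEFACT"]
    return {**doc, "relations": [(p, "found", t) for p in people for t in things]}

def generate_triples(doc):
    """RDF translation: one triple per relation, then attach the site id."""
    triples = [(f":{s.lower()}", f":{p}", f":{o.lower()}")
               for s, p, o in doc["relations"]]
    site_links = [(s, ":atSite", f":site{doc['site_id']}") for s, _, _ in triples]
    return {**doc, "triples": triples + site_links}

STAGES = [tokenise, recognise_entities, extract_relations, generate_triples]

def run_pipeline(doc, stages=STAGES):
    """Feed the output of each stage into the next one."""
    return reduce(lambda data, stage: stage(data), stages, doc)

result = run_pipeline({
    "site_id": "1102",  # illustrative id, taken from the Canmore URL cited in the slides
    "text": "Mr. Peter Moar found a small hoard of 5 polished stone knives "
            "on the same patch in May 1946.",
})
for triple in result["triples"]:
    print(" ".join(triple), ".")
```

Keeping each stage as a separate function means a trained model can replace a stub without touching the rest of the chain.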
Dealing with EVENTs
Mr. Peter Moar found a small hoard of 5 polished stone knives on the
same patch in May 1946.
• FIND event is n-ary relation:
eventRel(eventAgent, eventAgentRole, eventDate, eventPatient, eventPlace)
find123(“Mr. Peter Moar”,,“May 1946”,“polished stone knives”,“Hill of Shurton”)
Split event relation into RDF triples
:find123 :hasAgent :peterMoar .
:find123 :hasDate :may1946 .
:find123 :hasPatient :polishedStoneKnives .
:find123 :hasLocation :hillOfShurton .
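The split from one n-ary event relation into binary triples can be sketched as below. The property names follow the slide, except `hasAgentRole`, which is an assumed name for the slot left empty in the example; `camel` and `event_to_triples` are hypothetical helpers, and the agent is passed without its title so that the output matches the slide's `:peterMoar`.

```python
import re

def camel(label):
    """Turn a surface string such as 'Hill of Shurton' into 'hillOfShurton'."""
    words = re.findall(r"[A-Za-z0-9]+", label)
    head, tail = words[0].lower(), words[1:]
    return head + "".join(w[0].upper() + w[1:] for w in tail)

def event_to_triples(event_id, agent=None, agent_role=None,
                     date=None, patient=None, place=None):
    """Emit one triple per filled slot of the event; empty slots are skipped."""
    slots = [
        ("hasAgent", agent),
        ("hasAgentRole", agent_role),  # assumed property name, unused in the example
        ("hasDate", date),
        ("hasPatient", patient),
        ("hasLocation", place),
    ]
    return [f":{event_id} :{prop} :{camel(value)} ."
            for prop, value in slots if value]

for line in event_to_triples("find123",
                             agent="Peter Moar",
                             date="May 1946",
                             patient="polished stone knives",
                             place="Hill of Shurton"):
    print(line)
```

The event node `:find123` is what ties the separate triples back together, so a query can ask for the date and the place of the same find.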
Archaeological events
• Example was from RCAHMS data: http://canmore.rcahms.gov.uk/en/site/1102/details/hill+of+shurton/
• Extracting events allows structured searches
• Question: What do we do with NEs that are not in relations?
For more information: Byrne and Klein (2009), Automatic Extraction of Archaeological Events from Text.
Available NLP systems
• OpenCalais: http://viewer.opencalais.com/
• Powerset: now part of Microsoft’s Bing
• AlchemyAPI: http://www.alchemyapi.com/api/demo.html
• Trained on lots of (English) text, mostly news articles
• Many general-purpose NLP tools available online, e.g.
  • nltk (http://www.nltk.org/)
  • Edinburgh LTG software (http://www.ltg.ed.ac.uk/software)
  • NaCTeM (http://nactem.ac.uk/)
Organising the Triples schema design
Do we have a data schema?
[Figure: How do we populate the Semantic Web? Create new data, or convert existing data (structured, semi-structured and unstructured)]
A schema for free text
• So far we’ve looked at instance data (ABox statements)
• Where is the framework to slot instances into?
• Named Entity categories: PERSON, PLACE, DATE, etc
• NE categories become RDFS classes:
Entities are typed by their NE class
:peterMoar rdf:type :person .
:found rdf:type :event .
:polishedStoneKnives rdf:type :artefact .
:may1946 rdf:type :date .
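A small sketch of how NE classes could supply both sides: the class declarations (the framework) and the typing triples (the instances). The function names are hypothetical, and declaring each category with `rdfs:Class` is one reading of "NE categories become RDFS classes", not necessarily the exact schema used.

```python
def schema_triples(ne_classes):
    """Framework side: declare each NE category once as an RDFS class."""
    return [f":{c.lower()} rdf:type rdfs:Class ." for c in sorted(set(ne_classes))]

def instance_triples(entities):
    """Instance side (ABox): type each entity by its NE class."""
    return [f":{name} rdf:type :{ne_class.lower()} ." for name, ne_class in entities]

# Entities and classes as they appear on the slide.
ENTITIES = [
    ("peterMoar", "PERSON"),
    ("found", "EVENT"),
    ("polishedStoneKnives", "ARTEFACT"),
    ("may1946", "DATE"),
]

for line in schema_triples(c for _, c in ENTITIES) + instance_triples(ENTITIES):
    print(line)
```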
Organising the Triples schema design
A schema for free text
• So far we’ve looked at instance data (ABox statements)
• Where is the framework to slot instances into?
• Named Entity categories: PERSON, PLACE, DATE, etc
• NE categories become RDFS classes:
Mr. Peter Moar found a small hoard of 5 polished stone knives on the same patch in May 1946.
Entities are typed by their NE class
:peterMoar rdf:type :person .
:found rdf:type :event .
:polishedStoneKnives rdf:type :artefact .
:may1946 rdf:type :date .
17 / 27
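The typing step on this slide can be sketched in a few lines of Python. This is only an illustration: the list of recognised entities stands in for the output of a named entity tagger, and the helper names (`NE_TO_CLASS`, `local_name`, `type_triples`) are made up here, not taken from the lecture.

```python
# Hypothetical mapping from NE categories to the RDFS class names used on the slide.
NE_TO_CLASS = {
    "PERSON": ":person",
    "EVENT": ":event",
    "ARTEFACT": ":artefact",
    "DATE": ":date",
}

def local_name(text):
    """Build a lowerCamelCase local name, e.g. 'Peter Moar' -> 'peterMoar'."""
    words = [w for w in text.replace(".", " ").split() if w]
    if not words:
        raise ValueError("empty entity text")
    head, *rest = words
    return head.lower() + "".join(w[:1].upper() + w[1:].lower() for w in rest)

def type_triples(entities):
    """Return one Turtle rdf:type statement per recognised entity."""
    return [f":{local_name(text)} rdf:type {NE_TO_CLASS[cat]} ." for text, cat in entities]

# Assumed tagger output for the example sentence: (surface text, NE category).
entities = [
    ("Peter Moar", "PERSON"),
    ("found", "EVENT"),
    ("polished stone knives", "ARTEFACT"),
    ("May 1946", "DATE"),
]
triples = type_triples(entities)
print("\n".join(triples))
```

Run on the assumed tagger output, this prints the four rdf:type statements shown on the slide.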
RDFS class hierarchy
• NE categories usually flat list, not hierarchical
• My set: ORG, PERSNAME, ROLE, SITETYPE, ARTEFACT, PLACE, SITENAME, ADDRESS, PERIOD, DATE, EVENT
• Contrast with structured data
• Hand-designed hierarchy (with rdfs:subClassOf):
[Figure: class hierarchy, http://www.ltg.ed.ac.uk/tether/. Node labels: time, agent, siteid, role, person, org, date, period, loc, classn, sitetype, objtype, event, address, sitename, place, grid, survey, excavation, find, visit, description, creation, alteration]
18 / 27
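A hand-designed rdfs:subClassOf hierarchy lets a query for a general class also match its subclasses. The sketch below shows that lookup in Python. The class names are the node labels from the figure, but the parent/child links are an assumption made for illustration, since the transcript does not preserve the figure's edges.

```python
# Child -> parent links (rdfs:subClassOf). The class names come from the slide's
# figure; which class sits under which is an ASSUMPTION made for illustration.
SUBCLASS_OF = {
    "person": "agent",
    "org": "agent",
    "date": "time",
    "period": "time",
    "place": "loc",
    "address": "loc",
    "excavation": "event",
    "find": "event",
    "survey": "event",
}

def superclasses(cls):
    """Follow rdfs:subClassOf upwards and return every ancestor, nearest first."""
    seen = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        if cls in seen:  # guard against an accidental cycle in the table
            break
        seen.append(cls)
    return seen

def is_a(cls, ancestor):
    """True if cls equals ancestor or is a (transitive) subclass of it."""
    return cls == ancestor or ancestor in superclasses(cls)

print(superclasses("excavation"))
print(is_a("person", "agent"))
```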
Property labels
• Can we extract RDF predicate set automatically from text?
• Eg clustering commonly occurring verbs
• But – schema design only has to be done once
• Property set and class hierarchy are related
• My set: hasEvent, hasAgent, hasAgentRole, hasPeriod, hasPatient, hasLocation, hasClassn, hasObject, partOf
• Plus “standard” ones like owl:sameAs, rdfs:seeAlso, skos:broader
• Flat list (no rdfs:subPropertyOf)
19 / 27
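As a rough idea of what automatic predicate discovery might start from, the sketch below counts verbs in part-of-speech tagged text and keeps the frequent ones as candidate property labels. This is plain frequency counting, not real clustering, and the tagged tokens are invented for illustration. A real pipeline would take them from a tagger.

```python
from collections import Counter

def candidate_predicates(tagged_tokens, min_count=2):
    """Count verb tokens (Penn-style tags starting with 'VB') and keep frequent ones."""
    counts = Counter(word.lower() for word, tag in tagged_tokens if tag.startswith("VB"))
    return [(verb, n) for verb, n in counts.most_common() if n >= min_count]

# Hypothetical tagger output; the words and tags are invented for illustration.
tagged = [
    ("Moar", "NNP"), ("found", "VBD"), ("knives", "NNS"),
    ("Smith", "NNP"), ("excavated", "VBD"), ("cairn", "NN"),
    ("Jones", "NNP"), ("found", "VBD"), ("pottery", "NN"),
    ("team", "NN"), ("excavated", "VBD"), ("site", "NN"),
    ("Brown", "NNP"), ("found", "VBD"), ("axe", "NN"),
    ("Reid", "NNP"), ("visited", "VBD"), ("site", "NN"),
]
frequent = candidate_predicates(tagged)
print(frequent)
```

Frequent verbs such as "found" or "excavated" would still need a human decision on how they map to properties like hasEvent, which is why the slide notes that schema design only has to be done once.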
Existing schemas and vocabularies
• Lots of vocabularies to choose from
• http://esw.w3.org/topic/VocabularyMarket
• RDFS and OWL are fundamental
• SKOS – translate existing domain thesauri
• http://www.w3.org/2004/02/skos/
• Reuse where possible; invent local scheme where not
• Incentive to reuse is strong
20 / 27
Organising the Triples open issues and problems
Issues around schema design
• Schema granularity
• more categories to choose from ⇒ NLP less accurate
• so class hierarchy quite coarse and flat
• Schema discovery
• tension between simplicity and query power
• how will software agents “understand” your schema?
21 / 27
Issues around NLP accuracy
• Disambiguation of extracted NEs (Mr Peter Moar, P Moar, etc)
• NLP is not 100% accurate!
• how many false statements can Semantic Web stand?
• “the computer said it, so it must be true”
• Even given canonical URI :peterMoar for our Peter Moar...
• ... can we distinguish him from others with same name?
22 / 27
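To make the disambiguation problem concrete, here is a deliberately crude Python sketch that reduces a person mention to a (surname, first initial) key. The heuristic and the `TITLES` list are assumptions for illustration, not the method used in the lecture, and "Paul Moar" is a made-up name added to show the failure case.

```python
# Courtesy titles to ignore; an illustrative list, not an exhaustive one.
TITLES = {"mr", "mrs", "ms", "miss", "dr", "prof"}

def name_key(mention):
    """Reduce a person mention to (surname, first initial); a deliberately crude key."""
    words = [w.strip(".").lower() for w in mention.split()]
    words = [w for w in words if w and w not in TITLES]
    if not words:
        raise ValueError(f"no name in {mention!r}")
    surname = words[-1]
    initial = words[0][0] if len(words) > 1 else ""
    return surname, initial

# "Paul Moar" is a hypothetical second person, included to show a collision.
mentions = ["Mr Peter Moar", "P Moar", "Mr. Peter Moar", "Paul Moar"]
keys = {m: name_key(m) for m in mentions}
print(keys)
```

The key merges "Mr Peter Moar" and "P Moar" as intended, but it also merges the hypothetical "Paul Moar" with them. That is the point of the last two bullets: a canonical URI alone does not tell same-named people apart.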
Grounding local data
• Grounding: tie unique canonical URIs to authoritative reference
• Is http://www.ltg.ed.ac.uk/tether/Loc/Place#edinburgh same as http://www.geonames.org/2650225/edinburgh.html?
• Grounding during NLP processing impractical
• Use owl:sameAs? – redundant triples, more complex queries
• Who maintains the authoritative lists of URIs?
• Specialist thesauri, eg archaeological classification terms like “Stone setting”
23 / 27
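The cost of grounding with owl:sameAs can be seen in a small Python sketch. It assumes someone has decided that the two Edinburgh URIs above denote the same place and has added a linking triple. The site identifiers `siteid:siteA` and `siteid:siteB` are hypothetical.

```python
OWL_SAME_AS = "owl:sameAs"
LOCAL = "http://www.ltg.ed.ac.uk/tether/Loc/Place#edinburgh"
GEONAMES = "http://www.geonames.org/2650225/edinburgh.html"

# Triples as (subject, predicate, object). The two URIs are the ones on the slide;
# the site identifiers and the decision to link the URIs are assumptions.
triples = [
    (LOCAL, OWL_SAME_AS, GEONAMES),
    ("siteid:siteA", ":hasLocation", LOCAL),
    ("siteid:siteB", ":hasLocation", GEONAMES),
]

def same_as_closure(uri, triples):
    """Return uri plus everything reachable through owl:sameAs in either direction."""
    found, frontier = {uri}, [uri]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p != OWL_SAME_AS:
                continue
            for a, b in ((s, o), (o, s)):
                if a == current and b not in found:
                    found.add(b)
                    frontier.append(b)
    return found

def subjects_located_at(uri, triples):
    """Find subjects whose :hasLocation is uri or any owl:sameAs equivalent."""
    targets = same_as_closure(uri, triples)
    return sorted(s for s, p, o in triples if p == ":hasLocation" and o in targets)

print(subjects_located_at(GEONAMES, triples))
```

A plain match on one URI finds only one of the two sites. The query has to expand through the owl:sameAs links to find both, which is the extra triple and extra query complexity the slide refers to.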
Grounding specialist terminology
[Figure: RDF graph. Node labels: siteid:site20, address:breckon, sitename:sands+of+breckon, date:1982, event:excavation, event:excavation20w158, address:hp50nw+11.01+hp+5304+0519, sitetype:stone+settings20w179, sitetype:stone+setting, "stone setting", "An arrangement of two or more standing stones", sitetype:religious+ritual+and+funerary, sitetype:standing+stone, sitetype:stone+circle, sitetype:stone+row, sitetype:. Edge labels: :hasLocation, :hasPeriod, rdf:type, :hasEvent, :hasClassn, rdfs:label, skos:scopeNote, skos:broader, skos:related, rdfs:subClassOf]
Aim: ground place names, people, organisations, etc.
24 / 27
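Grounding a specialist term against a thesaurus concept might look like the Python sketch below. The node and edge labels are those in the figure, but which edge joins which pair of nodes is a guess made for illustration, and the helper names `describe` and `ground_term` are hypothetical.

```python
# Assumed wiring of the figure's labels: the slide shows these nodes and edge
# labels, but which edge joins which pair of nodes is a guess made for illustration.
triples = [
    ("sitetype:stone+setting", "rdfs:label", "stone setting"),
    ("sitetype:stone+setting", "skos:scopeNote", "An arrangement of two or more standing stones"),
    ("sitetype:stone+setting", "skos:broader", "sitetype:religious+ritual+and+funerary"),
    ("sitetype:stone+setting", "skos:related", "sitetype:standing+stone"),
    ("sitetype:stone+setting", "skos:related", "sitetype:stone+circle"),
    ("sitetype:stone+setting", "skos:related", "sitetype:stone+row"),
]

def describe(concept, triples):
    """Group a concept's outgoing triples by predicate."""
    out = {}
    for s, p, o in triples:
        if s == concept:
            out.setdefault(p, []).append(o)
    return out

def ground_term(text, triples):
    """Look up a surface string against rdfs:label values (case-insensitive)."""
    wanted = text.strip().lower()
    return [s for s, p, o in triples if p == "rdfs:label" and o.lower() == wanted]

print(ground_term("Stone setting", triples))
print(describe("sitetype:stone+setting", triples)["skos:broader"])
```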
Linked Data
25 / 27
Grounding turns data into Linked Data
Grounding local URIs against "authority" nodes is the next big challenge!
26 / 27
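As a rough illustration of what grounding local URIs against authority nodes could look like, the sketch below links local names to authority URIs by exact label match and emits `owl:sameAs` triples. The slides do not say which link predicate or matching method was used, so `owl:sameAs`, the `AUTHORITY` lookup table, the `example.org` URIs and the helper names are all assumptions made for illustration; only the local names come from the example graph.

```python
# Hypothetical authority lookup: normalised label -> authority URI.
# The example.org URIs are placeholders, not real gazetteer identifiers.
AUTHORITY = {
    "breckon": "http://authority.example.org/place/breckon",
    "sands of breckon": "http://authority.example.org/place/sands-of-breckon",
}

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
LOCAL_BASE = "http://local.example.org/"  # assumed base for local URIs


def label_of(local_name):
    """Turn a local name such as 'sitename:sands+of+breckon' into a label."""
    _, _, tail = local_name.partition(":")
    return tail.replace("+", " ").strip().lower()


def ground(local_names):
    """Return (grounded_triples, ungrounded_names) using exact label matches."""
    triples, ungrounded = [], []
    for name in local_names:
        target = AUTHORITY.get(label_of(name))
        if target is None:
            # No authority node found: the local URI stays ungrounded.
            ungrounded.append(name)
        else:
            local_uri = LOCAL_BASE + name.replace(":", "/")
            triples.append((local_uri, OWL_SAME_AS, target))
    return triples, ungrounded


triples, ungrounded = ground(
    ["address:breckon", "sitename:sands+of+breckon", "date:1982"]
)
for s, p, o in triples:
    print(f"<{s}> <{p}> <{o}> .")  # N-Triples style output
```

Exact label matching is the simplest possible strategy and is likely to be too weak in practice: ambiguous place names and specialist terminology would need disambiguation against context, which is the hard part of the challenge named above.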
References
References to follow up
• NLP software for Semantic Web applications
  • OpenCalais: http://opencalais.com/
  • AlchemyAPI: http://www.alchemyapi.com/api/demo.html
• VocabularyMarket: http://esw.w3.org/topic/VocabularyMarket
• Case study
  • Byrne and Klein (2009) Automatic Extraction of Archaeological Events from Text. CAA2009, Williamsburg, VA.
• General references for NLP
  • Natural Language Processing with Python, Bird, Klein and Loper (http://www.nltk.org/book)
  • Speech and Language Processing, Jurafsky and Martin (http://www.cs.colorado.edu/~martin/slp.html)
27 / 27