The Preparation of Information in Data Science
Source: hiperc.buffalostate.edu/daad4aabc/presentations/Rudnicki-Ron.pdf
The Role of Ontologies in Unlocking Big Data
• Big Data holds the potential to reveal great insights from large, diverse data sets, if properly exploited with the right analytics
• To better realize this potential, a shift needs to occur from representations of individual data sets to representations that enable interoperability across all data sets
The Common Core Ontology Development Method
• Rule-governed development of an extensible set of ontologies to which data from sub-domains can be aligned and linked together
• Combines principles from the Linked Open Data Initiative, Open Biological and Biomedical Ontologies (OBO) Foundry, and object-oriented programming
Linked Open Data Initiative
• Began as a means of integrating data on the World Wide Web
• Based on a simple set of guiding principles*
  – Use Uniform Resource Identifiers (URIs) as names of things
  – Use HTTP URIs so that people can look up those names
  – When someone looks up a URI, provide useful information
  – Include links to other URIs so that they can discover other things
*Tim Berners-Lee, “Linked Open Data,” https://www.w3.org/DesignIssues/LinkedData
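The four principles can be illustrated with a small, self-contained Python toy. It rests on loud assumptions: the `WEB` dictionary stands in for real HTTP dereferencing, and the `example.org` URIs, the `label`/`locatedIn` predicates, and the helper names `look_up` and `discover` are all hypothetical, not part of any Linked Data library.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical stand-in for the web: each HTTP URI "dereferences" to a list of
# (predicate, object) statements about the thing that URI names.
WEB: dict[str, list[tuple[str, str]]] = {
    "http://example.org/id/Buffalo": [
        ("label", "Buffalo"),
        ("locatedIn", "http://example.org/id/NewYorkState"),
    ],
    "http://example.org/id/NewYorkState": [
        ("label", "New York State"),
        ("locatedIn", "http://example.org/id/USA"),
    ],
    "http://example.org/id/USA": [("label", "United States")],
}


def is_http_uri(name: str) -> bool:
    """Principles 1-2: things are named by URIs, and those URIs use HTTP(S)."""
    return urlparse(name).scheme in ("http", "https")


def look_up(uri: str) -> list[tuple[str, str]]:
    """Principle 3: looking up a URI returns useful information (empty if unknown)."""
    return WEB.get(uri, [])


def discover(start: str) -> list[str]:
    """Principle 4: follow links to other URIs, breadth-first, to find more things."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        uri = queue.popleft()
        order.append(uri)
        for _predicate, obj in look_up(uri):
            # Only objects that are themselves HTTP URIs are treated as links.
            if is_http_uri(obj) and obj not in seen:
                seen.add(obj)
                queue.append(obj)
    return order


if __name__ == "__main__":
    for uri in discover("http://example.org/id/Buffalo"):
        labels = [obj for pred, obj in look_up(uri) if pred == "label"]
        print(uri, labels)
```

Starting from the Buffalo URI, the traversal reaches New York State and then the United States purely by following links, which is the behavior the fourth principle is meant to enable.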
A Linked Open Data Success Story
DBpedia
• Pages, accessible from web browsers, that link data extracted from Wikipedia
Linked Open Data Issue - A Profusion of Ontologies
Figure: Linking Open Data cloud diagram 2014, by Max Schmachtenberg, Christian Bizer, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/
Effects of Profusion
• Costs increase
  – relative to the amount of duplicative effort
  – relative to the number of mappings
  – relative to the number of vernaculars
• Effectiveness decreases
  – Searches have low recall and precision
  – Re-use creates ambiguities
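A rough count shows why costs grow with the number of mappings. The sketch assumes that, without a shared ontology, every pair of n ontologies needs its own mapping, n(n-1)/2 in total, whereas mapping each ontology once to a common hub needs only n. Both the pairwise model and the function names are illustrative assumptions, not figures from the slides.

```python
def pairwise_mappings(n: int) -> int:
    """Mappings needed if each of the n ontologies is aligned directly with every other one."""
    return n * (n - 1) // 2


def hub_mappings(n: int) -> int:
    """Mappings needed if each of the n ontologies is aligned once to a shared ontology."""
    return n


if __name__ == "__main__":
    for n in (5, 50, 500):
        print(f"{n:>3} ontologies: {pairwise_mappings(n):>6} pairwise vs {hub_mappings(n):>3} via a hub")
```

Under these assumptions, 500 ontologies need 124,750 pairwise mappings but only 500 through a hub: the pairwise count grows quadratically, the hub count linearly.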
OBO Foundry
• The Open Biological and Biomedical Ontologies (OBO) Foundry is a collaborative group of organizations devoted to establishing best practices in ontology development
  – Leverages the lessons learned from over $300M invested in ontology development
An OBO Foundry Best Practice – Use a Common Upper Ontology
• Produces common patterns within ontologies
  – Reuse of mappings from the sources
    • Easier to include new sources of data
  – Enables reuse of queries and analytics
    • Structure of data stays constant
    • Easier to transition to new domains of interest
Figure: Common patterns under a shared upper ontology. Classes shown: Entity, Object, Organization, Physical Artifact, Quality, Quality of Organization, Quality of Physical Artifact. Relations shown: has_quality, bearer_of.
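The reuse of queries can be sketched in Python with the classes named on this slide. The sketch assumes one reading of the diagram (Organization and Physical Artifact as kinds of Object, the two specific qualities as kinds of Quality, all under Entity). The instance names, the two "sources", and the helpers `is_a` and `qualities_of` are hypothetical. It shows a single query, phrased in upper-level terms, running unchanged over data from both sources.

```python
from typing import Optional

# Assumed reading of the slide's diagram: child class -> parent class.
SUBCLASS_OF = {
    "Object": "Entity",
    "Quality": "Entity",
    "Organization": "Object",
    "PhysicalArtifact": "Object",
    "QualityOfOrganization": "Quality",
    "QualityOfPhysicalArtifact": "Quality",
}

# Hypothetical instances and their asserted types.
TYPE_OF = {
    "acme_corp": "Organization",
    "acme_corp_size": "QualityOfOrganization",
    "truck_17": "PhysicalArtifact",
    "truck_17_mass": "QualityOfPhysicalArtifact",
}

# Two hypothetical data sources, each already aligned to the shared classes.
SOURCE_A = [("acme_corp", "has_quality", "acme_corp_size")]
SOURCE_B = [("truck_17", "has_quality", "truck_17_mass")]


def is_a(cls: str, ancestor: str) -> bool:
    """True if cls equals ancestor or reaches it by walking up SUBCLASS_OF."""
    current: Optional[str] = cls
    while current is not None:
        if current == ancestor:
            return True
        current = SUBCLASS_OF.get(current)
    return False


def qualities_of(kind: str, triples: list[tuple[str, str, str]]) -> list[tuple[str, str]]:
    """One upper-level query: (bearer, quality) pairs where the bearer is some kind of `kind`."""
    return sorted(
        (s, o)
        for s, p, o in triples
        if p == "has_quality" and is_a(TYPE_OF[s], kind) and is_a(TYPE_OF[o], "Quality")
    )


if __name__ == "__main__":
    # The same query covers both sources because both use the shared classes.
    print(qualities_of("Object", SOURCE_A + SOURCE_B))
```

Adding a third source would require only aligning its types to the shared classes. The query itself would not change, which is the "structure of data stays constant" point.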
Basic Formal Ontology
• An upper ontology with not more than 40 class terms and 20 relationships
• Provides an extensible structure for the interrelationships between basic entities
• Used as the upper ontology in hundreds of ontologies, primarily in the biomedical domain
• Used by at least one hundred different projects
An OBO Foundry Best Practice - Truth as a Development Guideline
Strive towards creating a digital copy of the world
Reduces the influence of any single perspective on the ontology, enabling links to many sources
Provides an objective means for settling disputes over terminology
Adds the constraint that every assertion within an ontology must be true
OBO Foundry Issue - Ontologies with Too Wide a Scope
Reusing existing terminology is good practice
• But the Ontology for Biomedical Investigations (OBI) is not a logical place to maintain the term “Organization”
Object Oriented Programming - Modularity as a Development Guideline
One axis of modularity in the CCO is the level of generality
Content and structure are inherited from higher levels
Upper Ontologies Describe the Structure of the World
Mid-Level Ontologies Add General Content to the Structure
Domain-Level Ontologies Add Content Relevant to a Community
Upper and mid-level ontologies are stable and of manageable scale
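Inheritance across the levels of generality can be pictured with a short Python sketch. The class names, their level assignments, and the relation lists (for example, `has_hull_type`) are hypothetical illustrations, not terms taken from a CCO release. The sketch only shows how a domain-level class picks up content declared at the mid and upper levels.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OntologyClass:
    name: str
    level: str  # "upper", "mid", or "domain"
    parent: Optional["OntologyClass"] = None
    own_relations: list[str] = field(default_factory=list)

    def all_relations(self) -> list[str]:
        """Relations declared on this class plus those inherited from ancestors, most general first."""
        inherited = self.parent.all_relations() if self.parent else []
        return inherited + self.own_relations

    def lineage(self) -> list[tuple[str, str]]:
        """(name, level) pairs from the most general ancestor down to this class."""
        above = self.parent.lineage() if self.parent else []
        return above + [(self.name, self.level)]


# Hypothetical classes, one per level of generality.
OBJECT = OntologyClass("Object", "upper", None, ["participates_in", "bearer_of"])
VEHICLE = OntologyClass("Vehicle", "mid", OBJECT, ["has_mass"])
WATERCRAFT = OntologyClass("Watercraft", "domain", VEHICLE, ["has_hull_type"])

if __name__ == "__main__":
    print(WATERCRAFT.lineage())
    print(WATERCRAFT.all_relations())
```

The domain class declares only one relation of its own yet carries all four, so most of its structure comes from the stable upper and mid levels.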
Object Oriented Programming - Modularity as a Development Guideline
The second axis of modularity in the CCO is content
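Figure: Content modules and the relations that connect them. Classes shown: Attribute, Process, Site, Temporal Region, Physical Object. Relations shown: has, participates in, occurs at, occurs on, contained in.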
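The content axis can be read as a small schema of classes linked by relations. The sketch assumes a domain and range for each relation named on this slide (for example, that `occurs_at` links a Process to a Site). Those signatures, the instance names, and the `violations` helper are illustrative guesses, not definitions from the CCO. The code flags any statement whose subject and object types do not match the assumed signature.

```python
# Assumed signatures (domain class, range class) for the relations on this slide.
SIGNATURES = {
    "has": ("PhysicalObject", "Attribute"),
    "participates_in": ("PhysicalObject", "Process"),
    "occurs_at": ("Process", "Site"),
    "occurs_on": ("Process", "TemporalRegion"),
    "contained_in": ("PhysicalObject", "Site"),
}

# Hypothetical instances and their types.
TYPE_OF = {
    "ship_1": "PhysicalObject",
    "voyage_1": "Process",
    "port_a": "Site",
    "day_1": "TemporalRegion",
    "red": "Attribute",
}


def violations(triples: list[tuple[str, str, str]]) -> list[tuple[str, str, str]]:
    """Return statements whose subject/object types don't match the relation's assumed signature."""
    return [
        (s, p, o)
        for s, p, o in triples
        # An unknown relation has no signature, so it is always reported.
        if (TYPE_OF.get(s), TYPE_OF.get(o)) != SIGNATURES.get(p)
    ]


DATA = [
    ("ship_1", "participates_in", "voyage_1"),
    ("voyage_1", "occurs_at", "port_a"),
    ("voyage_1", "occurs_on", "day_1"),
    ("ship_1", "has", "red"),
    ("ship_1", "contained_in", "port_a"),
    ("port_a", "participates_in", "voyage_1"),  # a Site is not a PhysicalObject here
]

if __name__ == "__main__":
    print(violations(DATA))
```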
The Common Core Ontologies in Practice
• The Common Core Ontologies (CCO) are intended to serve as a vocabulary that can describe objects and processes that are common to many domains of interest
• The remaining objects and processes that are unique to particular domains of interest are described by ontologies that extend from the CCO in a repeatable, rule-governed process
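One rule such a repeatable process might enforce is that every new domain class ultimately descends from a term in the common core. The sketch assumes a hypothetical subset of core terms and a domain ontology given as a child-to-parent map. `unanchored` is an illustrative helper, not part of any CCO tooling.

```python
# Hypothetical subset of terms already defined in the upper and common core ontologies.
CORE_TERMS = {"Entity", "Object", "Artifact", "Vehicle"}

# Hypothetical domain extension: new class -> its declared parent.
DOMAIN_PARENTS = {
    "Watercraft": "Vehicle",
    "Submarine": "Watercraft",
    "Sonar": "Gadget",  # parent is neither a core term nor another domain class
}


def unanchored(domain_parents: dict[str, str], core: set[str]) -> list[str]:
    """Domain classes whose chain of parents never reaches a core term (cycles included)."""
    bad = []
    for cls in domain_parents:
        seen = set()
        current = cls
        # Walk up through domain classes; stop at a non-domain term or when a cycle repeats.
        while current in domain_parents and current not in seen:
            seen.add(current)
            current = domain_parents[current]
        if current not in core:
            bad.append(cls)
    return sorted(bad)


if __name__ == "__main__":
    print(unanchored(DOMAIN_PARENTS, CORE_TERMS))
```

Here `Submarine` passes because it reaches `Vehicle` through `Watercraft`, while `Sonar` is reported because `Gadget` is not a core term.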
The Common Core and Domain Ontologies
Figure: Ontology modules grouped by level (legend: Upper Ontology, Common Core Ontology, Domain Ontology). Upper ontology: Basic Formal Ontology (BFO). Other modules shown: Extended Relation Ontology, Time Ontology, Quality Ontology, Information Entity Ontology, Geospatial Ontology, Event Ontology, Artifact Ontology, Agent Ontology, Affective State Ontology, Ethnicity Ontology, Occupation Ontology, Hydrographic Feature Ontology, Physiographic Feature Ontology, Currency Unit Ontology, Units of Measure Ontology, Curriculum Ontology, Citizenship Ontology, Watercraft Ontology, Sensor Ontology, Agent Information Ontology, Undersea Warfare Ontology, Space Object Ontology.
The Benefits of the Common Core Ontology Development Process