Post on 10-Mar-2020


IT’S NOT ALL SEMANTIC: SEMANTIC DATA; WHAT IT IS, WHAT IT ISN’T, AND WHAT YOU NEED TO KNOW.

ORANGEPAPER


WHAT IS LINKED DATA?
Linked Data refers to a common set of practices for exposing, sharing, and connecting information and data on the web. This approach of using open technical standards to link concepts is referred to as the Semantic Web, or Web 3.0, and it is the next logical evolution from using hyperlinks.

Linked Data is predicated on the agreement that everyone will use open semantic standards to participate in a collective “web of data” that can then be harnessed and grouped in intelligent ways. In fact, the more people use it, the smarter the data becomes, as more context and information begin to surround the data. Context is where we glean real meaning, so the more information we have clustered or grouped together, the better a computer can make sense of it. The faster a computer can make sense of it, the faster it can parse and deliver it so we can obtain the information we seek.

INFORM
In 2014, the World Wide Web celebrated its 25th anniversary. While it sometimes feels as though we have always exchanged information on the internet through a myriad of portable screens, we are in actuality the first web generation. The Semantic Web, or Web 3.0, is already 10 years old but still young enough to hold the theoretical promise that our collective intelligence will make information more relevant and pertinent. Evidence is beginning to prove this true as more and more open data is released to the web and data is combined in ways never before possible.

The Semantic Web is the extension of the web through open technical standards developed by the World Wide Web Consortium, or the W3C. Tim Berners-Lee, who invented the World Wide Web in 1989, founded and directs the W3C. Linked Data is a standardized semantic technical format that enables data to be inter-relatable so it can group itself with other relevant concepts. Because the data is explicitly described in a series of statements, search engines can traverse oceans of data without imposing a prescribed navigation. This approach is particularly powerful with large data sets, but it is not strictly limited to web data. Large global organizations and businesses are also using Linked Data methods for managing and discovering their own pool of enterprise data.

Standard web protocol language is not a topic of light conversation around the water cooler. This paper attempts to provide a primer for Linked Data with enough information that, should the topic come up, you’ll be able to contribute. You may also discover that Linked Data might be the first step in your big data strategy for integrating information from disparate sources.

BETTA META
How is Linked Data different from metadata? Linked Data is also metadata, but better. Metadata helps describe the data. For example, it helps us answer the question “Is the number 12 a grade level, the amount of something, or a floor in a building?” Metadata tells us what the data represents. Linked Data works in the same way, but it has the added ability to identify itself and its relationship to other content, or data. While metadata is traditionally held in a field of a database table, Linked Data is mark-up that travels with the content, asset, or product through an identifier, and it eventually creates a graph as it interacts with other data. With this added behavior, Linked Data is much more than search optimization. It brings intelligence and enrichment to information by being machine readable, so a computer can distinguish “Paris” the capital of France from “Paris” the Hilton heiress without the use of a taxonomy.
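To make the “Paris” example concrete, here is a minimal Python sketch. The `example.org` identifiers, the `LABEL` and `TYPE` predicate names, and the helper `things_labelled` are all hypothetical, chosen only for illustration. The point is that two things sharing a label stay distinct because each has its own identifier and its own typed statements.

```python
# Hypothetical identifiers: example.org URIs stand in for real ones.
PARIS_CITY = "http://example.org/id/paris-city"
PARIS_PERSON = "http://example.org/id/paris-hilton"

LABEL = "label"  # illustrative predicate names, not a real vocabulary
TYPE = "type"

# Each statement is a (subject, predicate, object) tuple.
statements = [
    (PARIS_CITY, LABEL, "Paris"),
    (PARIS_CITY, TYPE, "City"),
    (PARIS_PERSON, LABEL, "Paris"),
    (PARIS_PERSON, TYPE, "Person"),
]


def things_labelled(label, wanted_type):
    """Return identifiers that carry the label AND the requested type."""
    labelled = {s for s, p, o in statements if p == LABEL and o == label}
    typed = {s for s, p, o in statements if p == TYPE and o == wanted_type}
    return sorted(labelled & typed)


print(things_labelled("Paris", "City"))    # the city's identifier only
print(things_labelled("Paris", "Person"))  # the person's identifier only
```

The label alone is ambiguous; the identifier plus the statements attached to it is what lets a machine tell the two apart.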

When attempting to integrate multiple data sets, a Linked Data approach eases the burden by standardizing the data through an intermediary, a kind of metadata layer that can be queried in what is called a Triple Store. This is a kind of database built to store these statements. It is termed a Triple Store because the semantic standard used in Linked Data expresses each statement in three parts. Because a triple is, in and of itself, a kind of relationship, it is more sophisticated than a simple metadata tag.
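The idea of a metadata layer sitting over several data sets can be sketched in a few lines of Python. This is an in-memory toy, not how any particular Triple Store product works; the record layouts, the `example.org` identifiers, the predicate names, and the names `TripleStore`, `add`, and `match` are all assumptions made for illustration.

```python
class TripleStore:
    """Toy in-memory store of (subject, predicate, object) statements."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self.triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


# Two hypothetical source systems that name the same field differently.
hr_rows = [{"emp_id": "17", "full_name": "Alice"}]
crm_rows = [{"contact": "17", "display": "Alice", "city": "London"}]

store = TripleStore()
for row in hr_rows:
    person = "http://example.org/person/" + row["emp_id"]
    store.add(person, "name", row["full_name"])
for row in crm_rows:
    person = "http://example.org/person/" + row["contact"]
    store.add(person, "name", row["display"])
    store.add(person, "basedIn", row["city"])

# One query now spans both sources; neither source system was changed.
for triple in store.match(subject="http://example.org/person/17"):
    print(triple)
```

Both sources map onto the same identifier and the same predicate names, so the duplicate name statement collapses into one and the query sees a single, merged description.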

EVOLVING STANDARDS AND THE RISE OF THE CONSUMER
In my last job, one of my bosses looked at me askance and said “Standards are firm and don’t change. That’s why they’re called standards!” He couldn’t have been more wrong. In the world of technology, nothing stays the same, and that includes standards. Below is a quick timeline of the web protocol standards over the years.

• 1996: Meta Content Framework (MCF) (Apple)
• 1997: MCF using XML (Netscape); RDF, CDF
• 1999: RDF, RDFS
• 2001: W3C; DAML, OWL, OWL EL, OWL QL, OWL RL
• 2003: Microformats; SPARQL, Turtle, N3; GRDDL, R2RML, FOAF, SIOC, SKOS
• 2011: schema.org launched by Microsoft/Google/Yahoo; recognises RDFa
• 2014: schema.org extensions: Learning Resource Metadata Initiative, SchemaBibEx

As we’ve evolved from a “web of documents” to a “web of data”, the standards have had to reflect those changes, and each standard was built on the foundation of the standard preceding it.

You may wonder, as I did, why there appears to be a rather large gap between 2003 and 2011. Why was adoption of these standards so slow? My conjecture is that businesses were not adopting “open” standards but were building their data businesses on similar technical solutions of their own. Google’s search algorithms were considered top secret, as were Amazon’s recommendation engines, Apple’s music strategy, and Facebook’s data. Once these businesses demonstrated the power of data technology and the use of information as their core business driver, people began to pay attention. By 2011, a majority of businesses had an e-commerce strategy, and developers began using open semantic standards to build platforms that were being accessed directly by customers. This incentivized rapid adoption. The promise of revenue elevated the ideology of the Semantic Web from an academic theory to a practical business imperative.

In the first three years after the release of schema.org 1.0 (a Linked Data mark-up) in 2011, webpages with semantic extensions increased by 1000%, according to Ramanathan V. Guha, the Google Fellow and founder of schema.org. In fact, Google is already using Linked Data behind their new search results, though you may not be aware of it. If you type the name of an entity into the Google search window, “Tim Berners-Lee”, for example, you’ll get two sets of results. On the left side of your screen you’ll recognize the familiar list of hyperlinks to websites. Regardless of how Google ranks the results, the long authority list still requires human scrolling and reading in order to find the information most pertinent to your inquiry. On the right, you’ll notice an information box with links that are specific to Tim Berners-Lee: his birthday, his nationality, his marital status, etc. The information in the box is all semantically linked data and will soon replace the list on the left.

WHAT IT ISN’T
People often misinterpret “semantic” to mean something that has recognizable meaning. They will tell me that they use semantic tags, by way of file headers, or a metadata schema with controlled vocabularies. While these are all good practices, they are not actually semantic. They are not semantic for the simple reason that they are not machine readable. A search engine may be able to determine the sequence of letters that make up a term, but the term cannot tell the machine, or computer, what it is, i.e. a grade level, a currency, or a floor in a tall building.

Inserting a URL in a metadata field does not make your data “linked.” Using hyperlinks or referencing a website is not semantically linking data. Linked Data is the use of technical standards to express information.

TRANSFORM

WHAT MAKES LINKED DATA:
• Linked Data requires the use of open semantic standards such as schema.org, RDF, SKOS, OWL, SPARQL, etc.
• It must use URIs (Uniform Resource Identifiers) to name things
• The URIs must also include links to other URIs
• Linked Data is a non-proprietary format

STATEMENTS IN FORM
Linked Data is a method of publishing structured data so that a human, and a machine, can better understand its meaning. RDF, which stands for Resource Description Framework, is the semantic standard for expressing this structured data. It requires data to be expressed as statements with three parts: 1) subject, 2) predicate, 3) object. As mentioned earlier, this is called a “triple”. This allows a myriad of combinations to be expressed and shared. For example, we can state the following:

Alice (subject) ..... has-mother (predicate) ..... Susan (object)

Alice ..... has-father ..... John

Alice ..... knows ..... Bob

Bob ..... has-mother ..... Martha

Bob ..... has-father ..... Dick

The graphic below illustrates what Linked Data looks like in the field, so to speak. By using RDF to express predicates, like has-mother, the SKOS (Simple Knowledge Organization System) standard for vocabularies (mother, father, sister), and a URL as an identifier (URI), the data is prepared to be queried through SPARQL (a recursive acronym for SPARQL Protocol and RDF Query Language), a standard which can read and display the data.

Once every entity becomes “linked”, it begins to build relationships with other data, forming a graph which can be visualized and analysed. It is less a family tree and more a giant shrub, where you can decide to follow Alice’s relations or Bob’s through a navigational structure that is not prescribed. It’s a non-taxonomy, because attempting to build a global hierarchy of every existing concept on the web is impossible. Given the volume of data on the web, there is no single source of truth, but many, many truths.

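[Figure: graph of the family example. Alice and Bob are each linked to their parents (John and Sue; Dick and Martha) by “father” and “mother” relationships, and a “birthdate” relationship points to the value 1994-08-07.]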
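The family statements can be written down and queried in a few lines of Python. This sketch only approximates what a SPARQL query does against an RDF store: the `EX` namespace is a made-up `example.org` prefix, and `objects` and `friends_mothers` are hypothetical helpers, not part of any standard.

```python
EX = "http://example.org/"  # made-up namespace for illustration

# The five statements from the text, as (subject, predicate, object).
triples = [
    (EX + "Alice", EX + "has-mother", EX + "Susan"),
    (EX + "Alice", EX + "has-father", EX + "John"),
    (EX + "Alice", EX + "knows", EX + "Bob"),
    (EX + "Bob", EX + "has-mother", EX + "Martha"),
    (EX + "Bob", EX + "has-father", EX + "Dick"),
]


def objects(subject, predicate):
    """All objects of statements with the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def friends_mothers(person):
    """Follow 'knows', then 'has-mother': a two-step walk over the graph."""
    return [
        mother
        for friend in objects(person, EX + "knows")
        for mother in objects(friend, EX + "has-mother")
    ]


print(friends_mothers(EX + "Alice"))  # ['http://example.org/Martha']
```

No path from Alice to Martha was defined in advance. The route emerges from following two statements in turn, which is the sense in which the navigation is not prescribed.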

Of course, not all data problems can be solved with Linked Data, and not all databases actually need to reference other databases. It is not a panacea for all enterprise data, but businesses and organizations are beginning to recognize the need to cross-reference and analyse multiple data sets in order to bring evidence into play in decision making. Linked Data offers the means of brokering information without disrupting core repositories of content, digital assets, HR data, product portfolios and finance systems, and the like.

OUTPERFORM

POTENTIAL FOR BUSINESSES
The same theory of integrating and referencing vast amounts of data on the web is equally applicable to most large enterprises. Breaking up siloed repositories and having a holistic view of the data through one interface is a powerful prospect for most businesses. It is more cost effective and efficient than migrating and converging data into one über database, but experience and knowledge in building platforms on semantic technology are still very limited in most IT offices.


ABOUT THE AUTHOR
Madi Weland Solomon
Madi Weland Solomon is with the London office of Optimity Advisors as Senior Manager and brings over twenty years of knowledge and experience from a range of sectors. She is a creative technologist and specializes in business intelligence initiatives and semantic technology that bridge the technical within social and cultural constructs.

PARTNER CONTACT
John [email protected] Madison Avenue, Suite 1205
New York, NY 10016
Main: 202.540.9222
Fax: 202.540.9223
www.OptimityAdvisors.com

CONCLUSION
We are in an age of networked information overtaken by digital components that can be personalised and manipulated by consumers over a vast array of devices. Information management can no longer be dictated by manual efforts.

The future of any organization’s success is dependent on extending its knowledge beyond content or product distribution to include the listening posts placed around it. Tracking and understanding consumer behaviour is part of the data evidence that will inform decisions. Linked Data can be used to build or trigger interventions, recommendations, and outcomes.

Linked Data may not solve all problems, but data and information are quickly becoming core business assets in and of themselves.

ABOUT OPTIMITY ADVISORS
Optimity Advisors is a rapidly growing, multi-industry strategy, operations and information technology advisory firm with multiple locations throughout the United States, United Kingdom and Europe. We specialize in the critical set of services that sit between high-level strategy and delivery and execution. We provide a strategic outlook through proven methodology, knowledge and instinct, helping to craft an actionable future vision that aligns with your long-term goals and objectives. We bring an end-to-end industry understanding to help you rise above the day-to-day, focus on the opportunities ahead and align your organization for success.

Washington, DC | Brussels | London | Los Angeles | New York | Zurich

www.OptimityAdvisors.com