© COPYRIGHT 2014 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Semantic Technology in the Real World
Presented by: Stephen Buxton
October 2014
We Are The New Generation Database
Hierarchical Era: “For your application data!”
• Application- and hardware-specific
Relational Era: “For all your structured data!”
• Normalized, tabular model
• Application-independent query
• User control
Any Structure Era: “For all your data!”
• Schema-agnostic
• Massive scale
• Query and search
• Analytics
• Application services
• Faster time-to-results
Harnessing Data & Reimagining Applications
Reduce Risk
Manage Compliance
Create New Value from Data
Optimize Operations
Lower TCO / Better IT Economics
Better Decision-making
The Only Enterprise NoSQL Database
Search & Query
ACID Transactions
High Availability / Disaster Recovery
Replication
Government-grade Security
Scalability & Elasticity
On-premise or Cloud Deployment
Hadoop for Storage & Compute
Semantics
Faster Time-to-Results
SEARCH DATABASE
APPLICATION SERVICES
Agenda
• Semantic Technologies – a whirlwind tour
  Triple Stores (Graph Databases)
  NLP – deriving structured from unstructured
• Semantics in the context of a database
  You also need documents
  … and scalars, and geospatial, and bitemporal, and …
  … and an Enterprise database
• Semantics use cases
  Semantics and search
  Semantics and data integration
SEMANTIC TECHNOLOGIES
Semantics Is: A New Way to Organize Data
Data is stored in triples, expressed as Subject : Predicate : Object
  John Smith : livesIn : London
  London : isIn : England
Query with SPARQL gives us simple lookup .. and more!
  Find people who live in (a place that's in) England
[Figure: RDF triples drawn as a graph. Nodes: "John Smith", "London", "England". Edge labels: livesIn, isIn]
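The lookup described on this slide can be sketched in a few lines of Python. This is only an illustration of the pattern match, not how a triple store or SPARQL engine is implemented: the triples are plain tuples, the helper names are invented for this sketch, and only one level of isIn is followed.

```python
# Triples as plain (subject, predicate, object) tuples, taken from the slide.
triples = {
    ("John Smith", "livesIn", "London"),
    ("London", "isIn", "England"),
}

def objects(subject, predicate):
    """Return every object o such that (subject, predicate, o) is stored."""
    return {o for s, p, o in triples if s == subject and p == predicate}

def people_living_in(region):
    """People who live in `region` directly, or in a place that isIn `region`."""
    found = set()
    for s, p, o in triples:
        if p != "livesIn":
            continue
        # Direct match, or one hop along isIn (hypothetical simplification).
        if o == region or region in objects(o, "isIn"):
            found.add(s)
    return found

print(people_living_in("England"))  # {'John Smith'}
```

No triple says that John Smith lives in England; the answer comes from joining the two stored facts.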
Triple Stores and Graph Databases
Triple Store
• Triples lookup + graph traversal
• Standard data model, standard language: SPARQL queries over RDF
• Example: MarkLogic
Graph Database
• Graph analytics (consider the whole graph)
  "Show me the shortest [weighted] path between .."
  "Show me the node with highest degree"
• Proprietary data model, proprietary language
• Example: Neo4j
Property Graph Database
• Graph analytics
  "Show me the shortest [weighted] path between .."
  "Show me the node with highest degree"
• Proprietary data model, proprietary language
• Example: Neo4j
[Diagram label: Graph Database]
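The two graph-analytics questions quoted above look at the whole graph rather than at individual triples. A minimal Python sketch, assuming a small invented undirected graph and an unweighted shortest path (breadth-first search); it does not reflect how Neo4j or any other product answers these questions.

```python
from collections import deque

# Hypothetical undirected graph, stored as an adjacency map.
graph = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A", "D"},
    "D": {"B", "C", "E"},
    "E": {"D"},
}

def highest_degree(g):
    """Node with the most neighbours (ties broken alphabetically)."""
    return min(g, key=lambda n: (-len(g[n]), n))

def shortest_path(g, start, goal):
    """Breadth-first search: a shortest unweighted path, or None if unreachable."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in sorted(g[node]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(highest_degree(graph))
print(shortest_path(graph, "A", "E"))
```

A weighted shortest path would need a different algorithm (for example Dijkstra's); the sketch keeps to the unweighted case.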
RDF – What is RDF?
Resource Description Framework
• W3C spec with a defined vocabulary for representing facts/relationships: http://www.w3.org/RDF/
• Facts expressed as triples: (subject, predicate, object)
• Abstract data model facilitates data sharing/merging even if the underlying representations are different
  Example: ingest RDBMS data into a triple store as RDF triples
  Example: map entities using predicates such as sameAs, subClassOf
RDF – What is a triple?
A single fact/relationship consisting of (subject, predicate, object)
Subject
• An IRI representing a resource, for example a person or a company.
Predicate
• An IRI representing a property or characteristic of the subject, or the relationship between the subject and the object.
• Also known as an arc or edge.
Object
• An IRI or a typed literal.
  Typed literal: xsd:double, xsd:string, xsd:date, …
  IRI: may be the subject of other triples.
RDF – What is an IRI?
Internationalized [Unique] Resource Identifier
• A string used to uniquely identify a resource in the Universe.
• An IRI may contain characters from the Universal Character Set (Unicode/ISO 10646), which allows for Chinese characters, Japanese kanji, and so on.
IRI vs. URI
• URIs (uniform resource identifiers) are limited to ASCII characters
IRI/URI vs. URL
• A URL is a uniform resource locator
• There’s an expectation that if you follow a URL, you’ll find something useful
• An IRI is an identifier – it may or may not also be a locator
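The IRI/URI distinction can be made concrete by mapping an IRI to an ASCII-only URI. A minimal sketch using Python's standard library: non-ASCII characters are percent-encoded as UTF-8 bytes. The example IRI is invented, and the sketch ignores internationalized host names, which need separate handling.

```python
from urllib.parse import quote

def iri_to_uri(iri):
    """Percent-encode the non-ASCII characters of an IRI as UTF-8 bytes."""
    # Leave reserved and unreserved ASCII characters (and existing %-escapes) alone.
    return quote(iri, safe=":/?#[]@!$&'()*+,;=-._~%")

# Hypothetical IRI with two kanji characters in the path.
print(iri_to_uri("http://example.org/resource/東京"))
```

An IRI that is already pure ASCII comes back unchanged.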
RDF - Examples
Example 1: The object is also a subject, so it is represented as an IRI
  Subject   <http://dbpedia.org/resource/David_Bowie>
  Predicate <http://dbpedia.org/ontology/birthPlace>
  Object    <http://dbpedia.org/resource/London>
  [Figure: David Bowie, birthPlace, London]
Example 2: The object is a typed literal
  Subject   <http://dbpedia.org/resource/London>
  Predicate <http://www.w3.org/2003/01/geo/wgs84_pos#lat>
  Object    "51.5072"^^<http://www.w3.org/2001/XMLSchema#float>
  [Figure: London, latitude, 51.5072]
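The two examples can be modelled as a small data structure, which shows the difference between an IRI object and a typed-literal object. This is a sketch with invented class names, not the API of any RDF library; the latitude predicate uses the W3C WGS84 namespace, http://www.w3.org/2003/01/geo/wgs84_pos#.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class IRI:
    value: str
    def __str__(self):
        return f"<{self.value}>"

@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: IRI
    def __str__(self):
        return f'"{self.lexical}"^^{self.datatype}'

@dataclass(frozen=True)
class Triple:
    subject: IRI
    predicate: IRI
    object: Union[IRI, Literal]  # an IRI or a typed literal
    def __str__(self):
        return f"{self.subject} {self.predicate} {self.object} ."

XSD_FLOAT = IRI("http://www.w3.org/2001/XMLSchema#float")
london = IRI("http://dbpedia.org/resource/London")

# Example 1: the object is an IRI, so it can be the subject of other triples.
example1 = Triple(IRI("http://dbpedia.org/resource/David_Bowie"),
                  IRI("http://dbpedia.org/ontology/birthPlace"),
                  london)
# Example 2: the object is a typed literal.
example2 = Triple(london,
                  IRI("http://www.w3.org/2003/01/geo/wgs84_pos#lat"),
                  Literal("51.5072", XSD_FLOAT))

print(example1)
print(example2)
```

Here `london` is the object of the first triple and the subject of the second, which is what links the two facts into a graph.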
RDF – A Serialization
Turtle = Terse RDF Triple Language (*.ttl)
Natural, easy to read:
  <http://dbpedia.org/resource/David_Bowie> <http://dbpedia.org/ontology/birthPlace> <http://dbpedia.org/resource/London> .
  <http://dbpedia.org/resource/David_Bowie> <http://dbpedia.org/ontology/birthDate> "1947-01-08"^^<http://www.w3.org/2001/XMLSchema#date> .
Namespace prefixes for brevity; a semicolon indicates a repeated subject:
  @prefix db: <http://dbpedia.org/resource/> .
  @prefix onto: <http://dbpedia.org/ontology/> .
  @prefix xs: <http://www.w3.org/2001/XMLSchema#> .
  db:David_Bowie onto:birthPlace db:London ;
                 onto:birthDate "1947-01-08"^^xs:date .
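Prefix expansion is plain string concatenation, which is why the XML Schema namespace needs its trailing "#": xs:date must expand to the same IRI as the long form, http://www.w3.org/2001/XMLSchema#date. A minimal sketch with an invented helper name; it is not a Turtle parser.

```python
# Prefix table matching the @prefix declarations (XML Schema namespace ends in '#').
PREFIXES = {
    "db": "http://dbpedia.org/resource/",
    "onto": "http://dbpedia.org/ontology/",
    "xs": "http://www.w3.org/2001/XMLSchema#",
}

def expand(name, prefixes=PREFIXES):
    """Expand a prefixed name such as 'db:London' into a full IRI."""
    prefix, sep, local = name.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {name!r}")
    return prefixes[prefix] + local

print(expand("db:David_Bowie"))
print(expand("xs:date"))
```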
SPARQL – What is SPARQL?
SPARQL – the SPARQL Protocol and RDF Query Language
• "an RDF query language, that is, a query language for databases, able to retrieve and manipulate data stored in Resource Description Framework format" (Wikipedia)
• Looks a lot like SQL
• Based on pattern matching
• 4 kinds of SPARQL queries: SELECT, CONSTRUCT, ASK, DESCRIBE
• + SPARQL Update (part of SPARQL 1.1)
SPARQL – Example
PREFIX rnews: <http://iptc.org/std/rNews/2011-10-07#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
# Find all news item headlines newer than July 11 2013.
SELECT ?s ?headline ?date
WHERE {
  ?s a rnews:NewsItem ;
     rnews:headline ?headline ;
     rnews:datePublished ?date .
  FILTER (?date > "2013-07-11T00:00:00"^^xsd:dateTime )
}
ORDER BY DESC(?date)
Prefixes – makes for less typing (a bit like namespaces in XML)
Comments
Projection – variables are bound in the pattern match (or externally bound)
Selection – select triples matching these patterns
Filter – return only triples that match these conditions
Order by – order the results
In words: return ?s, ?headline, and ?date where
  ?s is a news item AND
  ?s has the headline ?headline AND
  ?s was published on ?date
but only return results where ?date is after July 11 2013.
Order the results by date, descending.
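The same pattern match, filter, and ordering can be traced by hand in Python. This is a sketch of what the query asks for, not of how a SPARQL engine evaluates it; the subjects, headlines, and dates below are invented sample data.

```python
from datetime import datetime

RNEWS = "http://iptc.org/std/rNews/2011-10-07#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"  # what "a" abbreviates

# Hypothetical sample triples (subject, predicate, object).
triples = [
    ("ex:item1", RDF_TYPE, RNEWS + "NewsItem"),
    ("ex:item1", RNEWS + "headline", "Old story"),
    ("ex:item1", RNEWS + "datePublished", datetime(2013, 6, 1)),
    ("ex:item2", RDF_TYPE, RNEWS + "NewsItem"),
    ("ex:item2", RNEWS + "headline", "Newest story"),
    ("ex:item2", RNEWS + "datePublished", datetime(2013, 8, 1)),
    ("ex:item3", RDF_TYPE, RNEWS + "NewsItem"),
    ("ex:item3", RNEWS + "headline", "Recent story"),
    ("ex:item3", RNEWS + "datePublished", datetime(2013, 7, 20)),
]

def values(subject, predicate):
    """All objects stored for a given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]

def news_after(cutoff):
    rows = []
    for s, p, o in triples:
        # Selection: ?s a rnews:NewsItem
        if p == RDF_TYPE and o == RNEWS + "NewsItem":
            for headline in values(s, RNEWS + "headline"):
                for date in values(s, RNEWS + "datePublished"):
                    # Filter: keep only results newer than the cutoff
                    if date > cutoff:
                        rows.append((s, headline, date))  # Projection
    # Order by date, descending
    return sorted(rows, key=lambda row: row[2], reverse=True)

for row in news_after(datetime(2013, 7, 11)):
    print(row)
```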
Inference – What is inference?
We infer new facts/relationships based on:
• Facts/relationships in the database
• Rules that we "know"
Inference – Example [1]
We know:
• prod001 is a Henley
• prod001 is blue
• Henley is a subclass of Shirt
We can infer:
• prod001 is a Shirt
We can ask "find me all blue Shirts", and find prod001
Inference – Rules [1]
We said "Henley is a subclass of Shirt"
We need a formal definition (rule) for subclass:
  rule "subClassOf rdfs9"
  construct { ?x a ?c2 }
  { ?x a ?c1 . ?c1 rdfs:subClassOf ?c2 . filter(?c1!=?c2) }
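The effect of this rule can be sketched in Python by applying it repeatedly until nothing new appears (forward chaining). This is an illustration only, not how a database evaluates rules; the "color" predicate is an assumed stand-in for "prod001 is blue".

```python
TYPE, SUBCLASS = "a", "rdfs:subClassOf"

# Facts from the example; "color" is a hypothetical predicate name.
facts = {
    ("prod001", TYPE, "Henley"),
    ("prod001", "color", "blue"),
    ("Henley", SUBCLASS, "Shirt"),
}

def apply_rdfs9(triples):
    """Apply the subclass rule until no new triples appear."""
    inferred = set(triples)
    while True:
        new = {
            (x, TYPE, c2)
            for x, p1, c1 in inferred if p1 == TYPE
            for c1b, p2, c2 in inferred
            if p2 == SUBCLASS and c1b == c1 and c1 != c2
        }
        if new <= inferred:
            return inferred
        inferred |= new

def blue_items(triples, cls):
    """'Find me all blue <cls>' over the given triples."""
    return {s for s, p, o in triples
            if p == TYPE and o == cls and (s, "color", "blue") in triples}

closed = apply_rdfs9(facts)
print(blue_items(facts, "Shirt"))   # set(): nothing stored says prod001 is a Shirt
print(blue_items(closed, "Shirt"))  # {'prod001'}
```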
Inference – Example [2]
We know:
• prod001 is a Henley
• ID_001 is blue
• prod001 is the same as ID_001
We can infer:
• prod001 is a blue Henley
We can ask "find me all blue Henleys", and find prod001
Inference – Rules [2]
We said "prod001 is the same as ID_001"
We need a formal definition (rule) for same as:
  rule "sameAs rdfp11b"
  construct { ?u2 ?p ?v }
  { ?u1 ?p ?v . ?u1 owl:sameAs ?u2 . filter(?u1!=?u2) }
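A Python sketch of this rule shows one subtlety. As written, the rule copies facts about ?u1 onto ?u2, so "prod001 sameAs ID_001" alone moves prod001's facts onto ID_001. To conclude that prod001 is blue, the sameAs link must also hold in the other direction; the sketch adds that symmetry step explicitly, as an assumption beyond the rule shown. The "color" predicate is again a hypothetical name.

```python
SAME_AS = "owl:sameAs"

facts = {
    ("prod001", "a", "Henley"),
    ("ID_001", "color", "blue"),      # "color" is an assumed predicate name
    ("prod001", SAME_AS, "ID_001"),
}

def close_same_as(triples):
    """Apply symmetry plus the slide's sameAs rule until nothing new appears."""
    inferred = set(triples)
    while True:
        new = set()
        for u1, p, v in inferred:
            # Symmetry of owl:sameAs (assumption: not part of the rule on the slide).
            if p == SAME_AS and u1 != v:
                new.add((v, SAME_AS, u1))
        for u1, p1, u2 in inferred:
            if p1 != SAME_AS or u1 == u2:
                continue
            # The slide's rule: copy every triple about ?u1 onto ?u2.
            for s, p, v in inferred:
                if s == u1:
                    new.add((u2, p, v))
        if new <= inferred:
            return inferred
        inferred |= new

closed = close_same_as(facts)
print(("prod001", "color", "blue") in closed)  # True
```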
Share data + share rules and vocabularies
• Make use of the Linked Open Data web
• Make use of standard ontologies
  Generalized rules: owl, rdf, rdfs
  Domain-specific rules and vocabularies: foaf, FIBO, …
NLP – What is it?
Natural Language Processing (NLP) – A field of computer science and linguistics enabling computers to derive meaning from human or natural language input. NLP technology can extract meaning from text or speech.
Text Analytics – Analysis to derive high-quality information from text, usually by structuring the input text, deriving patterns within the structured data, and interpreting the output.
Entity Extraction – A subset of Text Analytics, where you run a tool over some text to identify "entities" in the text. "Entities" may be people, places, companies, organizations, phone numbers, and so on. MarkLogic Partners that do entity extraction include Temis, NetOwl, Smart Logic, and SAP. Some can return entities in the form of RDF.
Event Extraction – an emerging enhancement to Entity Extraction that extracts events ("John went to China") as well as entities ("John is a person, China is a place") from text.
NLP – Where does it fit?
Where do Triples come from?
Triples are used to express:
• Facts
• Relationships
[Figure labels: DOMAIN, WORLD AT LARGE, DOCUMENTS]
Facts from the World at Large
Linked Open Data
• Facts that are freely available
• In a form that’s easily consumed
DBpedia (Wikipedia as structured information)
• Einstein was born in Germany
• Ireland’s currency is the Euro
GeoNames
• Doha is the capital of Qatar
• Doha has these lat/long coordinates
[Figure: “Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”]
Facts from your domain
Proprietary Semantic Facts (facts and taxonomies in your organization or industry)
• Like Open Data, but domain specific
• Might be proprietary within a company
• Or shared across an industry
• Includes data and ontologies
Some examples
• A bank's proprietary reference data
• A pharmaceutical company's drug ontology
• An industry-wide ontology such as FIBO
Facts from Documents
Document metadata
• Ex: Categories, author, publish date, source
Facts in free-flowing text
• Entities: this document mentions the person Richard Nixon, the product Advil, the company IBM
• Events: this document says that Nixon went to China, John Smith met Jane Doe, Barclays acquired Lehman Brothers
Found automatically or provided at authoring time
The World of Triples
• Linked Open Data (free semantic facts available to anyone)
• Facts from Free-Flowing Text (derived from semantic enrichment)
• Proprietary Semantic Facts (facts and taxonomies in your organization)
• Facts in Documents (part of metadata or added with authoring tools)
[Diagram labels: Semantic World, Document World]
Relationships
Triples can model many kinds of relationships:
• Relationships between resources
  customer123 is the same as cus_id_456
• Relationships between values
  "John Smith" is the same as "John Smythe"
• Relationships between classes
  "Henley" is a subclass of "Shirt"
• Relationships between a predicate and its subject or object
  The object of "lives in" is a place
• Relationships between entities and documents
  "Merrill Lynch" was mentioned in reportABC (which mentioned "rogue trader")
Relationships
Relationships make it easy to:
• Integrate data from disparate sources
  customer123 (from source1) is the same as cus_id_456 (from source2)
• Reconcile data
  "John Smith" is the same as "John Smythe"
• Infer new facts about the data
  "Henley" is a subclass of "Shirt"
  The object of "lives in" is a place
• Link entities with documents
  "Merrill Lynch" was mentioned in reportABC (which mentioned "rogue trader")
Data Modelling - You can model everything in RDF!
Pattern
• Temptation: go "all-in" on RDF and SPARQL
• Pragmatism: back off to use a mix of RDF and XML/JSON
Example: Customer record
• Always delivered as a whole record
• Don’t shred it into RDF and reconstitute for every query!
• Store as XML/JSON, return as a single object
Recommendation
• Be pragmatic from the start
• Ask for requirements (what), not implementation (how)
• Use RDF and XML/JSON as appropriate
Why Semantic Technologies?
Because …
• Triples are atomic – easy to create, manage, combine
• The Linked Open Data Web shares data as triples
• A natural choice for metadata and real-world facts .. and facts embedded in a document
• Adds relationships between facts, between documents
• Standards encourage tools and sharing
• Graph model – easy to follow links
• Ontologies – share information, infer new facts
Why Semantics and Search?
Because …
• Many use cases need documents, triples, and data together
• One database means a simple, efficient, powerful architecture
• Combination queries – query documents, triples, data in a single query – open up new possibilities
SEMANTICS AND … Better Together
SEMANTICS AND .. DOCUMENTS
Triples and Documents
Documents can contain triples
  <article>
    <meta>
      <title>Man bites dog</title>
      <sem:triple>
        <sem:subject>http://example.org/news/42</sem:subject>
        <sem:predicate>http://example.org/published</sem:predicate>
        <sem:object>2013-09-10</sem:object>
      </sem:triple>
      …
Triples and Documents
Triples are persisted in documents
  <sem:triple>
    <sem:subject>http://example.org/news/Nixon</sem:subject>
    <sem:predicate>http://example.org/wentTo</sem:predicate>
    <sem:object>China</sem:object>
  </sem:triple>
  …
Triples can be annotated in documents
  <source>AP Newswire</source>
  <sem:triple date="1972-02-21" confidence="100">
    <sem:subject>http://example.org/news/Nixon</sem:subject>
    <sem:predicate>http://example.org/wentTo</sem:predicate>
    <sem:object>China</sem:object>
  </sem:triple>
  …
import module namespace sem = "http://marklogic.com/semantics" at "/MarkLogic/semantics.xqy";
sem:sparql('
  SELECT ?country
  WHERE { <http://example.org/news/Nixon> <http://example.org/wentTo> ?country }
  ',
  (), (),
  cts:and-query((
    cts:path-range-query("//sem:triple/@confidence", ">", 80),
    cts:path-range-query("//sem:triple/@date", "<", xs:date("1974-01-01")),
    cts:or-query((
      cts:element-value-query(xs:QName("source"), "AP Newswire"),
      cts:element-value-query(xs:QName("source"), "BBC")
    ))
  ))
)
Which countries did Nixon visit?
.. before 1974?
.. only show me answers where I have at least 80% confidence
.. and the source is AP Newswire OR BBC
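The logic of this combined query can be restated in Python. The sketch is not MarkLogic code: each record stands for one annotated "Nixon wentTo <object>" triple together with the document context it was found in, the sample records (apart from the China triple shown earlier) are invented, and the confidence test uses ">" to mirror the operator in the query above.

```python
from datetime import date

# Each record: one "Nixon wentTo <object>" triple plus its annotations.
# Only the China record comes from the slides; the rest are hypothetical.
records = [
    {"object": "China", "date": date(1972, 2, 21), "confidence": 100, "source": "AP Newswire"},
    {"object": "Freedonia", "date": date(1973, 5, 1), "confidence": 40, "source": "BBC"},
    {"object": "Sylvania", "date": date(1975, 3, 9), "confidence": 95, "source": "BBC"},
    {"object": "Ruritania", "date": date(1970, 7, 4), "confidence": 90, "source": "Local Gazette"},
]

def countries(records, before, min_confidence, sources):
    """Triple pattern restricted by annotation and document conditions."""
    return sorted(
        r["object"] for r in records
        if r["confidence"] > min_confidence   # confidence threshold
        and r["date"] < before                # .. before the cutoff date
        and r["source"] in sources            # .. and from an accepted source
    )

answer = countries(records, date(1974, 1, 1), 80, {"AP Newswire", "BBC"})
print(answer)  # ['China']
```

Each of the other three records fails exactly one condition, which makes it easy to see what every clause contributes.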
SEMANTICS AND .. DOCUMENTS .. AND GEOSPATIAL .. AND SCALAR (DATETIME) .. AND BITEMPORAL
Two Hemispheres, One Brain
Triples:
• Highly structured
• Atomic
• Do one thing well
XML and JSON:
• Flexible structure
• Rich documents
• Rich applications
Combination query - scenario
You work in an Incident Call Center
A call comes in:
  "some maniac in a blue van just tried to run me down"
  "I got the first three letters of his license plate: ABC"
You could look up "ABC*" in the license plate database, or …
.. look for similar incident reports
  Reports that mention a "blue van"
  … around the same time
  … around the same place
  … with a license plate that starts with "ABC"
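The four conditions combine a text search, a date range, a geospatial radius, and a value taken from a triple. A Python sketch under stated assumptions: reports are flattened into dictionaries, the first report borrows its values from the suspicious-activity example in this deck, the other reports and the search parameters are invented, and distance uses the haversine formula. A real combination query would run against database indexes rather than a loop.

```python
import math
from datetime import datetime, timedelta

# sar-1 uses values from the deck's example; the other reports are hypothetical.
reports = [
    {"id": "sar-1", "date": datetime(2012, 11, 12), "lat": 37.497075, "long": -122.363319,
     "plate": "ABC 123",
     "description": "A blue van with license plate ABC 123 was observed parked behind the airport sign"},
    {"id": "sar-2", "date": datetime(2012, 11, 12), "lat": 37.497075, "long": -122.363319,
     "plate": "ABC 999", "description": "A red truck was seen idling near the terminal"},
    {"id": "sar-3", "date": datetime(2012, 11, 11), "lat": 37.50, "long": -122.36,
     "plate": "XYZ 111", "description": "A blue van circled the car park twice"},
    {"id": "sar-4", "date": datetime(2012, 11, 12), "lat": 40.7, "long": -74.0,
     "plate": "ABC 456", "description": "A blue van ran a red light"},
]

def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance (haversine), Earth radius taken as 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def similar_reports(reports, phrase, when, window, where, radius_km, plate_prefix):
    hits = []
    for r in reports:
        if (phrase in r["description"].lower()                               # mentions "blue van"
                and abs(r["date"] - when) <= window                          # around the same time
                and distance_km(*where, r["lat"], r["long"]) <= radius_km    # around the same place
                and r["plate"].startswith(plate_prefix)):                    # plate starts with "ABC"
            hits.append(r["id"])
    return hits

hits = similar_reports(reports, "blue van", datetime(2012, 11, 13),
                       timedelta(days=7), (37.5, -122.36), 10.0, "ABC")
print(hits)  # ['sar-1']
```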
An XML or JSON document can represent many information types:
<SAR>
  <title>Suspicious vehicle near airport</title>
  <date>2012-11-12Z</date>
  <type>observation/surveillance</type>
  <threat>
    <type>suspicious activity</type>
    <category>suspicious vehicle</category>
  </threat>
  <location>
    <lat>37.497075</lat>
    <long>-122.363319</long>
  </location>
  <description>A blue van with license plate ABC 123 was observed parked behind the airport sign…</description>
  <triple>
    <subject>IRIID</subject>
    <predicate>isa</predicate>
    <object>license-plate</object>
  </triple>
  <triple>
    <subject>IRIID</subject>
    <predicate>value</predicate>
    <object>ABC 123</object>
  </triple>
</SAR>
Combination Query: Example
[Figure: the same <SAR> document, showing its title, date, type, threat (type and category), location (lat and long), description, and two triples (isa license-plate; value ABC 123)]
SEMANTICS AND .. ENTERPRISE DATABASE FEATURES
Semantics Database Architecture
[Figure: architecture diagram. Labels: XQuery, XSLT, SQL, JavaScript, SPARQL; GRAPH SPARQL; TRIPLE]
MarkLogic Semantics Use Cases
SEMANTIC SEARCH
Semantic Search
User searches and queries refined by topics and semantic relationships
• Refine search with topics and concepts
• Geo-location of research institutions, Semantic Visualization & Tag Clouds
Suggested / Related Content
Topic, semantic relationships and content used to find related and suggested content
• Related articles
• Suggestions
• Augmented topic browsing
Linked Open Data
Semantic data augmenting user search and queries
• Return concepts and facts in addition to results
• Leverage context from all sources
Dynamic Semantic Publishing
Present content, data and information to users
• Relationships power content presentation
• Taxonomy browsing
• Beyond WebCMS to Dynamic Publishing
• Efficiently tag articles
Master (Meta)data Management
Flexible model to manage metadata
• Metadata master
• Digital Supply Chain powered with semantic relationships
• Captures the complexity of information needed to deliver digital assets and products
DATA INTEGRATION
Use Case: Investment Research
Environment:
• SEC Filings
• Analyst Briefing Transcripts
• News Feeds
• Press Releases
Challenge:
• Provide a simple search solution for investment analysts to quickly identify opportunities
2003-2014 Brook Path Partners, Inc. All Rights Reserved. www.brookpath.com
Investment Research Using Semantic Technology
[Figure labels: SEC Filings, News Feeds, Analyst Briefings, Press Releases, Research Ontology]
Use Case: Reference Data
Environment:
• Hundreds of Business Units
• Hundreds of Products
• Thousands of Applications
• Multiple Data Formats
  Structured
  Unstructured
• Multiple Identifiers
Challenge:
• Aggregate all data across business units and geographies.
Reference Data Using Semantic Technology
[Figure: ownership graph. Node labels: UltimateParent, JointVenture, WhollyOwnedSubsidiary, MajorityOwnedSubsidiary, SignificantlyOwnedSubsidiary, Customer, Customer APAC Subsidiary, Customer Japanese Subsidiary. Edge labels: ultimateParentOf, whollyOwnsAndControls, majorityOwnsAndControls]
Reference Data Using Semantic Technology
Advantages of using Semantics:
• Clearly define relationships between entities
• Query entities and relationships together
• Use graph traversal to find and discover facts/relationships
• Queries can infer data using standard rules
• Run queries / serve queries from standard SPARQL endpoints
• Aggregate and report using SPARQL
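"Use graph traversal to find and discover facts/relationships" can be illustrated with the ownership predicates named in this use case. The specific edges below are assumptions made for the sketch (the slides name the entities and predicates but the exact links are not recoverable), and the helper names are invented.

```python
OWNS = {"whollyOwnsAndControls", "majorityOwnsAndControls"}

# Hypothetical ownership edges between the entities named in the use case.
triples = {
    ("Customer", "whollyOwnsAndControls", "Customer APAC Subsidiary"),
    ("Customer APAC Subsidiary", "majorityOwnsAndControls", "Customer Japanese Subsidiary"),
}

def controlled_by(entity, triples):
    """Every entity reachable from `entity` along ownership edges."""
    found, stack = set(), [entity]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p in OWNS and o not in found:
                found.add(o)
                stack.append(o)
    return found

def ultimate_parent(entity, triples):
    """Walk ownership edges upward until no further owner is found."""
    seen = {entity}
    while True:
        owners = sorted(s for s, p, o in triples if o == entity and p in OWNS)
        if not owners or owners[0] in seen:
            return entity
        entity = owners[0]
        seen.add(entity)

print(controlled_by("Customer", triples))
print(ultimate_parent("Customer Japanese Subsidiary", triples))
```

The relationship between Customer and its Japanese subsidiary is never stored directly; it is discovered by following two edges.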
Use Case: Customer Insight
Environment:
• Dozens of transactional systems, each with their own analytics
• Interaction records
• External data sources
• Connections among customers and other entities
Challenge:
• Progress from transaction flow analysis to person-centric analytics, combining data from many diverse sources
Customer Insight Using Semantic Technology
[Figure labels: Marketing; Profile Configuration Tools; Profile Data Extracted From multiple sources; Profiles include social graphs; Fraud and Financial Crime; Customer-centric view]
Use Case: Regulatory Compliance
Environment:
• Thousands of rules, millions of accounts and onboarding documents
• Impossible to pre-define dimensions, relationships
Challenge:
• Provide a scalable map of regulations to internal policies and drive automated workflow.
Regulatory Compliance Using Semantic Technology
[Figure labels: Documents; MarkLogic Workflow; Policies Ontology; Regulations]
Use Case: Data Provenance
Environment:
• Regulations requiring data lineage
• Complex data lifecycle, which makes it hard to keep track of data elements and their changes
Challenge:
• Provide a consistent way to identify the source, timeliness and accuracy of the data
Data Provenance Using Semantic Technology
[Figure: a <Trade> document. Its <Cashflows> element holds a <PartyIdentifier> with <TradeID> 123456. Its <provenance> element holds two triples: one with predicate wasDerivedFrom and object CDS_xyz, the other with predicate wasAttributedTo and object System_123. Remaining labels: TradeID, Cashflows]
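Provenance triples like these make lineage a matter of following wasDerivedFrom links. A Python sketch under explicit assumptions: the subject of both triples is taken to be TradeID (the slide does not make this certain), and the upstream link from CDS_xyz to Feed_abc is invented to give the chain a second step.

```python
# Provenance triples. Subjects are assumed; Feed_abc is a hypothetical upstream source.
provenance = {
    ("TradeID", "wasDerivedFrom", "CDS_xyz"),
    ("TradeID", "wasAttributedTo", "System_123"),
    ("CDS_xyz", "wasDerivedFrom", "Feed_abc"),
}

def lineage(entity, triples):
    """Everything `entity` was derived from, nearest source first."""
    chain, seen, frontier = [], {entity}, [entity]
    while frontier:
        current = frontier.pop(0)
        for s, p, o in sorted(triples):
            if s == current and p == "wasDerivedFrom" and o not in seen:
                seen.add(o)
                chain.append(o)
                frontier.append(o)
    return chain

def attributed_to(entity, triples):
    """Systems or agents the entity is attributed to."""
    return sorted(o for s, p, o in triples if s == entity and p == "wasAttributedTo")

print(lineage("TradeID", provenance))
print(attributed_to("TradeID", provenance))
```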
Thank you! www.marklogic.com