How Semantics Solves Big Data Challenges

© COPYRIGHT 2015 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. How Semantics Solves Big Data Challenges Matt Allen MarkLogic

Upload: dataversity

Post on 14-Aug-2015

TRANSCRIPT

Page 1: How Semantics Solves Big Data Challenges

How Semantics Solves Big Data Challenges

Matt Allen, MarkLogic

Page 2: How Semantics Solves Big Data Challenges

Why do we need semantics?

Without context, organizing information is really hard

Page 3: How Semantics Solves Big Data Challenges

Disconnected Data, Unable to Handle Complexity

#1 impediment to big data success is having too many silos

Page 4: How Semantics Solves Big Data Challenges

Example: Categorizing media assets

Disconnected Data, Unable to Handle Complexity

Image ABC: File Name, Format, Create Date, Rights, Caption

Dog Image: Story, Title, Run Date, Credit, Position

Image 123: Costs, Rights, Usage, Revenue, Photographer

Roles: Photographer, Accountant, Editor

Page 5: How Semantics Solves Big Data Challenges

Disconnected Data, Unable to Handle Complexity

Example: Searching people, places, and things with context

[Figure: paired images labeled "vs", including "sub" and "hoagie"]

Page 6: How Semantics Solves Big Data Challenges

Disconnected Data, Unable to Handle Complexity

Example: Product research and development pipeline

Discovery: Initial Identification

Phase 1: Proof of Concept

Phase 2: Early Product Development

Phase 3: Advanced Product Development

Phase 4: Pre-Launch

Can I know more about this particular area of research?

Can I find out more about whether this new product is viable?

Which locations with product X showed characteristic Y during May-June of 2007 and 2008?

What global testing of product X was undertaken in 2012?

Does this product already exist in the pipeline?

The problem: different words describe the same things, product names change over time, domain knowledge is not captured and made searchable, and there are too many data silos to search in a limited time

Page 7: How Semantics Solves Big Data Challenges

Disconnected Data, Unable to Handle Complexity

Example: Managing overlapping domains of knowledge in healthcare

Is “Psychoses” a “mental disorder” or “psychotic illness”?

Page 8: How Semantics Solves Big Data Challenges

We’ve created elaborate systems to categorize information

Page 9: How Semantics Solves Big Data Challenges

But it ends up looking more like this

Page 10: How Semantics Solves Big Data Challenges

Problems With the Relational Approach

Inflexible Data Model
– Everything modeled up front
– Schema complexity
– Difficult to make changes later
– Fixed to a specific business purpose
– Lots of expensive ETL
– Inability to store unstructured data
– Mismatch for modern app development

Inability to Model Relationships
– No standard for modeling people, places, things
– Lack of context within taxonomies/ontologies

Inability to Query Heterogeneous Data
– Inability to handle complex queries across varied data

Limited Scalability
– Scale up, not out

Page 11: How Semantics Solves Big Data Challenges

360 View: Healthcare

How do we achieve this?

Page 12: How Semantics Solves Big Data Challenges

Enter Semantics…

John livesIn London; London isIn England

Triples: Subject : Predicate : Object

Semantics is a simple and elegant way to model data as facts and relationships. Semantics uses a data format called RDF that you query with SPARQL.
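As a rough illustration of the subject : predicate : object idea, the sketch below stores the two facts from this slide as plain Python tuples and looks them up with a wildcard pattern. It is not RDF or SPARQL, and the `match` helper is a hypothetical name introduced only for this example.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# Facts from the slide, written as (subject, predicate, object).
TRIPLES: List[Triple] = [
    ("John", "livesIn", "London"),
    ("London", "isIn", "England"),
]


def match(
    triples: List[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> List[Triple]:
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t
        for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


if __name__ == "__main__":
    # "Where does John live?" expressed as a pattern with an open object.
    print(match(TRIPLES, s="John", p="livesIn"))
```

A SPARQL query plays a similar role against a real triple store: parts of the pattern are fixed, and the store returns whatever fills the open positions.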

Page 13: How Semantics Solves Big Data Challenges

Triples Come in Different Formats

John livesIn London

XML (2 IRIs, 1 string):

<sem:triple>
  <sem:subject>http://xmlns.com/foaf/0.1/name/John</sem:subject>
  <sem:predicate>http://example.org/livesIn</sem:predicate>
  <sem:object datatype="http://www.w3.org/2001/XMLSchema#string">London</sem:object>
</sem:triple>

JSON (2 IRIs, 1 string):

{
  "triple": {
    "subject": "http://xmlns.com/foaf/0.1/name/John",
    "predicate": "http://example.org/livesIn",
    "object": { "value": "London", "datatype": "xs:string" }
  }
}

Turtle (3 IRIs):

<http://dbpedia.org/resource/John> <http://dbpedia.org/ontology/LivesIn> <http://dbpedia.org/resource/London> .
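To make the difference between the serializations concrete, here is a minimal sketch that writes one triple either as a single Turtle/N-Triples-style line or as a JSON object shaped like the one on this slide. The function names are hypothetical and are not part of any MarkLogic API. Quoting the literal with `json.dumps` is an assumption that holds for simple strings like "London".

```python
import json


def to_ntriples(subject: str, predicate: str, obj: str, obj_is_iri: bool = True) -> str:
    """Serialize one triple as a single line: IRIs in <>, literals in quotes."""
    # json.dumps adds the surrounding quotes and escapes embedded quotes,
    # which is adequate for simple string literals (an assumption of this sketch).
    rendered_obj = f"<{obj}>" if obj_is_iri else json.dumps(obj)
    return f"<{subject}> <{predicate}> {rendered_obj} ."


def to_json_triple(subject: str, predicate: str, value: str, datatype: str = "xs:string") -> dict:
    """Build a dict shaped like the JSON example on the slide."""
    return {
        "triple": {
            "subject": subject,
            "predicate": predicate,
            "object": {"value": value, "datatype": datatype},
        }
    }


if __name__ == "__main__":
    # Three IRIs, as in the Turtle example.
    print(to_ntriples(
        "http://dbpedia.org/resource/John",
        "http://dbpedia.org/ontology/LivesIn",
        "http://dbpedia.org/resource/London",
    ))
    # Two IRIs and one string, as in the JSON example.
    print(json.dumps(
        to_json_triple("http://xmlns.com/foaf/0.1/name/John", "http://example.org/livesIn", "London"),
        indent=2,
    ))
```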

Page 14: How Semantics Solves Big Data Challenges

Relationships and Context Are Obvious with Triples

Customer123 tweeted TweetXYZ; TweetXYZ has sentiment Positive (= High Value)

This customer is saying good things about us. They’ve just walked into our store. Should we reward them?
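The customer scenario above amounts to following two relationships in a row: from a sentiment to a tweet, then from the tweet to whoever posted it. A small sketch under stated assumptions: the predicate names `tweeted` and `sentiment` come from the slide, while the second customer and the helper function are invented here for contrast.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

TRIPLES: List[Triple] = [
    ("Customer123", "tweeted", "TweetXYZ"),
    ("TweetXYZ", "sentiment", "Positive"),
    # Hypothetical extra data, not on the slide, so the query has something to exclude.
    ("Customer456", "tweeted", "TweetABC"),
    ("TweetABC", "sentiment", "Negative"),
]


def customers_with_sentiment(triples: List[Triple], sentiment: str) -> List[str]:
    """Two-hop lookup: sentiment -> tweets -> customers who posted them."""
    tweets = {s for s, p, o in triples if p == "sentiment" and o == sentiment}
    return sorted({s for s, p, o in triples if p == "tweeted" and o in tweets})


if __name__ == "__main__":
    print(customers_with_sentiment(TRIPLES, "Positive"))
```

Because each fact is a separate triple, adding a new kind of relationship (for example, a store visit) would not require changing the existing data, only adding more triples.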

Page 15: How Semantics Solves Big Data Challenges

Documents + Triples Provide a Better Model

Document
– Title
– HD Master
– Dates: Production Date, Editing Date, Release Date, International Date

Semantic Triples
– Entity types: Asset, <work>, <collection>, <category>, <character>, <place>, <performer>
– Relationships: is, is part of, appears in, is a, played, lives in
– Example values: Title, Character, Film Series, Animated, Actress, City

Page 16: How Semantics Solves Big Data Challenges

Document

Hospital Name: Johns Hopkins

Operation Type: Cataract removal

Operation ID: 13

Surgeon Name: Robert Allen

Drug Name    Drug Manufacturer    Dose Size    Dose UOM
Minicillan   Drugs R Us           200          mg
Maxicillan   Canada4Less          400          mg
Minicillan   Drugs USA            150          mg

Document + Graph > Relational

Graph
– Nodes: Operation, Person, Hospital
– Relationships: excels at, operated on (two edges), works at, Surgeon performed, patient at

Relational
– Operation: Operation ID, Hospital, Surgeon, Procedure
– Hospital: Hospital ID, Hospital Name
– Surgeon: Surgeon ID, Surgeon Name
– Procedure: Procedure ID, CPT Code

More Capable Than Relational
– 300% growth in popularity of graph databases
– Document databases are the most popular type of NoSQL database
– 20% of enterprise data
– 95% of database spend
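One way to picture the document-plus-graph combination on this slide is to keep the operation record as a nested document and hold the relationships around it as triples. The sketch below uses the values from the slide; the field names, the `operation:13` identifier, the exact edges, and both helper functions are assumptions made for illustration, not MarkLogic's actual storage model.

```python
from typing import Dict, List, Tuple

# The operation record as one nested document (field names are assumed).
OPERATION_DOC = {
    "hospital_name": "Johns Hopkins",
    "operation_type": "Cataract removal",
    "operation_id": 13,
    "surgeon_name": "Robert Allen",
    "drugs": [
        {"name": "Minicillan", "manufacturer": "Drugs R Us", "dose_size": 200, "dose_uom": "mg"},
        {"name": "Maxicillan", "manufacturer": "Canada4Less", "dose_size": 400, "dose_uom": "mg"},
        {"name": "Minicillan", "manufacturer": "Drugs USA", "dose_size": 150, "dose_uom": "mg"},
    ],
}

DOCS: Dict[str, dict] = {"operation:13": OPERATION_DOC}

# Relationships kept as triples; these particular edges are assumed.
TRIPLES: List[Tuple[str, str, str]] = [
    ("Robert Allen", "works at", "Johns Hopkins"),
    ("Robert Allen", "performed", "operation:13"),
]


def operations_by_staff_of(hospital: str) -> List[dict]:
    """Follow 'works at' then 'performed' edges and return the matching documents."""
    surgeons = {s for s, p, o in TRIPLES if p == "works at" and o == hospital}
    doc_ids = [o for s, p, o in TRIPLES if p == "performed" and s in surgeons]
    return [DOCS[d] for d in doc_ids if d in DOCS]


def total_dose_mg(doc: dict) -> int:
    """Sum the nested dose sizes recorded in milligrams."""
    return sum(d["dose_size"] for d in doc["drugs"] if d["dose_uom"] == "mg")


if __name__ == "__main__":
    for doc in operations_by_staff_of("Johns Hopkins"):
        print(doc["operation_type"], total_dose_mg(doc))
```

The relational layout on the slide would spread the same information across four tables; here the record stays whole and the graph edges carry the connections between records.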

Page 17: How Semantics Solves Big Data Challenges

Flexible Data Model
– Data: Documents (JSON, XML), Triples (RDF)

Search & Query
– BUILT-IN SEARCH & QUERY, POWERFUL INDEXING CAPABILITY

Enterprise Features
– HA/DR, SECURITY, ACID TRANSACTIONS, SCALABILITY & ELASTICITY

Page 18: How Semantics Solves Big Data Challenges

Seeking Clarity in the World of Data

NoSQL: KEY-VALUE, COLUMN, DOCUMENT, GRAPH

A.I., COGNITIVE COMPUTING, PROPERTY GRAPHS, TRIPLE STORES, PREDICTIVE ANALYTICS, NATURAL LANGUAGE PROCESSING, DATA MINING, MACHINE LEARNING, ENTITY EXTRACTION, KNOWLEDGE GRAPHS

Page 19: How Semantics Solves Big Data Challenges

From the Classroom to the Boardroom

Page 20: How Semantics Solves Big Data Challenges

Benefits of MarkLogic Semantics

Model facts about people, places, and things

Model complex relationships

Share your data using a common standard

Discover “hidden” facts in your data

Visualize your data as a graph

Use triples as metadata

Work with open linked data

Reconcile and integrate disparate data

Provide context for a specific domain of knowledge

Automate publishing of facts

Work with other semantic technologies

– Extract meaning from unstructured data

– Classify large amounts of data

Remember: Facts, Relationships, Metadata
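"Discover hidden facts" usually refers to inference: deriving triples that are implied by the stored ones. A minimal sketch, assuming a single hand-written rule (if X livesIn Y and Y isIn Z, then X livesIn Z) and one extra fact about the United Kingdom that is not on the slides. Real triple stores express such rules through rule sets or ontologies, not a Python loop.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

BASE: Set[Triple] = {
    ("John", "livesIn", "London"),
    ("London", "isIn", "England"),
    ("England", "isIn", "United Kingdom"),  # hypothetical extra fact
}


def infer_lives_in(triples: Set[Triple]) -> Set[Triple]:
    """Apply the rule repeatedly until no new facts appear; return only the new ones."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        for person, p1, place in list(facts):
            if p1 != "livesIn":
                continue
            for inner, p2, outer in list(facts):
                if p2 == "isIn" and inner == place:
                    new_fact = (person, "livesIn", outer)
                    if new_fact not in facts:
                        facts.add(new_fact)
                        changed = True
    return facts - set(triples)


if __name__ == "__main__":
    for fact in sorted(infer_lives_in(BASE)):
        print(fact)
```

Neither derived fact was stored explicitly. Both follow from the relationships already in the data.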

Page 21: How Semantics Solves Big Data Challenges

Leading Organizations Using Semantics

Intelligent Search

Complex Data Integration

Dynamic Semantic Publishing

Object-based Intelligence

Compliance

Entertainment Company

Agriculture Company

Page 22: How Semantics Solves Big Data Challenges
Page 23: How Semantics Solves Big Data Challenges

The World of Dooneese Maharelle

Entities
– Talent: Kristen Wiig
– Character: Maharelle Sister
– Characteristic: Tiny hands
– Segment: The Lawrence Welk Show
– Episode 4: Anne Hathaway and Killers
– Season 34
– Date: 10/4/08
– Era

Relationships: Acted in, Played, Has, Part of, Includes, Aired on

Page 24: How Semantics Solves Big Data Challenges

What if you only know a characteristic?
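Starting from nothing but a characteristic, a graph lets you walk outward to the character, the performer, and the segment. The sketch below assumes edge directions that the flattened slide does not confirm (for example, that the character "has" the characteristic and that talent "played" the character). The function name is hypothetical.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Edges loosely based on the previous slide; the directions are assumptions.
TRIPLES: List[Triple] = [
    ("Kristen Wiig", "played", "Maharelle Sister"),
    ("Maharelle Sister", "has", "Tiny hands"),
    ("Maharelle Sister", "part of", "The Lawrence Welk Show"),
    ("The Lawrence Welk Show", "part of", "Episode 4"),
    ("Episode 4", "aired on", "10/4/08"),
]


def find_by_characteristic(triples: List[Triple], characteristic: str) -> List[Dict[str, object]]:
    """From a characteristic, find each character plus who played it and where it appears."""
    results = []
    characters = [s for s, p, o in triples if p == "has" and o == characteristic]
    for character in characters:
        talent = [s for s, p, o in triples if p == "played" and o == character]
        segments = [o for s, p, o in triples if s == character and p == "part of"]
        results.append({"character": character, "talent": talent, "segments": segments})
    return results


if __name__ == "__main__":
    print(find_by_characteristic(TRIPLES, "Tiny hands"))
```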

Page 25: How Semantics Solves Big Data Challenges

The World of Barack Obama

Real vs. Impersonation

– Barack Obama cameo vs. Barack Obama impersonation

Different Impersonations

– Fred Armisen as Barack Obama

– Jay Pharoah as Barack Obama

Characters

– The Rock Obama

Page 26: How Semantics Solves Big Data Challenges

When Data Takes Center Stage…

Page 27: How Semantics Solves Big Data Challenges

More information: http://info.marklogic.com/semantics-summer

Page 28: How Semantics Solves Big Data Challenges

Thank you!

Matt Allen <[email protected]>