ldm slides: data modeling for xml and json

Data Modeling for XML & JSON. Donna Burbank, Global Data Strategy Ltd. Lessons in Data Modeling DATAVERSITY Series, Dec 6th, 2016

Upload: dataversity

Post on 06-Jan-2017


Page 1: LDM Slides: Data Modeling for XML and JSON

Data Modeling for XML & JSON

Donna Burbank

Global Data Strategy Ltd.

Lessons in Data Modeling DATAVERSITY Series

Dec 6th, 2016

Page 2: LDM Slides: Data Modeling for XML and JSON

Global Data Strategy, Ltd. 2016

Donna is a recognized industry expert in information management with over 20 years of experience in data strategy, information management, data modeling, metadata management, and enterprise architecture.

She is currently the Managing Director at Global Data Strategy, Ltd., an international information management consulting company that specialises in the alignment of business drivers with data-centric technology. Previously, she has held a number of positions related to data modeling & metadata:

• Metadata consultant (US, Europe, Asia, Africa)

• Product Manager PLATINUM Metadata Repository

• Director of Product Management, ER/Studio

• VP of Product Marketing, Erwin

• Data modeling & data strategy implementation & consulting

• Author of 2 books on data modeling & contributor to 1 book on metadata management, plus numerous articles

• OMG committee member of the Information Management Metamodel (IMM)

As an active contributor to the data management community, she is a long-time DAMA International member and is the President of the DAMA Rocky Mountain chapter. She has worked with dozens of Fortune 500 companies worldwide in the Americas, Europe, Asia, and Africa and speaks regularly at industry conferences. She has co-authored two books, Data Modeling for the Business and Data Modeling Made Simple with ERwin Data Modeler, and is a regular contributor to industry publications such as DATAVERSITY, EM360, & TDAN. She can be reached at [email protected] and is based in Boulder, Colorado, USA.

Donna Burbank

Follow on Twitter @donnaburbank

Today’s hashtag: #LessonsDM

Page 3: LDM Slides: Data Modeling for XML and JSON

Lessons in Data Modeling Series

• July 28th Why a Data Model is an Important Part of your Data Strategy

• August 25th Data Modeling for Big Data

• September 22nd UML for Data Modeling – When Does it Make Sense?

• October 27th Data Modeling & Metadata Management

• December 6th Data Modeling for XML and JSON

This Year’s Line Up

Page 4: LDM Slides: Data Modeling for XML and JSON

Agenda

• Overview of XML and JSON

• Data Modeling & Metadata for XML & JSON

• Integrating XML & JSON with Databases (Relational & NoSQL)

• RDF & the Semantic Web

• Summary & Questions

What we’ll cover today

Page 5: LDM Slides: Data Modeling for XML and JSON

Assumption

• An assumption for today is that the majority of attendees are familiar with relational databases & Entity-Relationship (E/R) modeling.

• E.g. Data Modelers, Data Architects, SQL Developers, BI Developers, etc.

• The examples are given with that bias, i.e. a comparison with the relational database world.

From Data Modeling for the Business by Hoberman, Burbank, Bradley, Technics Publications, 2009

Page 6: LDM Slides: Data Modeling for XML and JSON

What is XML?

• XML (Extensible Markup Language) is used to store and transport data.

• Some design principles of XML:

• Simplicity: ease of usage, interoperability & understanding

• Modular design: do one thing well

• Extensible: Ability to easily modify the structure & content

• Self-descriptive: ease of understanding

• Machine readable

• Human readable

• Embedded descriptive tags

• XML is designed for data availability, sharing & transport.

• It requires complementary technology to do anything else, i.e. someone must write a piece of software to send, receive, store, or display it, for example:

• HTML: Format & presentation of the data

• Web Service: Transport of the data (e.g. SOAP)

• Database: Store & integrate with other data sources

Page 7: LDM Slides: Data Modeling for XML and JSON

XML and JSON Assist with Data Exchange

• XML and JSON can be used to assist with data exchange (B2B, B2C, etc.)

• Companies

• Government Agencies

• Research Organizations

• Etc.

Purchase Order

Page 8: LDM Slides: Data Modeling for XML and JSON

Emergence & the Growth of Data Exchange

In philosophy, systems theory, science, and art, emergence is the way complex systems and patterns arise out of a multiplicity of relatively simple interactions.

- Wikipedia

Page 9: LDM Slides: Data Modeling for XML and JSON

XML uses a Hierarchical Structure

• XML uses a hierarchical, nested tree structure

• An XML tree starts at a root element and branches from the root to child elements.

• All elements can have sub elements (child elements)

<?xml version="1.0"?>
<shipto>
  <name>John Smith</name>
  <address>123 Main ST</address>
  <city>Boise</city>
  <country>USA</country>
</shipto>

Root element: <shipto>

Child elements: <name>, <address>, <city>, <country>
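As a rough illustration of the root/child structure described on this slide, the sketch below walks the <shipto> document with Python's standard-library xml.etree.ElementTree. The variable names are illustrative only and are not part of the original deck.

```python
import xml.etree.ElementTree as ET

# The <shipto> document from the slide.
SHIPTO_XML = """<?xml version="1.0"?>
<shipto>
  <name>John Smith</name>
  <address>123 Main ST</address>
  <city>Boise</city>
  <country>USA</country>
</shipto>"""

# fromstring() returns the root element of the parsed tree.
root = ET.fromstring(SHIPTO_XML)

# Iterating over an element yields its direct children in document order.
children = {child.tag: child.text for child in root}

print("root element:", root.tag)
for tag, text in children.items():
    print("  child element:", tag, "=", text)
```

Deeper hierarchies work the same way: each child is itself an element that can be iterated for its own children.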

Page 10: LDM Slides: Data Modeling for XML and JSON

XML is Extensible

• XML is extensible, in that elements can be easily added as needed.

• If the <state> element is added below, older applications that read only the original elements will still work.

Original version:

<?xml version="1.0"?>
<shipto>
  <name>John Smith</name>
  <address>123 Main ST</address>
  <city>Boise</city>
  <country>USA</country>
</shipto>

With <state> added:

<?xml version="1.0"?>
<shipto>
  <name>John Smith</name>
  <address>123 Main ST</address>
  <city>Boise</city>
  <state>ID</state>
  <country>USA</country>
</shipto>
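The extensibility point can be sketched in a few lines of Python. The function read_city_and_country below is a hypothetical stand-in for an "older application" that only knows the original elements. It keeps working because it looks elements up by name, which is an assumption about how the consumer is written; a consumer that validates strictly against the old schema could still reject the new document.

```python
import xml.etree.ElementTree as ET

ORIGINAL = (
    "<shipto><name>John Smith</name><address>123 Main ST</address>"
    "<city>Boise</city><country>USA</country></shipto>"
)
EXTENDED = (
    "<shipto><name>John Smith</name><address>123 Main ST</address>"
    "<city>Boise</city><state>ID</state><country>USA</country></shipto>"
)


def read_city_and_country(xml_text):
    """Hypothetical older reader: it only asks for elements it knows by name."""
    root = ET.fromstring(xml_text)
    return root.findtext("city"), root.findtext("country")


# The added <state> element is simply ignored by the older reader.
print(read_city_and_country(ORIGINAL))
print(read_city_and_country(EXTENDED))
```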

Page 11: LDM Slides: Data Modeling for XML and JSON

XML is Self-Describing

• XML is self-describing (sort of) with the use of element tags

• Human-readable format

• Tags describe the content of the element (sort of)

<?xml version="1.0"?>
<shipto>
  <name>John Smith</name>
  <address>123 Main ST</address>
  <city>Boise</city>
  <country>USA</country>
</shipto>

From reading the tags, it’s pretty clear that we’re talking about a “Ship To” address that contains the name, address, city & country.

But it doesn’t provide full metadata, e.g.:

• What’s the data type?

• What’s the business definition?

• Is <name> a required field?

Page 12: LDM Slides: Data Modeling for XML and JSON

XML Metadata – the XML Schema

• Similar to DDL, an XML Schema (XSD) defines the structure & format of data

XSD (Metadata)

<?xml version="1.0" encoding="UTF-8" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="shiporder">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="orderperson" type="xs:string"/>
        <xs:element name="shipto">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="name" type="xs:string"/>
              <xs:element name="address" type="xs:string"/>
              <xs:element name="city" type="xs:string"/>
              <xs:element name="country" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:attribute name="orderid" type="xs:string" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>

Order Shipment (Data)

Ship to:
John Smith
123 Main ST
Boise
USA

XML (Data)

<?xml version="1.0"?>
<shipto>
  <name>John Smith</name>
  <address>123 Main ST</address>
  <city>Boise</city>
  <country>USA</country>
</shipto>
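Because an XSD is itself an XML document, its metadata can be read with the same tools as the data. The sketch below lists the declared elements and required attributes from the schema on this slide. It only reads the schema; it does not validate instance documents, since Python's standard library has no XSD validator, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Namespace of XML Schema in the {uri}tag form that ElementTree uses.
XS = "{http://www.w3.org/2001/XMLSchema}"

# The schema from the slide (XML declaration omitted for brevity).
XSD_TEXT = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="shiporder">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="orderperson" type="xs:string"/>
        <xs:element name="shipto">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="name" type="xs:string"/>
              <xs:element name="address" type="xs:string"/>
              <xs:element name="city" type="xs:string"/>
              <xs:element name="country" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:attribute name="orderid" type="xs:string" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = ET.fromstring(XSD_TEXT)

# Element name -> declared type (None when the type is an inline complexType).
declared = {el.get("name"): el.get("type") for el in schema.iter(XS + "element")}

# Attributes the schema marks as required.
required_attrs = [
    attr.get("name")
    for attr in schema.iter(XS + "attribute")
    if attr.get("use") == "required"
]

print(declared)
print(required_attrs)
```

This shows the kind of physical metadata the schema carries (names, types, required flags) and, by omission, what it does not: there is no business definition anywhere in it.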

Page 13: LDM Slides: Data Modeling for XML and JSON

Graphical Models of XML Schemas

• XML Schemas can be shown graphically as well as via text.

* Source: Altova

Page 14: LDM Slides: Data Modeling for XML and JSON

XML Metadata – the XML Schema

• Although the XML Schema does provide some physical structural metadata, full metadata descriptions are incomplete, e.g.

• Is the name field required?

• What’s the business definition for each field?

• Are there code values and/or reference data that can be used?

• Can a complex data type be used?

• Etc.

Page 15: LDM Slides: Data Modeling for XML and JSON

Levels of Data Modeling

Conceptual (Business Concepts)

• Purpose: Communication & Definition of Business Terms & Rules

• Audience: Business Stakeholders

Logical (Data Entities)

• Purpose: Clarification & Detail of Business Rules & Data Structures

• Audience: Data Architecture, Business Analysts

Physical (Physical Tables)

• Purpose: Technical Implementation with a Physical Database or Structure

• Audience: DBAs, Developers

XML Schema defines some physical metadata, but limited or no business metadata.

Page 16: LDM Slides: Data Modeling for XML and JSON

Metadata & Context

From Data Modeling for the Business by Hoberman, Burbank, Bradley, Technics Publications, 2009

Is this Customer a:

• Premier Customer

• Lapsed Customer

• High Risk Customer?

Can a Customer have more than one Account?

Is the Ship To Address related to the Customer or the Account?

What are the valid state codes for the Ship To Address?

Page 17: LDM Slides: Data Modeling for XML and JSON

XML Assists with Data Exchange

• XML and JSON can be used to assist with data exchange (B2B, B2C, etc.)

• Remember modularity, simplicity, etc.

Purchase Order

Dude, all that other stuff isn’t my job. I’m just sending the PO!

Page 18: LDM Slides: Data Modeling for XML and JSON

Integrating XML with Relational Databases

• XML is often used in conjunction with relational databases for permanent storage and integration with other operational, reporting, and reference data.

Figure labels: Purchase Order | Oracle | SQL Server

Page 19: LDM Slides: Data Modeling for XML and JSON

Integrating XML with Relational Databases

• XML can be translated into relational databases, and vice-versa

Figure panels: XML Schema | DDL

* Source: Altova

Page 20: LDM Slides: Data Modeling for XML and JSON

Integrating XML with Relational Databases

• XML can be translated into relational databases, and vice-versa

Figure panels: XML Model Diagram | Relational Model Diagram

* Source: Altova
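The round trip between XML and a relational table can be sketched with Python's standard library alone. This is a minimal illustration under stated assumptions, not the Altova tooling shown in the figures: the table name ship_to, its column types, and the NOT NULL constraint are all hypothetical choices, and an in-memory SQLite database stands in for Oracle or SQL Server.

```python
import sqlite3
import xml.etree.ElementTree as ET

COLUMNS = ("name", "address", "city", "country")

SHIPTO_XML = (
    "<shipto><name>John Smith</name><address>123 Main ST</address>"
    "<city>Boise</city><country>USA</country></shipto>"
)

# Hypothetical relational target: one column per child element.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ship_to (name TEXT NOT NULL, address TEXT, city TEXT, country TEXT)"
)

# XML -> relational: each child element becomes a column value.
root = ET.fromstring(SHIPTO_XML)
row = tuple(root.findtext(col) for col in COLUMNS)
conn.execute(
    "INSERT INTO ship_to (name, address, city, country) VALUES (?, ?, ?, ?)", row
)

# Relational -> XML: each column value becomes a child element again.
stored = conn.execute("SELECT name, address, city, country FROM ship_to").fetchone()
rebuilt = ET.Element("shipto")
for tag, value in zip(COLUMNS, stored):
    ET.SubElement(rebuilt, tag).text = value
xml_out = ET.tostring(rebuilt, encoding="unicode")

print(row)
print(xml_out)
```

A flat element maps cleanly to one table. Nested or repeating elements would typically need child tables and foreign keys, which is where a modeling tool earns its keep.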

Page 21: LDM Slides: Data Modeling for XML and JSON

What is JSON?

• JSON (JavaScript Object Notation) is a minimal, readable format for structuring data. It is used primarily to transmit data between a server and web application, as an alternative to XML.

• It is similar to XML in that it is:

• "self describing" & human readable

• hierarchical

• simple & interoperable

• It differs from XML in that it:

• can be parsed with standard JavaScript

• uses arrays

• can be simpler & shorter to read & write.

JSON

{"employees":[
  {"firstName":"Shannon", "lastName":"Kempe"},
  {"firstName":"Anita", "lastName":"Kress"},
  {"firstName":"Tony", "lastName":"Shaw"}
]}

XML

<employees>
  <employee>
    <firstName>Shannon</firstName><lastName>Kempe</lastName>
  </employee>
  <employee>
    <firstName>Anita</firstName><lastName>Kress</lastName>
  </employee>
  <employee>
    <firstName>Tony</firstName><lastName>Shaw</lastName>
  </employee>
</employees>
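To make the comparison concrete, the sketch below parses both employee documents with Python's standard library and shows that they carry the same records. The JSON array maps directly onto a list, while the XML needs an explicit step to turn repeated <employee> elements into one. Variable names are illustrative.

```python
import json
import xml.etree.ElementTree as ET

EMPLOYEES_JSON = """{"employees":[
  {"firstName":"Shannon", "lastName":"Kempe"},
  {"firstName":"Anita", "lastName":"Kress"},
  {"firstName":"Tony", "lastName":"Shaw"}
]}"""

EMPLOYEES_XML = """<employees>
  <employee><firstName>Shannon</firstName><lastName>Kempe</lastName></employee>
  <employee><firstName>Anita</firstName><lastName>Kress</lastName></employee>
  <employee><firstName>Tony</firstName><lastName>Shaw</lastName></employee>
</employees>"""

# JSON: the array is already a list of objects after parsing.
from_json = json.loads(EMPLOYEES_JSON)["employees"]

# XML: build the same list by walking each <employee> and its child elements.
from_xml = [
    {field.tag: field.text for field in employee}
    for employee in ET.fromstring(EMPLOYEES_XML)
]

print(from_json == from_xml)
```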

Page 22: LDM Slides: Data Modeling for XML and JSON

JSON Metadata – The JSON Schema

• The JSON schema offers a richer set of metadata.

Example Product in the API (Data)

{
  "id": 127849,
  "brand": "Super Cooler",
  "price": 12.50,
  "tags": ["camping", "sports"]
}

Context Needed (i.e. Metadata)

• Can the ID contain letters?

• What is a brand?

• Is a price required?

• Etc.

For example, assume we have a JSON-based product catalog. This catalog has a product which has an id, a brand, a price, and an optional set of tags.

JSON Schema (Metadata)

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "title": "Product",
  "description": "A retail product from Acme's online catalog",
  "type": "object",
  "properties": {
    "id": {
      "description": "The unique identifier for a product",
      "type": "integer"
    },
    "brand": {
      "description": "The brand name of the product as shown in the online catalogue",
      "type": "string"
    },
    "price": {
      "type": "number"
    },
    "tags": {
      "type": "array",
      "items": {
        "type": "string"
      },
      "minItems": 1
    }
  },
  "required": ["id", "brand", "price"]
}
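The questions on this slide ("Can the ID contain letters?", "Is a price required?") are exactly what the schema answers. The sketch below is a deliberately tiny hand-written checker, not a JSON Schema implementation: it understands only the keywords used in the Product schema (type, properties, required, items, minItems), and the function name check is hypothetical. The description fields are left out of the copy of the schema for brevity. In practice a full validator library would be used instead.

```python
# The Product schema from the slide, without the description fields.
PRODUCT_SCHEMA = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "title": "Product",
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "brand": {"type": "string"},
        "price": {"type": "number"},
        "tags": {"type": "array", "items": {"type": "string"}, "minItems": 1},
    },
    "required": ["id", "brand", "price"],
}

# Python types accepted for each JSON Schema type used above.
JSON_TYPES = {
    "integer": int,
    "number": (int, float),
    "string": str,
    "array": list,
    "object": dict,
}


def check(instance, schema, path="product"):
    """Return a list of problems; supports only the keywords used above."""
    expected = JSON_TYPES[schema["type"]]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(instance, bool) or not isinstance(instance, expected):
        return [f"{path}: expected {schema['type']}"]
    problems = []
    if schema["type"] == "object":
        for key in schema.get("required", []):
            if key not in instance:
                problems.append(f"{path}: missing required '{key}'")
        for key, sub_schema in schema.get("properties", {}).items():
            if key in instance:
                problems += check(instance[key], sub_schema, f"{path}.{key}")
    if schema["type"] == "array":
        if len(instance) < schema.get("minItems", 0):
            problems.append(f"{path}: fewer than {schema['minItems']} items")
        if "items" in schema:
            for i, item in enumerate(instance):
                problems += check(item, schema["items"], f"{path}[{i}]")
    return problems


good = {"id": 127849, "brand": "Super Cooler", "price": 12.50, "tags": ["camping", "sports"]}
bad = {"id": "A12", "brand": "Super Cooler", "tags": []}

print(check(good, PRODUCT_SCHEMA))
print(check(bad, PRODUCT_SCHEMA))
```

The second product fails on three counts: the id contains letters, the price is missing, and the tags array is present but empty.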

Page 23: LDM Slides: Data Modeling for XML and JSON

Integrating JSON with Document Databases

• JSON is often used with document databases, such as MongoDB, which stores records as JSON-like documents

• Document databases are a popular way to store unstructured or semi-structured information flexibly (e.g. multimedia, social media posts, etc.)

• Each Collection can contain numerous Documents which could all contain different fields.

{type: "Artifact", medium: "Ceramic", country: "China"}

{type: "Book", title: "Ancient China", country: "China"}
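The idea of a collection whose documents carry different fields can be sketched with plain Python dictionaries. This is not the MongoDB API: the list named collection and the helper find are hypothetical stand-ins that only illustrate schema-flexible storage and querying by field.

```python
# A "collection" holding the two documents from the slide; note that
# only one has "medium" and only the other has "title".
collection = [
    {"type": "Artifact", "medium": "Ceramic", "country": "China"},
    {"type": "Book", "title": "Ancient China", "country": "China"},
]


def find(docs, **criteria):
    """Return documents matching every criterion; a missing field never matches."""
    return [
        doc
        for doc in docs
        if all(key in doc and doc[key] == value for key, value in criteria.items())
    ]


# Every field name that appears in at least one document.
all_fields = sorted(set().union(*collection))

print(find(collection, country="China"))
print(find(collection, medium="Ceramic"))
print(all_fields)
```

The flexibility has a cost for the data modeler: nothing in the store itself says which fields a given type of document should have, so that knowledge has to live in a model or schema elsewhere.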

Page 24: LDM Slides: Data Modeling for XML and JSON

The Semantic Web & RDF

• The RDF (Resource Description Framework) model from the World Wide Web Consortium (W3C) provides a way to link resources on the web (people, places, things). It provides a common framework for applications to share information without losing meaning.

• Search Engines

• Exchanging data between datasets

• Sharing information with applications / APIs

• Building social networks

• Etc.

• The goal is to move from a web of documents to a web of data.

• The Framework is a simple way to express relationships between resources.

• IRIs (Internationalized Resource Identifiers), a generalization of URIs, identify resources

• Simple triples relate objects together in the format: <subject> <predicate> <object>

• These relationships create a connected Graph

• There are several serialization formats, with RDF/XML being a common one. For example:

• Turtle is a human-friendly format

• RDF/XML

• JSON-LD

• Schemas define the vocabularies used to describe the objects

• Dublin Core and Schema.org are two common ones

Example triple:

Subject: ACME Publishing

Predicate: Is Publisher Of

Object: RDF is Easy
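The triple model is small enough to sketch without an RDF library. In the example below the graph is just a set of (subject, predicate, object) tuples. The example.org IRIs are hypothetical identifiers invented for this illustration; only the Dublin Core title IRI is a real vocabulary term.

```python
from collections import namedtuple

Triple = namedtuple("Triple", ["subject", "predicate", "object"])

# Hypothetical IRIs for the resources in the slide's example.
ACME = "http://example.org/org/acme-publishing"
BOOK = "http://example.org/book/rdf-is-easy"
IS_PUBLISHER_OF = "http://example.org/vocab/isPublisherOf"
# Dublin Core "title" element.
TITLE = "http://purl.org/dc/elements/1.1/title"

# Two triples that share a node (BOOK) form a small connected graph.
graph = {
    Triple(ACME, IS_PUBLISHER_OF, BOOK),
    Triple(BOOK, TITLE, "RDF is Easy"),
}


def objects(triples, subject, predicate):
    """All objects linked from a subject by a given predicate."""
    return {t.object for t in triples if t.subject == subject and t.predicate == predicate}


# Follow the links: what are the titles of everything ACME publishes?
published = objects(graph, ACME, IS_PUBLISHER_OF)
titles = {title for book in published for title in objects(graph, book, TITLE)}

print(titles)
```

Because every statement has the same three-part shape, new facts from other datasets can be added to the set without changing any structure, which is what makes linking across sources practical.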

Page 25: LDM Slides: Data Modeling for XML and JSON

Creating a Web of Data

@type: Place

Sheraton San Diego Hotel & Marina
1380 Harbor Island Drive
San Diego, California 92101 USA

"@context": "http://schema.org",
"location": {
  "@type": "Place",
  "name": "Sheraton San Diego Hotel & Marina",
  "address": {
    "@type": "PostalAddress",
    "streetAddress": "1380 Harbor Island Drive",
    "addressLocality": "San Diego",
    "addressRegion": "CA",
    "postalCode": "92101"
  },
  "telephone": "+1-877-734-2726",
  "image": "http://edw2016.dataversity.net/uploads/ConfSiteAssets/72/image/sheraton.jpg",
  "url": "http://edw2016.dataversity.net/travel.cfm"
},

"@context": "http://schema.org",
"location": {
  "@type": "Place",
  "name": "Sheraton San Diego Hotel & Marina",
  "address": {
    "@type": "PostalAddress",
    "streetAddress": "1380 Harbor Island Drive",
    "addressLocality": "San Diego",
    "addressRegion": "CA",
    "postalCode": "92101"
  },
  "telephone": "+1-877-734-2726",
  "image": "http://mysite.com/edw16photo.jpg",
  "url": "http://mysite.com/myphotos"
},

* Script provided by: Eric Franzon, [email protected]
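The fragments above are excerpts from a larger script, so they are not complete JSON on their own. As a hedged sketch of how such markup might be produced, the Python below builds a complete JSON-LD object around the same location data and wraps it in a script tag. The enclosing "@type": "Event" and the event name are assumptions made for this example; they do not appear in the slide.

```python
import json

event = {
    "@context": "http://schema.org",
    "@type": "Event",              # assumption: the slide shows only the location
    "name": "Example Conference",  # hypothetical event name
    "location": {
        "@type": "Place",
        "name": "Sheraton San Diego Hotel & Marina",
        "address": {
            "@type": "PostalAddress",
            "streetAddress": "1380 Harbor Island Drive",
            "addressLocality": "San Diego",
            "addressRegion": "CA",
            "postalCode": "92101",
        },
        "telephone": "+1-877-734-2726",
    },
}

# JSON-LD is ordinary JSON, so the standard serializer is enough.
json_ld = json.dumps(event, indent=2)

# Web pages usually embed JSON-LD in a script element with this media type.
script_tag = f'<script type="application/ld+json">\n{json_ld}\n</script>'

print(script_tag)
```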

Page 26: LDM Slides: Data Modeling for XML and JSON

Dublin Core Metadata Initiative

• The Dublin Core Metadata Initiative provides common metadata standards for resources such as media, library books, etc.

• It defines standards for information such as:

Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, Rights

http://dublincore.org

Resources can be described using:

Text

HTML

XML

RDF XML

Sample Metadata

Format="video/mpeg; 5 minutes"
Language="en"
Publisher="Kats Online, LLC"
Title="My Favorite Cat Video"
Subject="Cats"
Description="A short video of a black cat playing with string."
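One of the listed options is to express the same metadata as XML. The sketch below writes the sample values as elements in the Dublin Core element namespace using Python's standard library. The wrapper element named metadata is a hypothetical container chosen for this example; Dublin Core itself does not mandate it.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15 Dublin Core elements.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Sample metadata from the slide, keyed by Dublin Core element name.
sample = {
    "format": "video/mpeg; 5 minutes",
    "language": "en",
    "publisher": "Kats Online, LLC",
    "title": "My Favorite Cat Video",
    "subject": "Cats",
    "description": "A short video of a black cat playing with string.",
}

record = ET.Element("metadata")  # hypothetical wrapper element
for name, value in sample.items():
    # "{namespace}name" is how ElementTree spells a namespaced tag.
    ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value

xml_out = ET.tostring(record, encoding="unicode")
print(xml_out)
```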

Page 27: LDM Slides: Data Modeling for XML and JSON

Schema.org

• Schema.org is a vocabulary that webmasters can use to mark up Web pages for the Semantic Web, so that search engines understand what the pages are about.

• Created by a group of search providers (e.g. Google, Microsoft, Yahoo and Yandex).

• Vocabularies are developed by an open community process

• Through GitHub (https://github.com/schemaorg/schemaorg)

• Using the [email protected] mailing list

• The schemas are a set of 'types', each associated with a set of properties. The types are arranged in a hierarchy. There are currently over 570 types, including:

• Creative works

• Organization

• Person

• Place, LocalBusiness, Restaurant

• Product, Offer, AggregateOffer

• Etc.

• There are also extensions for particular industries such as:

• auto.schema.org

• health-lifesci.schema.org

Resources can be described using:

JSON-LD

RDFa

Etc.

Page 28: LDM Slides: Data Modeling for XML and JSON

There are Many Other Common Schemas & Vocabularies

• The Dublin Core and Schema.org are two popular schemas, but many more exist for particular subject areas, industries, etc.

• The Linked Open Vocabularies site (LOV) provides a helpful listing

http://lov.okfn.org/dataset/lov/

Dublin Core

Schema.org

Friend of a Friend

Page 29: LDM Slides: Data Modeling for XML and JSON

Summary

• XML and JSON are used for transport and interoperability of data

• They offer a variety of benefits

• Simplicity: ease of usage, interoperability & understanding

• Modular design: do one thing well

• Extensible: Ability to easily modify the structure & content

• Self-descriptive: ease of understanding

• Integration with Databases allows for broader enterprise sharing & storage

• Translation to Relational databases

• Storage for Document databases

• Graphical Models can be used across technologies for an intuitive way to visualize hierarchies & relationships

• The Semantic Web is a powerful way to support the internet as a “web of data”

Page 30: LDM Slides: Data Modeling for XML and JSON

About Global Data Strategy, Ltd

• Global Data Strategy is an international information management consulting company that specializes in the alignment of business drivers with data-centric technology.

• Our passion is data, and helping organizations enrich their business opportunities through data and information.

• Our core values center around providing solutions that are:

• Business-Driven: We put the needs of your business first, before we look at any technology solution.

• Clear & Relevant: We provide clear explanations using real-world examples.

• Customized & Right-Sized: Our implementations are based on the unique needs of your organization’s size, corporate culture, and geography.

• High Quality & Technically Precise: We pride ourselves in excellence of execution, with years of technical expertise in the industry.

Data-Driven Business Transformation

Business Strategy Aligned With Data Strategy

Visit www.globaldatastrategy.com for more information

Page 31: LDM Slides: Data Modeling for XML and JSON

Contact Info

• Email: [email protected]

• Twitter: @donnaburbank, @GlobalDataStrat

• Website: www.globaldatastrategy.com

• Company LinkedIn: https://www.linkedin.com/company/global-data-strategy-ltd

• Personal LinkedIn: https://www.linkedin.com/in/donnaburbank

Page 32: LDM Slides: Data Modeling for XML and JSON

DATAVERSITY Training Center

• Learn the basics of Metadata Management and practical tips on how to apply metadata management in the real world. This online course hosted by DATAVERSITY provides a series of six courses including:

• What is Metadata

• The Business Value of Metadata

• Sources of Metadata

• Metamodels and Metadata Standards

• Metadata Architecture, Integration, and Storage

• Metadata Strategy and Implementation

• Purchase all six courses for $399 or individually at $79 each.

• Other courses available on Data Governance & Data Quality

Online Training Courses

New Metadata Management Course

Visit: http://training.dataversity.net/lms/

Page 33: LDM Slides: Data Modeling for XML and JSON

Lessons in Data Modeling Series - 2017

• January 26th How Data Modeling Fits into an Overall Enterprise Architecture

• February 23rd Data Modeling & Business Intelligence

• March 23rd Conceptual Data Models - How to Get the Attention of Business Users (for a Technical Audience)

• April 27th The Evolving Role of the Data Architect – What Does it Mean for Your Career?

• May 25th Data Modeling & Metadata Management

• June 22nd Self-Service Data Analysis, Data Wrangling, Data Munging, and Data Modeling – how do they fit together?

• July 27th Data Modeling & Metadata for Graph Databases

• August 24th Data Modeling & Data Integration

• September 28th Data Modeling & MDM

• October 26th Agile & Data Modeling – How can they work together?

• December 5th Data Modeling, Data Governance, & Data Quality

Next Year’s Line Up

Page 34: LDM Slides: Data Modeling for XML and JSON

Questions?

Thoughts? Ideas?