Structured Dynamics' Semantic Technologies Product Stack

Upload: mike-bergman

Post on 28-Nov-2014

DESCRIPTION

Structured Dynamics provides 'ontology-driven applications'. Our product stack is geared to enable the semantic enterprise. The products are premised on preserving and leveraging existing information assets in an incremental, low-risk way. SD's products range from converters and authoring environments to Web services middleware and, ultimately, to ontologies, user interfaces and applications.

TRANSCRIPT

Page 1: Structured Dynamics' Semantic Technologies Product Stack

May (updated) 2010

Product Stack

Enterprise Approach

Semantic Enterprise based on semantic Web, linked data

Leverage existing assets:
» Data, records and instances
» Taxonomies, structure and schema

Layer semantics on to existing systems

Develop incrementally

Add sophistication, scope over time

Keep risks low

Integrate with public and Web data (“open world”)

Linked Data

“Linked Data is a set of best practices for publishing and deploying instance and class data using the RDF data model, naming the data objects using uniform resource identifiers (URIs), thereby exposing the data for access via the HTTP protocol, while emphasizing data interconnections, interrelationships and context useful to both humans and machine agents.”
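
As a minimal sketch of the definition above, the snippet below names data objects with URIs and states facts about them as RDF-style triples, then prints them in N-Triples form. The example.org URIs, the Company class and the locatedIn property are hypothetical placeholders, not part of any Structured Dynamics product.

```python
# Linked data sketch: each fact is a (subject, predicate, object) triple.
# All example.org URIs below are hypothetical placeholders.

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def uri(value):
    """Wrap a URI in angle brackets, as N-Triples expects."""
    return "<" + value + ">"


def literal(value):
    """Quote a plain string literal (escapes only backslash and quote,
    which is enough for this example)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def to_ntriples(triples):
    """Serialize triples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


acme = uri("http://example.org/id/acme")
triples = [
    (acme, uri(RDF_TYPE), uri("http://example.org/ontology/Company")),
    (acme, uri(RDFS_LABEL), literal("Acme Corp.")),
    (acme, uri("http://example.org/ontology/locatedIn"),
     uri("http://example.org/id/iowa-city")),
]

print(to_ntriples(triples))
```

The third triple links one data object to another by URI, which is the "interconnection" part of the definition.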

Layers and Current Products

Current Products

structWSF: the pivotal product; Web services middleware that provides distributed data access and federation

conStruct: Drupal-based structured data linkage to structWSF

irON: spreadsheet, JSON and XML authoring and conversion framework

UMBEL: reference set of linking subjects and basis for domain vocabularies

scones: an ontology- and entity-driven information extraction and tagging system

Fit of Current Products within Layers

Existing Assets Layer

Existing Assets

These are the materials that need to be federated, made interoperable, and given a common semantics

» structured data / databases
» semi-structured data (XML, Web pages)
» unstructured data (text)

Preserving Existing Assets

Relational databases (RDBMSs)

Distributed structured assets:
» spreadsheets
» lightweight datastores

Web pages and Web sites

Existing documents and text

Web databases and APIs

Other databases (RDF, OO, etc.)

Access/Conversion Layer

Conversion

Provides in-place access to existing information

Translates existing formats and structures to RDF

Extracts structured information from unstructured text

Aids creation of interoperable datasets

Geared almost entirely to records, instances or entities (that is, basic data)

Conversion Methods

Relational DBs: RDB2RDF

RDFizers

Information Extraction

New Dataset Authoring

Direct Use (already in RDF)

Relational DB Conversion

Simple mappings of instance records to RDF

Methodologies well proven if kept to the instance level

RDB schema inform the interoperable layer (“ontologies”)

Relational datastores left in place

Record data obtained via access layer (structWSF)
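
A minimal sketch of the "simple mappings of instance records to RDF" idea: each row becomes a subject URI and each non-key column becomes a property with a literal value. The table, the example.org namespace and the URI patterns are assumptions made for illustration; this is not how structWSF or any RDB2RDF tool is implemented. The database is only read, in keeping with leaving the relational store in place.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical namespace


def rows_to_triples(conn, table, key):
    """Map every row of `table` to triples; `key` names the primary-key column.

    Table and column names are interpolated directly, so they must come from
    trusted configuration, never from user input.
    """
    cur = conn.execute(f"SELECT * FROM {table} ORDER BY {key}")
    columns = [d[0] for d in cur.description]
    triples = []
    for row in cur:
        record = dict(zip(columns, row))
        subject = f"<{BASE}{table}/{record[key]}>"
        for col, val in record.items():
            if col == key or val is None:
                continue  # the key is in the URI; NULLs yield no triple
            predicate = f"<{BASE}schema/{table}#{col}>"
            triples.append((subject, predicate, f'"{val}"'))
    return triples


# Stand-in for an existing relational datastore.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Ada", "Iowa City"), (2, "Bo", None)],
)

triples = rows_to_triples(conn, "customer", "id")
for s, p, o in triples:
    print(s, p, o, ".")
```

Note that the column names here become property URIs only; relating those properties to a shared vocabulary is the job of the ontologies layer, not of the instance-level mapping.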

RDFizers

General serialization or data format conversions to RDF

Mostly applied to:
» Standard data formats and data structs
» Web content APIs
» Some legacy content

Sometimes some minor ontology or schema mapping

Embodies all conversion steps to linked data

We have access to more than 100 existing formats

RDFizers – Listing 1

URN handlers (in addition to IRI and URI): DOI, LSID, OAI

RDF serialization formats: irON, N3, RDF/XML, Turtle

Languages and ontologies: AB Meta, Annotea, APML, AtomOWL, Bibliographic Ontology, Creative Commons, EXIF, FOAF, GeoNames, GoodRelations, Java, Javadoc, MARC/MODS, Meta Standards, Music Ontology, Natural Language, Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), Open Geospatial, OWL, SIOC, SIOCT, SKOS, UMBEL, vCard, XML, others

(X)HTML pages, embedded Microformats and GRDDL* (see note below): DC, eRDF, geoURL, Google Base, hAudio, hCalendar, hCard, hListing, hResume, hReview, HR-XML, Ning, RDFa, relLicense, SVG, XBRL, XFN, xFolk, XR-XML, XSLT

Syndication formats: Atom, OPML, OCS, RSS 1.1, RSS 2.0, XBEL (for bookmarks)

REST-style Web service APIs: Alchemy, Amazon, Apple, Best Buy, Calais, CNet, CrunchBase, Del.icio.us, Digg, Discogs, Disqus, eBay, Facebook, Flickr, Freebase (MQL), FriendFeed, Garmin, Get Satisfaction, Google, Google Apps, Hoover's, HTTP (raw), ISBN DB, Last.fm, Library Thing, Magnolia, Meetup, MusicBrainz, New York Times, New York Times Campaign Finance (NYTCF), New York Times tags

RDFizers – Listing 2

REST-style Web service APIs (continued): Open Library, Open Social, Open Street, OpenLink (facets), O'Reilly, Picasa, Radio Pop (BBC), Rhapsody, Salesforce, Slideshare, Slidy, Technorati, Tesco, They Work For You, Twine, Twitter, Weather, Wikipedia, World Bank, Yahoo! BOSS, Yahoo! Finance, Yahoo! Maps, Yahoo! Weather, Yelp, YouTube, Zemanta, Zillow

Files (multitude of file formats and MIME types, including): audio (general), BibJSON, BibTEX and others, BitTorrent, commON, CSV, Fink, flat files, irJSON, irXML, JPEG, JSON, images, MS Office, OpenOffice, Open Document Format, Palm, RDF123, video, XLS, etc.

Metadata extractors: CRW, DEB, EXIF, OCW, RPM, XMP

Email formats: EMail, Outlook, RFC822

Version control and related systems: Bugzilla, Jira, POM, Subversion

Other Web service frameworks: BPEL, WSDL, XBRL, XBEL

Data exchange formats: iCalendar, LDIF, vCalendar, vCard

Relational databases and related: D2RQ, D2RMAP, RDF Views

Virtuoso VADs, OpenLink license files

Third party metadata extraction frameworks: Aperture, Spotlight

Miscellaneous and other related converters: MPEG-7/CS → OWL Random XSD → OWL

*GRDDL (Gleaning Resource Descriptions from Dialects of Languages) accommodates a wide variety of dialects (see one listing) and can be combined with arbitrary transformation mechanisms (though currently mostly based on XSLTs).

scones

Information Extraction

scones (Subject Concept Or Named EntitieS) is our information extraction (IE) tagger

Information extraction is applied to input Web pages and unstructured text

May be applied after structure extraction (often, at minimum, defluffing)

Settable “window” for snippet (from a number of bracketing terms to the full document)

Extraction is performed for both:
» Entities (per Wikipedia and enterprise dictionaries)
» Subject concepts (per UMBEL and domain ontologies)

Presently in prototype
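
The general idea of dictionary-driven tagging with a settable snippet window can be sketched as follows. This is a toy illustration only, not the scones implementation (which the slides describe as a prototype); the two dictionaries, the function name tag and the tokenization are all assumptions.

```python
import re

# Hypothetical dictionaries; real ones would be drawn from sources such as
# Wikipedia, enterprise lists, UMBEL or domain ontologies.
ENTITIES = {"wikipedia": "entity", "freebase": "entity"}
CONCEPTS = {"ontology": "subject concept", "linked data": "subject concept"}


def tag(text, window=3):
    """Return (position, phrase, kind, snippet) for every dictionary match.

    `window` is the number of bracketing terms kept on each side of a match.
    """
    tokens = re.findall(r"\w+", text)
    lowered = [t.lower() for t in tokens]
    lexicon = {**ENTITIES, **CONCEPTS}
    hits = []
    for phrase, kind in lexicon.items():
        words = phrase.split()
        n = len(words)
        for i in range(len(lowered) - n + 1):
            if lowered[i:i + n] == words:
                start = max(0, i - window)
                end = min(len(tokens), i + n + window)
                hits.append((i, phrase, kind, " ".join(tokens[start:end])))
    return sorted(hits)


text = "Public sources such as Wikipedia supply entities for linked data on the Web."
hits = tag(text, window=2)
for position, phrase, kind, snippet in hits:
    print(f"{kind}: {phrase!r} in '{snippet}'")
```

Setting window to a value at least as large as the document length returns the full text as the snippet, which corresponds to the "full document" end of the range mentioned above.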

(Named) Entities

The places, events, people, objects, and specific things of the real world

Literally millions of notable instances

Each belongs to one or more subject concept(s)

Currently, the predominant basis for linked data

Public sources include Wikipedia, Freebase and others

Can be readily mixed-and-matched with private entities

Creating New Entity Dictionaries

Triangulating Information Extraction

irON – instance record and Object Notation

irON Dataset Authoring Framework

Simple authoring and dataset creation

irON includes an abstract notation and vocabulary for instance records

Serializations available for:
» XML (irXML)
» JSON (irJSON)
» CSV/spreadsheets (commON)

Notations for:
» Instance records
» Schema
» Datasets and metadata
» Linkages to other schema

Three irON Serializations

irXML, irJSON, commON

More-or-less Interchangeable Formats
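
To illustrate what "interchangeable" means in practice, the sketch below moves the same instance records between CSV, JSON and XML and checks that nothing is lost. It deliberately uses generic layouts, not the actual irXML, irJSON or commON syntax, and the element names records and record are assumptions.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def from_csv(text):
    """Read records from CSV text; the header row supplies attribute names."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def to_json(records):
    return json.dumps(records, sort_keys=True)


def from_json(text):
    return json.loads(text)


def to_xml(records):
    """One <record> per instance, one child element per attribute.
    Attribute names must be valid XML element names."""
    root = ET.Element("records")
    for rec in records:
        node = ET.SubElement(root, "record")
        for key, value in rec.items():
            ET.SubElement(node, key).text = value
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    return [{child.tag: child.text for child in node}
            for node in ET.fromstring(text)]


csv_text = "id,name\n1,Ada\n2,Bo\n"
records = from_csv(csv_text)

# The same records survive a trip through each of the other serializations.
via_json = from_json(to_json(records))
via_xml = from_xml(to_xml(records))
print(via_json == records, via_xml == records)
```

The "more-or-less" qualifier matters: in this toy version every value is a string, so typed values or nested structures would need extra conventions to round-trip through CSV.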

structWSF

Generally RESTful Web services middleware

Uniform, distributed access point

Provides the interoperability architecture

Based on canonical RDF data model

Dataset access orientation

Standard tools and services:
» User permissions and access
» CRUD (create, read, update, delete)
» Browse
» Full-text, faceted search
» Import / export
» Many others

RDF and Data Federation Model

Advantages of a Canonical Model

All tools can be driven from a single data format basis

Single converters can link in other hubs of data forms

‘Round-tripping’ through the canonical form can bring consistency and cleanliness to input data

RDF is well-suited as the canonical form:
» Structured data
» Semi-structured data
» Unstructured data (after IE)
» Simple-to-complex data structures
» Logic and inferencing
» Suitable to all input data formats
» Many serializations possible
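
The advantages listed above can be sketched with a small hub-and-spoke converter registry. Each format needs only one parser into the canonical form and one emitter out of it, and passing data through the hub normalizes it. The two line-based formats ("pipe" and "tab"), the cleaning rules and all function names are invented for this illustration; simple triples stand in for RDF.

```python
CONVERTERS = {}  # format name -> (parse, emit)


def register(name, parse, emit):
    CONVERTERS[name] = (parse, emit)


def canonicalize(triples):
    """Clean-up applied at the hub: trim whitespace, lowercase property
    names, drop duplicates and sort."""
    cleaned = {(s.strip(), p.strip().lower(), v.strip()) for s, p, v in triples}
    return sorted(cleaned)


def convert(text, source, target):
    """Any-to-any conversion always goes through the canonical form."""
    parse, _ = CONVERTERS[source]
    _, emit = CONVERTERS[target]
    return emit(canonicalize(parse(text)))


def converters_needed(n):
    """Directed converters required for n formats, with and without a hub."""
    return {"pairwise": n * (n - 1), "via_hub": 2 * n}


register(
    "pipe",
    parse=lambda text: [tuple(line.split("|")) for line in text.splitlines() if line.strip()],
    emit=lambda triples: "\n".join("|".join(t) for t in triples),
)
register(
    "tab",
    parse=lambda text: [tuple(line.split("\t")) for line in text.splitlines() if line.strip()],
    emit=lambda triples: "\n".join("\t".join(t) for t in triples),
)

messy = "acme | Label | Acme Corp.\nacme|label|Acme Corp.\n"
clean = convert(messy, "pipe", "pipe")   # round trip through the hub
as_tab = convert(clean, "pipe", "tab")   # cross-format conversion
print(clean)
print(converters_needed(100))
```

With 100 formats, direct pairwise conversion would need 9,900 directed converters, while routing through a canonical form needs 200.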

A Collaborative, Distributed Network

Flexible User Access Permissions

Access, APIs and Endpoints

The resulting linked data may be exposed as:

APIs

Web services

SPARQL endpoints
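
For the SPARQL endpoint option, a client request can be sketched with the standard library alone. The SPARQL protocol passes the query text in a query parameter on a GET request; the endpoint URL below is a hypothetical placeholder, and the request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint URL

QUERY = (
    "SELECT ?s ?label WHERE { "
    "?s <http://www.w3.org/2000/01/rdf-schema#label> ?label "
    "} LIMIT 10"
)


def build_sparql_request(endpoint, query):
    """Build a SPARQL protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
print(request.get_method(), request.full_url)

# Against a real endpoint, the request would be sent with
# urllib.request.urlopen(request) and the body parsed with json.load().
```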

Ontologies Layer

Ontologies

Ontologies provide the basis for:
» Interoperating
» Reconciling semantics

Multiple ontologies may be used at any time

Both enterprise (internal) and external ontologies

Best built incrementally, with participation

Easily modified: OK to test and experiment

Ontologies

The structural relationships of concepts within a domain

Generally class- (or set-) oriented

Analogous to relational database schema, only with controlled vocabularies and exact semantics

Sets the structure of how to organize the actual data (“instances”) in the domain

Semantics and mapping techniques allow disparate ontologies to be inter-related

Can infer or reason over the structure
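
A very small example of reasoning over class structure: if an instance belongs to a class, it also belongs to every superclass of that class. The class names, instances and helper functions below are hypothetical, and a production system would use an RDF store and reasoner rather than plain dictionaries.

```python
# Hypothetical class structure (the ontology) and instance data.
SUBCLASS_OF = {
    "Company": {"Organization"},
    "Organization": {"Agent"},
    "Person": {"Agent"},
}
INSTANCE_OF = {
    "acme": {"Company"},
    "ada": {"Person"},
}


def superclasses(cls):
    """All classes reachable from `cls` by following subclass links."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in SUBCLASS_OF.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def inferred_types(instance):
    """Asserted classes plus everything implied by the class structure."""
    types = set(INSTANCE_OF.get(instance, ()))
    for cls in list(types):
        types |= superclasses(cls)
    return types


def instances_of(cls):
    """Instances that belong to `cls`, directly or by inference."""
    return sorted(i for i in INSTANCE_OF if cls in inferred_types(i))


print(sorted(inferred_types("acme")))
print(instances_of("Agent"))
```

Neither record was ever labeled an Agent; the membership follows from the structure alone, which is what lets a query at a general concept aggregate instances described with more specific classes.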

Migrating Structure to the Ontology Layer

Ontologies Layer

irON

irON Record Vocabulary

irON also provides the standard instance record vocabulary for all federated records

Each record source has its own attributes

But, irON provides common descriptors:
» Useful for interoperating
» Unique, Web-accessible identifiers
» Standard descriptions and labels
» Conventions for “driving” user interfaces and tools

UMBEL

UMBEL (Upper Mapping and Binding Exchange Layer)

20,000 defined reference points in information space

Means to assert what a given chunk of content is about

Enable similar content to be aggregated

Place content in context with other content

Aggregation points for tying in instances and entities

Derived from, and a subset of, the Cyc knowledge base

Vocabulary basis for domain-specific subject ontologies

Notable Ontologies and Vocabularies

Page 43: Structured Dynamics' Semantic Technologies Product Stack

43

Management Layer

Management/Federation Layer

Management/Federation Layer handles:
» Ontology mapping, management
» Queries and retrievals
» All Web services
» Imports and exports
» Inferencing and logic
» Ontology creation and expansion

Works off of many RDF datastores

Has efficient, full-text indexing with faceting

Interface to the system is structWSF

Can plug into many options at the Applications Layer (so far, only Drupal with conStruct SCS has been deployed)

Web-oriented Architecture

Applications Layer

conStruct SCS

conStruct Browse Screen

conStruct Capabilities

Based on Drupal

Single-click (cloud) deployment

Theming

User and group access and management

Data display templates

General content management system (CMS)

Publishing RDF

Open source

Re-cap

Summary

Incremental, low-risk approach to the semantic enterprise

Maximum leverage and re-use of existing information assets

Conversion and federation of all available data forms

Excellent uses for:
» Business intelligence
» Knowledge management
» Master data management modernization
» Taxonomy modernization
» Enterprise content integration

All baseline products are open source

Contacts & Information

Michael K. Bergman

CEO

319.621.5225

[email protected]

blog: www.mkbergman.com

Steve Ardire

Senior Advisor

[email protected]

Frédérick Giasson

CTO

[email protected]

blog: fgiasson.com/blog

Web Sites

structureddynamics.com

umbel.org

umbel.structureddynamics.com (UMBEL Web services)

citizen-dan.org (community indicator systems)

openstructs.org (open source distros + documentation)

constructscs.com (Drupal structured data system)
