the social data web

transparency, collaboration and information sharing solution architecture tools and techniques using the social data web george thomas, 1105 ea2009

Upload: george-thomas

Post on 11-May-2015

DESCRIPTION

This presentation is the culmination of my detail to the E-Government Office in the US Office of Management and Budget and the work I did to evolve and mature initiatives like recovery.gov and data.gov.

TRANSCRIPT

Page 1: The Social Data Web

transparency, collaboration and information sharing

solution architecture tools and techniques using the social data web

george thomas, 1105 ea2009

Page 2: The Social Data Web

agenda

• An overview of Web Oriented Architecture (WOA) design principles that have made the Web the most successful distributed computing platform ever created will be given.

• Technologies for exposing raw data and publishing semantically enriched structured data for persistence and syndication on the Web as public records will be described.

• Technologies that enable interoperability across these published assets and currently disparate data sources to achieve low cost, large scale data federation will be described.

• Widgets and services that consume and transform this data for interactive and integration purposes will be discussed in the context of different stakeholder views.

• A Web-scale approach to Business Intelligence leveraging Cloud Computing approaches to data archive analysis will be described.

• Finally, the applicability of the proposed solution architecture to the Federal Segment Architecture Methodology and tools like Visualization to Understand Expenditures in IT will be discussed.

Page 4: The Social Data Web

Web Oriented Architecture (WOA)

• REpresentational State Transfer (REST)
  – The architectural style of the World Wide Web
  – aka Resource Oriented Architecture (ROA)
• hyperlinks dereference (information) resource representations
  – HTTP URIs and content negotiation
    • user agent prefers .htm, .xml, .rdf, .etc
• statefulness
  – servers maintain resource state, clients maintain application state
• RESTful Web services
  – HTTP uniform interface
    • CRUD analog to HTTP PUT/GET/POST/DELETE
  – contrast to Remote Procedure Call (RPC) style Web services
    • SOAP/WSDL, you design the methods to invoke
• global visibility (the Web) and persistence (permalinks)
  – caching, crawling, indexing
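
The uniform interface idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `ResourceStore`, its `handle` method and the `/forms` path are hypothetical names, the server is an in-memory dictionary, and no real HTTP traffic is involved. It follows one common reading of the CRUD analogy (PUT creates or replaces at a client-chosen URI, POST adds a member at a server-chosen URI).

```python
class ResourceStore:
    """Hypothetical in-memory server: the URI names the resource, the method names the action."""

    def __init__(self):
        self._resources = {}  # URI path -> current representation (resource state)
        self._counter = 0     # used to mint URIs for POSTed members

    def handle(self, method, path, body=None):
        """Dispatch one request and return (status_code, payload)."""
        if method == "GET":     # read
            if path in self._resources:
                return 200, self._resources[path]
            return 404, None
        if method == "PUT":     # create or replace at a client-chosen URI
            status = 200 if path in self._resources else 201
            self._resources[path] = body
            return status, body
        if method == "POST":    # add a member; the server picks the new URI
            self._counter += 1
            new_path = f"{path.rstrip('/')}/{self._counter}"
            self._resources[new_path] = body
            return 201, new_path
        if method == "DELETE":  # remove
            if path not in self._resources:
                return 404, None
            del self._resources[path]
            return 204, None
        return 405, None        # anything else is outside the uniform interface


store = ResourceStore()
status, location = store.handle("POST", "/forms", "<form>draft</form>")
print(status, location)
print(store.handle("PUT", location, "<form>final</form>"))
print(store.handle("GET", location))
print(store.handle("DELETE", location))
print(store.handle("GET", location))
```

The contrast with RPC style is that no method names are designed here: every resource answers to the same four verbs, which is what lets generic caches, crawlers and indexers work against it.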

Page 6: The Social Data Web

XForms - human data capture

• Orbeon server side XForms engine, Ajax browser GUIs
• catalog and builder apps
• create new XSD bound forms
• populate, persist, search
• Tomcat and eXist
• off-line capability
• transformation pipeline

Page 7: The Social Data Web

Atom Publishing Protocol (APP)

• automated invocation of the RESTful Web service
  – HTTP PUT/POST the spreadsheet or XML instance doc
    • to atomserver.codehaus.org
• where else is APP used?
  – Google Data APIs, Microsoft Live Framework
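
As a rough sketch of what "HTTP POST the XML instance doc" could look like from a client, the Python below wraps a payload in an Atom entry and prepares (but does not send) the request. Everything specific is an assumption for illustration: the collection URL `http://example.org/atom/collection`, the `record` payload, the entry id, date and author are placeholders, and `build_entry` / `build_post` are hypothetical helper names. Only the Atom namespace and the `application/atom+xml;type=entry` media type come from the Atom specifications.

```python
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("atom", ATOM)  # serialize Atom elements with an atom: prefix


def build_entry(title, entry_id, updated, author, payload_xml):
    """Wrap an XML instance document in an Atom entry and return it as bytes."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}title").text = title
    ET.SubElement(entry, f"{{{ATOM}}}id").text = entry_id
    ET.SubElement(entry, f"{{{ATOM}}}updated").text = updated
    author_el = ET.SubElement(entry, f"{{{ATOM}}}author")
    ET.SubElement(author_el, f"{{{ATOM}}}name").text = author
    content = ET.SubElement(entry, f"{{{ATOM}}}content", {"type": "application/xml"})
    content.append(ET.fromstring(payload_xml))  # the captured instance doc
    return ET.tostring(entry, encoding="utf-8")


def build_post(collection_url, entry_bytes):
    """Prepare the POST that would add the entry to an APP collection (not sent here)."""
    return urllib.request.Request(
        collection_url,
        data=entry_bytes,
        headers={"Content-Type": "application/atom+xml;type=entry"},
        method="POST",
    )


entry = build_entry(
    title="Example recordset",                               # placeholder values
    entry_id="urn:uuid:00000000-0000-0000-0000-000000000001",
    updated="2009-01-01T00:00:00Z",
    author="example publisher",
    payload_xml='<record xmlns="http://example.org/ns"><amount>100</amount></record>',
)
request = build_post("http://example.org/atom/collection", entry)
print(request.get_method(), request.full_url)
print(entry.decode("utf-8"))
```

Passing the prepared request to `urllib.request.urlopen` would perform the call against a real collection; that step is left out so the sketch runs without a server.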

Page 8: The Social Data Web

Atom Syndication Format

• transform XForm or APP captured info into XHTML+RDFa
• (permalinked) public recordset in feed entry <content>

Page 9: The Social Data Web

the london-gazette.co.uk

Page 10: The Social Data Web

london-gazette.co.uk/listing

small, discrete, component ontology/data-domain-metamodels

Page 11: The Social Data Web

web page = web service

Page 12: The Social Data Web

RDFa enabled 'deep link' discovery

• Rich Snippets from Google

• SearchMonkey from Yahoo

Page 14: The Social Data Web

goal: federated dataset correlation

• graph based dynamic schema evolution across silos
  – centralization/normalization not required (or realistic/practical!)

Page 15: The Social Data Web

Web as DB - Web API

• Linking Open (Government) Data (LOD)
• SPARQL endpoints

linkeddata.org

Page 16: The Social Data Web

browse: from web of docs to web of data

Page 17: The Social Data Web

http://data.linkedmdb.org/page/actor/10

• content negotiation, user agent prefers;
  – human (html) or machine (rdf/xml) readable

RDF/N3
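
A minimal sketch of the server side of content negotiation, assuming a simplified reading of the HTTP `Accept` header (media types with optional `q` weights, no partial wildcards such as `text/*`). The `REPRESENTATIONS` mapping is an assumption based on the `/page/` and `/data/` URLs shown on these slides, and `parse_accept` / `negotiate` are hypothetical helper names, not part of any library.

```python
# Assumed mapping of media types to the URLs that serve them.
REPRESENTATIONS = {
    "text/html": "http://data.linkedmdb.org/page/actor/10",
    "application/rdf+xml": "http://data.linkedmdb.org/data/actor/10",
}


def parse_accept(header):
    """Return the media types in an Accept header, most preferred first."""
    ranked = []
    for position, item in enumerate(header.split(",")):
        media_type, *params = [part.strip() for part in item.split(";")]
        quality = 1.0  # default weight when no q parameter is given
        for param in params:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                quality = float(value)
        if media_type and quality > 0:
            # sort by weight first, then by position in the header
            ranked.append((-quality, position, media_type))
    return [media_type for _, _, media_type in sorted(ranked)]


def negotiate(header, available):
    """Pick (media_type, url) for the best acceptable representation, or None."""
    for media_type in parse_accept(header):
        if media_type in available:
            return media_type, available[media_type]
        if media_type == "*/*":  # wildcard: fall back to the first one offered
            return next(iter(available.items()))
    return None


# A browser-like user agent versus a data-oriented one.
print(negotiate("text/html,application/xhtml+xml;q=0.9,*/*;q=0.8", REPRESENTATIONS))
print(negotiate("application/rdf+xml,text/html;q=0.5", REPRESENTATIONS))
```

The same resource identifier can therefore serve people and programs; the user agent only states what it prefers.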

Page 18: The Social Data Web

http://data.linkedmdb.org/page/actor/10

• now at the bottom of the same page/actor/10
  – triple is Subject (S) Predicate (P) Object (O)
    • 10 (S) vocabulary:property (P) <object> (O)
  – properties link to other dataset instances
    • that use different datatype definitions
  – note D2R app, expose RDB as RDF, SPARQL to SQL

Page 19: The Social Data Web

http://data.linkedmdb.org/data/actor/10

• <subject> has predicate {space} object1 , objectN ; repeat until .

<http://data.linkedmdb.org/resource/actor/10>
    foaf:page <http://www.freebase.com/view/guid/9202a8c04000641f800000000007821e> ,
        <http://www.imdb.com/name/nm0000564/> ;
    owl:sameAs <http://mpii.de/yago/resource/Peter_O%27Toole> ,
        <http://dbpedia.org/resource/Peter_O%27Toole> ;
    rdf:type movie:actor ,
        foaf:Person .

• this is an 'N3' RDF serialization, instead of RDF/XML (or others)

• some properties have RESTful SPARQL queries as <objects>

foaf:Person rdfs:seeAlso <http://data.linkedmdb.org/sparql?query=DESCRIBE+<http://xmlns.com/foaf/0.1/Person>>
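
The "object1 , objectN ; repeat until ." rule can be made concrete with a small Python sketch that turns flat triples back into the abbreviated form shown on this slide. It is an illustration only: `to_n3` is a hypothetical helper, terms are treated as opaque strings, and prefix declarations are left out just as they are on the slide. The triples themselves are the ones from the slide.

```python
ACTOR = "<http://data.linkedmdb.org/resource/actor/10>"

TRIPLES = [
    (ACTOR, "foaf:page", "<http://www.freebase.com/view/guid/9202a8c04000641f800000000007821e>"),
    (ACTOR, "foaf:page", "<http://www.imdb.com/name/nm0000564/>"),
    (ACTOR, "owl:sameAs", "<http://mpii.de/yago/resource/Peter_O%27Toole>"),
    (ACTOR, "owl:sameAs", "<http://dbpedia.org/resource/Peter_O%27Toole>"),
    (ACTOR, "rdf:type", "movie:actor"),
    (ACTOR, "rdf:type", "foaf:Person"),
]


def to_n3(triples):
    """Group triples by subject and predicate, then apply the , ; . abbreviations."""
    grouped = {}
    for subject, predicate, obj in triples:
        grouped.setdefault(subject, {}).setdefault(predicate, []).append(obj)
    blocks = []
    for subject, predicates in grouped.items():
        # ',' separates objects that share a subject and predicate
        lines = [f"{predicate} {' , '.join(objects)}" for predicate, objects in predicates.items()]
        # ';' separates predicates that share a subject, '.' ends the subject
        blocks.append(f"{subject} " + " ;\n    ".join(lines) + " .")
    return "\n".join(blocks)


text = to_n3(TRIPLES)
print(text)
```

Six triples collapse into one statement with three predicate groups, which is why the N3 form is easier to read than the equivalent RDF/XML.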

Page 20: The Social Data Web

Web based SPARQL query builder

http://dbpedia.org/ is powered by http://www.openlinksw.com 'Virtuoso' that provides a 'SPARQL endpoint' (DRM 'query point')

Page 21: The Social Data Web

creates dbpedia.org query

• use response data in next query
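
A hedged sketch of "use response data in next query" with nothing but the Python standard library. The `/sparql` endpoint paths and the DESCRIBE follow-up are assumptions for illustration, `sparql_get_url` and `bound_uris` are hypothetical helpers, and `sample_response` is a hand-written stand-in shaped like the SPARQL JSON results format rather than real endpoint output. No request is sent.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit


def sparql_get_url(endpoint, query):
    """Encode a query as a GET request URL using the standard 'query' parameter."""
    return f"{endpoint}?{urlencode({'query': query})}"


def bound_uris(response_text, variable):
    """Pull the URI values bound to one variable out of a SPARQL JSON result."""
    bindings = json.loads(response_text)["results"]["bindings"]
    return [b[variable]["value"] for b in bindings
            if b.get(variable, {}).get("type") == "uri"]


first_query = (
    "PREFIX owl: <http://www.w3.org/2002/07/owl#> "
    "SELECT ?same WHERE { <http://data.linkedmdb.org/resource/actor/10> owl:sameAs ?same }"
)
first_url = sparql_get_url("http://data.linkedmdb.org/sparql", first_query)

# Hand-written stand-in for the first response (illustrative, not fetched).
sample_response = json.dumps({
    "head": {"vars": ["same"]},
    "results": {"bindings": [
        {"same": {"type": "uri", "value": "http://dbpedia.org/resource/Peter_O%27Toole"}},
    ]},
})

# Feed a value from the first result into the next query, against another dataset.
same_as = bound_uris(sample_response, "same")
next_url = sparql_get_url("http://dbpedia.org/sparql", f"DESCRIBE <{same_as[0]}>")

print(first_url)
print(parse_qs(urlsplit(next_url).query)["query"][0])
```

Because the query travels in the URL, each step is itself a link that can be bookmarked, cached or embedded as an object in another dataset.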

Page 22: The Social Data Web

authoritative metadata - provided tags!!

• using standardized datatype and property specifications
• ontologies emerge from social folksonomy

http://commontag.org

Page 24: The Social Data Web

indexing/searching the Data Web

Page 25: The Social Data Web

aggregation and live data reporting

http://sig.ma

Page 26: The Social Data Web

many to many set visualization

http://mqlx.com/~david/parallax interface used to aggregate data across multiple (data) 'bases' on http://freebase.com

Page 27: The Social Data Web

ad-hoc analyst/end-user 'meshups'

Page 28: The Social Data Web

schema/bizmo/federal_enterprise

• bizmo.freebase.com = OMG BMM + CPIC (+SOA...)
  – Obama is an instance of the Federal Enterprise type
• Federal Enterprise (S) Fed Ent Goal (P) Goal (O)

Page 29: The Social Data Web

/rdf/bizmo.federal_enterprise (excerpt)

• (W3C/FBase) <subject/topic> <predicate/property> <object/topic>

<http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.object.name> "Federal Enterprise"@en.

<http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/freebase.type_profile.instance_count> "1"^^<http://www.w3.org/2001/XMLSchema#long>.

<http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.type.instance> <http://rdf.freebase.com/ns/guid.9202a8c04000641f800000000c61962c>.

<http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.type.properties> <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise.federal_enterprise_strategy>.

<http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.type.properties> <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise.federal_enterprise_tactic>.

<http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.type.properties> <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise.federal_enterprise_directive>.

<http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.type.properties> <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise.federal_enterprise_objective>.

<http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.type.properties> <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise.federal_enterprise_information_technology_budget>.

<http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://rdf.freebase.com/ns/type.type.properties> <http://rdf.freebase.com/ns/base.bizmo.federal_enterprise.federal_enterprise_goal>.

<http://rdf.freebase.com/ns/base.bizmo.federal_enterprise> <http://www.w3.org/1999/xhtml/vocab#license> <http://creativecommons.org/licenses/by/3.0/>.
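
Because each statement in the excerpt is one self-contained line, a consumer can read it with very little machinery. The Python sketch below is an assumption-laden illustration: the regular expression covers only the three object shapes that occur in this excerpt (URI, language-tagged literal, typed literal) and is not a full N-Triples parser, and `parse_line` / `declared_properties` are hypothetical names. The four sample lines are copied from the excerpt.

```python
import re

NS = "http://rdf.freebase.com/ns/"
TYPE = f"<{NS}base.bizmo.federal_enterprise>"

# Subject URI, predicate URI, then a URI or a literal with optional @lang or ^^datatype.
TRIPLE = re.compile(
    r'^<(?P<s>[^>]+)>\s+<(?P<p>[^>]+)>\s+'
    r'(?:<(?P<uri>[^>]+)>'
    r'|"(?P<literal>[^"]*)"(?:@(?P<lang>[A-Za-z-]+)|\^\^<(?P<datatype>[^>]+)>)?)'
    r'\s*\.\s*$'
)

EXCERPT = [
    f'{TYPE} <{NS}type.object.name> "Federal Enterprise"@en.',
    f'{TYPE} <{NS}freebase.type_profile.instance_count> "1"^^<http://www.w3.org/2001/XMLSchema#long>.',
    f'{TYPE} <{NS}type.type.properties> <{NS}base.bizmo.federal_enterprise.federal_enterprise_strategy>.',
    f'{TYPE} <{NS}type.type.properties> <{NS}base.bizmo.federal_enterprise.federal_enterprise_goal>.',
]


def parse_line(line):
    """Split one statement into (subject, predicate, object) strings."""
    found = TRIPLE.match(line.strip())
    if found is None:
        raise ValueError(f"not a recognised triple: {line!r}")
    obj = found["uri"] if found["uri"] is not None else found["literal"]
    return found["s"], found["p"], obj


def declared_properties(lines):
    """List the properties the type declares, without the Freebase namespace."""
    names = []
    for line in lines:
        _, predicate, obj = parse_line(line)
        if predicate == NS + "type.type.properties":
            names.append(obj[len(NS):] if obj.startswith(NS) else obj)
    return names


print(parse_line(EXCERPT[0]))
print(declared_properties(EXCERPT))
```

A production consumer would more likely hand the file to an RDF library; the point here is only that the subject/predicate/object shape is directly visible in the serialization.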

Page 30: The Social Data Web

connecting the data dots:

• create the following subject/predicate/object or topic/property/topic schema:

Goal / amplifies / Vision

Objective / quantifies / Goal

Federal Enterprise / (has) Fed Ent Goal / (of type) Goal

Federal Agency / maintains / Exhibit 53

Exhibit 53 / contains (multiple) / Exhibit 53 Recordset(s)

Exhibit 53 Recordset / Supports Federal Goal / (of type) Goal

• then create instances with data from http://it.usaspending.gov:

Obama / is of type / Federal Enterprise

Obama / has a Fed Ent Goal / Health Care Reform

HHS / is of type / Federal Agency

HHS / maintains / HHS Exhibit 53

HHS Exhibit 53 / contains / Nat Health Info Network Connect

Nat Health Info Network Connect / supports Obama Goal / Health Care Reform
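
One way to see how the schema constrains the instances is to check every instance triple against it. The Python sketch below is an illustration under assumptions: `violations` is a hypothetical helper, the instance predicate "supports Obama Goal" is treated as the schema property "Supports Federal Goal", and the types of Health Care Reform, HHS Exhibit 53 and Nat Health Info Network Connect are inferred from the schema because the slide states types only for Obama and HHS.

```python
# property -> (required subject type, required object type), from the schema list
SCHEMA = {
    "amplifies": ("Goal", "Vision"),
    "quantifies": ("Objective", "Goal"),
    "Fed Ent Goal": ("Federal Enterprise", "Goal"),
    "maintains": ("Federal Agency", "Exhibit 53"),
    "contains": ("Exhibit 53", "Exhibit 53 Recordset"),
    "Supports Federal Goal": ("Exhibit 53 Recordset", "Goal"),
}

TYPES = {
    "Obama": "Federal Enterprise",                              # stated on the slide
    "HHS": "Federal Agency",                                    # stated on the slide
    "Health Care Reform": "Goal",                               # inferred
    "HHS Exhibit 53": "Exhibit 53",                             # inferred
    "Nat Health Info Network Connect": "Exhibit 53 Recordset",  # inferred
}

FACTS = [
    ("Obama", "Fed Ent Goal", "Health Care Reform"),
    ("HHS", "maintains", "HHS Exhibit 53"),
    ("HHS Exhibit 53", "contains", "Nat Health Info Network Connect"),
    ("Nat Health Info Network Connect", "Supports Federal Goal", "Health Care Reform"),
]


def violations(facts, types, schema):
    """Return a message for every fact whose property or topic types break the schema."""
    problems = []
    for subject, prop, obj in facts:
        if prop not in schema:
            problems.append(f"{prop}: unknown property")
            continue
        expected = schema[prop]
        actual = (types.get(subject), types.get(obj))
        if actual != expected:
            problems.append(f"{subject} / {prop} / {obj}: expected {expected}, got {actual}")
    return problems


print(violations(FACTS, TYPES, SCHEMA))
print(violations([("HHS", "contains", "Health Care Reform")], TYPES, SCHEMA))
```

Adding a new property means adding one schema entry; existing instance data stays valid, which is the schema evolution behaviour the talk relies on.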

Page 31: The Social Data Web

search all 'bases' for 'Exhibit 53'

http://mqlx.com/~david/parallax interface to http://bizmo.freebase.com

Page 32: The Social Data Web

base/bizmo/e53 returns

• a collection (2 instances) of an Exhibit 53 topic
  – one from HHS and one from GSA (data from it.usaspending.gov)
• triple in Exhibit 53 topic schema
  – Exhibit 53 (S) contains (P) Exhibit 53 Recordset (O)

Page 33: The Social Data Web

discovering unknown data structures

• the power of 'faceted' search and browsing
• interactive query – which of these?
  – Ex53 Recordset (S) Supports Federal Goal (P) ? (O)

Page 34: The Social Data Web

traversing the data graph

• from info about an IT investment
• to info about Administration priorities
• 2 Ex53's to 3 Recordsets to 1 that has Obama Goal
  – <uri> (S) <uri> (P) <uri> (O)
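
The traversal on this slide amounts to repeated pattern matching over triples, with "?" as a wildcard. The Python sketch below mimics it on toy data: `match` and `follow` are hypothetical helpers, "GSA Exhibit 53" is named by analogy with the HHS one, the two "(placeholder)" recordsets and the assignment of recordsets to exhibits are invented stand-ins, since the slides name only Nat Health Info Network Connect.

```python
TRIPLES = [
    ("HHS", "maintains", "HHS Exhibit 53"),
    ("GSA", "maintains", "GSA Exhibit 53"),                             # name assumed
    ("HHS Exhibit 53", "contains", "Nat Health Info Network Connect"),
    ("HHS Exhibit 53", "contains", "Recordset B (placeholder)"),        # invented stand-in
    ("GSA Exhibit 53", "contains", "Recordset C (placeholder)"),        # invented stand-in
    ("Nat Health Info Network Connect", "Supports Federal Goal", "Health Care Reform"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None plays the role of '?'."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def follow(triples, subjects, predicate):
    """One hop: every object reachable from the given subjects via the predicate."""
    return [o for subject in subjects for _, _, o in match(triples, s=subject, p=predicate)]


exhibits = [o for _, _, o in match(TRIPLES, p="maintains")]
recordsets = follow(TRIPLES, exhibits, "contains")
goals = follow(TRIPLES, recordsets, "Supports Federal Goal")
print(len(exhibits), len(recordsets), goals)  # 2 3 ['Health Care Reform']
```

Each hop narrows or widens the working set the way a facet click does in the browsing interface, without the user needing to know the schema in advance.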

Page 35: The Social Data Web

http://freemix.it - more faceted filtering

Page 36: The Social Data Web

scatter chart driven by tag clouds

Page 37: The Social Data Web

more multi-dataset faceted meshups

Page 38: The Social Data Web

drag & drop metadata/data 'curation'

Page 39: The Social Data Web

publish new freemix merged dataset

choose a stylesheet, view lenses and facets to include for your end users to interact with

Page 41: The Social Data Web

crowdsourced analytics

shown using 'Top Braid Composer Maestro' from http://topquadrant.com

'SPARQLMotion' script – also see Yahoo | DERI: http://pipes.yahoo.com | http://pipes.deri.org

Page 42: The Social Data Web

cloud scale analytics (petabyte batch)

• proprietary Google
  – GFS, BigTable and MapReduce
  – page rank impl
• open source Apache Hadoop
  – HDFS, HBase and MapReduce
  – entity, RDFa extraction
• Amazon EMR, Cloudera
  – COSS prof service providers

facebook.com
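
The MapReduce pattern named on this slide can be shown in miniature. The Python sketch below runs the three phases in one process over a handful of triples taken from the earlier linkedmdb slide; it is a toy under stated assumptions (`map_phase`, `shuffle`, `reduce_phase` and `run` are hypothetical names) and says nothing about how Hadoop or Google's systems distribute the work across machines.

```python
from collections import defaultdict

ACTOR = "http://data.linkedmdb.org/resource/actor/10"

RECORDS = [
    (ACTOR, "foaf:page", "http://www.freebase.com/view/guid/9202a8c04000641f800000000007821e"),
    (ACTOR, "foaf:page", "http://www.imdb.com/name/nm0000564/"),
    (ACTOR, "owl:sameAs", "http://mpii.de/yago/resource/Peter_O%27Toole"),
    (ACTOR, "owl:sameAs", "http://dbpedia.org/resource/Peter_O%27Toole"),
    (ACTOR, "rdf:type", "movie:actor"),
    (ACTOR, "rdf:type", "foaf:Person"),
]


def map_phase(record):
    """Map: emit (predicate, 1) for one triple, independently of all other records."""
    _, predicate, _ = record
    yield predicate, 1


def shuffle(pairs):
    """Shuffle: bring together all values emitted under the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: fold the values for one key into a single result."""
    return key, sum(values)


def run(records):
    """Run the three phases in-process; a cluster would spread them over many nodes."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = run(RECORDS)
print(counts)
```

Because the map step looks at one record at a time and the reduce step at one key at a time, the same two functions scale from this list to a crawl archive split across many machines.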

Page 43: The Social Data Web

talis.com/platform - cloud graph store

• Software as a Service, enabling rapid development with zero deployment costs

• a simple, consistent web API for storing, managing and retrieving both structured and unstructured data

• flexible, schema-free metadata that allows applications to be easily evolved

• a range of data access and query options enabling easy integration into both new and existing applications

• access control options to support hosting of both public and private data

• a data hosting solution that is founded on open internet standards and web architectural best practices

• ...

• every resource in your (data)store has a unique URL from which its metadata can be retrieved with a single web request

• SPARQL queries can be used to perform more complex queries, retrieving results as a tabular result set or as RDF

• content negotiation can be used to retrieve data as RDF, XML, or JSON allowing you to choose the right format for your application

Page 45: The Social Data Web

application to EA discipline: getting there from here

– stop:
  • publishing / analyzing / visualizing unstructured data
  • using structured data only in file or message exchanges
– start:
  • align Gov and Web architecture (including EA KBs!)
  • publish component ontologies on the Web
  • and begin linking their metadata and data
  • using the Social Data Web
– continue:
  • embrace emergent structure and continuous improvement
  • using open source and enabling long-tail crowd-sourcing

Page 46: The Social Data Web

q&a - discussion

• thanks for your time and attention!
• contact me
  – http://xri.net/=george.thomas
  – GSA OCIO Chief Enterprise Architect
  – FCIOC-AIC Services Subcommittee Chair
  – W3C eGov IG invited expert
  – OMG GovDTF Steering Committee
  – Graduate School Faculty SOA Instructor