Directions Web X.0, NoSQL DBs and the Semantic Web

Quick review: Web development frameworks
Web 2.0/3.0 is about making websites faster, smarter, more media-rich, and more intuitive.
There is a generation of web development frameworks that focus on faster and smarter; they attack the back end, not the front end.
  Ruby on Rails, Grails, Django, Symfony, and others
  They tend to use some kind of relational-to-object mapping/wrapping.
  They support, and in fact enforce, the MVC approach to developing websites: Model (the database with its mapping/wrapping), View (web pages), and Controller (pieces of code that map view manipulations into model manipulations and vice versa).
  They are AJAX-friendly.
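The MVC split is easier to see in a few lines of code. Below is a minimal, framework-free Python sketch, assuming an in-memory SQLite table; the names StudentModel, render_student_list, and StudentController are hypothetical and do not come from Rails, Django, or any other framework.

```python
import html
import sqlite3


class StudentModel:
    """Model: wraps a relational table so callers work with dicts, not SQL rows."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS student (id INTEGER PRIMARY KEY, name TEXT)"
        )

    def add(self, name: str) -> int:
        cur = self.conn.execute("INSERT INTO student (name) VALUES (?)", (name,))
        self.conn.commit()
        return cur.lastrowid

    def all(self) -> list[dict]:
        rows = self.conn.execute("SELECT id, name FROM student ORDER BY id")
        return [{"id": r[0], "name": r[1]} for r in rows]


def render_student_list(students: list[dict]) -> str:
    """View: turns model data into (a fragment of) a web page."""
    items = "".join(f"<li>{html.escape(s['name'])}</li>" for s in students)
    return f"<ul>{items}</ul>"


class StudentController:
    """Controller: maps a user action in the view onto model operations."""

    def __init__(self, model: StudentModel):
        self.model = model

    def handle_add(self, form: dict) -> str:
        self.model.add(form["name"])                  # view action -> model change
        return render_student_list(self.model.all())  # model state -> refreshed view


controller = StudentController(StudentModel(sqlite3.connect(":memory:")))
controller.handle_add({"name": "Ada"})
print(controller.handle_add({"name": "Alan"}))  # <ul><li>Ada</li><li>Alan</li></ul>
```

Real frameworks generate much of the model layer from the schema and route HTTP requests to controllers automatically; this sketch only shows which piece is responsible for what.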

Client-side web development
There is another generation of web frameworks that focus on making it easy to create rich web interfaces:
  Flash Builder (from Adobe; the Flex framework behind it has gone open source)
  Silverlight (from Microsoft, and perhaps dead?)
  They support 2D and some 3D graphics.
  They use up-front loading to minimize interaction with the server.
There is a newer effort involving HTML5:
  Graphics are supported, with 2D and some 3D.
  Local storage with simple insert and delete; it can use SQLite.
  Better multimedia support
More powerful JavaScript libraries, e.g., jQuery, are coming out as well.
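HTML5 local storage is a browser feature used from JavaScript, so the following is only an analogue: a Python sketch of the same simple insert/lookup/delete model backed by SQLite. The class LocalStore and its method names (loosely modeled on the browser's setItem/getItem/removeItem) are hypothetical.

```python
import sqlite3
from typing import Optional


class LocalStore:
    """Hypothetical SQLite-backed key-value store mimicking simple local storage."""

    def __init__(self, path: str = ":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT NOT NULL)"
        )

    def set_item(self, key: str, value: str) -> None:
        # Insert, or overwrite the value if the key already exists.
        self.conn.execute(
            "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", (key, value)
        )
        self.conn.commit()

    def get_item(self, key: str) -> Optional[str]:
        row = self.conn.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
        return row[0] if row else None  # None when the key was never stored

    def remove_item(self, key: str) -> None:
        self.conn.execute("DELETE FROM kv WHERE key = ?", (key,))
        self.conn.commit()


store = LocalStore()
store.set_item("theme", "dark")
print(store.get_item("theme"))  # dark
```

Keeping data on the client like this is what lets a rich interface avoid round trips to the server for small pieces of state.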

Important to note
Web X.0 efforts try to make use of graphics in interfaces, as well as provide better display of media.
But support for blob and continuous data access (images, video, audio, etc.) is still very rudimentary.
  Problem: we cannot screen media in real time.
  Problem: it is very difficult to capture the semantics of media.
The solution: we tend to build accompanying metadatabases with tag sets (one per piece of media) assigned by experts using specialized namespaces. To enhance accuracy, there is sometimes a feedback loop in which users can train the search facility.
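A tag-set metadatabase with a user feedback loop can be sketched as follows. This is a hypothetical design, not a specific product: the media IDs, the "zoo:" namespace prefix, and the scoring rule (count of matching tags plus a bonus learned from user clicks) are all assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical expert-assigned tag sets, one per piece of media.
# The "zoo:" prefix stands in for a specialized namespace.
media_tags = {
    "clip1.mp4": {"zoo:lion", "zoo:savanna"},
    "clip2.mp4": {"zoo:lion", "zoo:cub"},
    "photo3.jpg": {"zoo:zebra", "zoo:savanna"},
}

# Feedback weights: (tag, media_id) -> bonus learned from user clicks.
feedback = defaultdict(float)


def search(query_tags: set[str]) -> list[str]:
    """Rank media by number of matching tags plus learned feedback bonuses."""
    scores = {}
    for media_id, tags in media_tags.items():
        matched = query_tags & tags
        if matched:
            scores[media_id] = len(matched) + sum(feedback[(t, media_id)] for t in matched)
    # Highest score first; ties broken by media id for a stable order.
    return sorted(scores, key=lambda m: (-scores[m], m))


def record_click(query_tags: set[str], media_id: str, step: float = 0.5) -> None:
    """User feedback: boost the matched tags for the item the user chose."""
    for t in query_tags & media_tags[media_id]:
        feedback[(t, media_id)] += step


print(search({"zoo:lion"}))  # ['clip1.mp4', 'clip2.mp4']
record_click({"zoo:lion"}, "clip2.mp4")
print(search({"zoo:lion"}))  # ['clip2.mp4', 'clip1.mp4']
```

Note that the search never looks at the media itself, only at the tags, which is exactly why the quality of the expert tagging matters so much.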

Quick review: the Semantic Web
This is oriented around making the web more automatically searchable. Main foci:
  Assertions and inferences
  Exposing databases that contain hidden data
  Searching of media bases (blob and continuous), i.e., exposing them
  Searching of document bases, i.e., exposing them
  Data mining

Querying the Semantic Web
RDF: triples. We can use URIs for all three pieces of a triple.
SPARQL: a triple query language, used for spanning Web boundaries.
Example: THE BALL is ORANGE. ORANGE is an UGLY COLOR. The inference we can make is that THE BALL has an UGLY COLOR.
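The ball example can be written as two triples plus one rule. In this sketch, plain names stand in for URIs, and the predicate names hasColor and isA are hypothetical; a real system would use full URIs and a reasoner rather than a hand-written loop.

```python
# Triples as (subject, predicate, object).
triples = {
    ("TheBall", "hasColor", "Orange"),
    ("Orange", "isA", "UglyColor"),
}


def infer_color_kinds(facts):
    """Rule: if X hasColor C and C isA K, conclude X hasColor K."""
    derived = set()
    for s, p, o in facts:
        if p != "hasColor":
            continue
        # Join on the shared node: the object of one triple is the subject of the next.
        for s2, p2, o2 in facts:
            if s2 == o and p2 == "isA":
                derived.add((s, "hasColor", o2))
    return derived


print(infer_color_kinds(triples))  # {('TheBall', 'hasColor', 'UglyColor')}
```

The inference comes from chaining two triples through the shared node Orange, which is the same kind of path walking a query language does across the Web.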

An RDF Example
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:zx="http://www.someurl.org/zx/"
funstuff
http://www.yetanotherurl.org/professor

The assertions and an inference
(www.awebsite.org/index.html, topic, funstuff): the topic of the resource at www.awebsite.org/index.html is funstuff.
(www.awebsite.org/index.html, created by, http://www.anotherurl.org/buzz): www.awebsite.org/index.html was created by someone who is identified by the URL http://www.anotherurl.org/buzz.
The value in the first triple, which concerns the topic of our resource, is a character string, but the value in the second triple, which concerns the creator of our resource, is actually a URL.
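The difference between a string value and a URL value shows up directly in RDF/XML: a literal is element text, while a link to another resource is an rdf:resource attribute. The sketch below parses a small, hypothetical RDF/XML document using the rdf and zx namespaces from the example; the zx:topic and zx:was-created-by element names are assumptions, and the parser handles only simple rdf:Description blocks, not all of RDF/XML.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical RDF/XML; the zx:topic and zx:was-created-by names are assumptions.
doc = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:zx="http://www.someurl.org/zx/">
  <rdf:Description rdf:about="http://www.awebsite.org/index.html">
    <zx:topic>funstuff</zx:topic>
    <zx:was-created-by rdf:resource="http://www.anotherurl.org/buzz"/>
  </rdf:Description>
</rdf:RDF>"""


def clark_to_uri(tag):
    """'{ns}local' (ElementTree's notation) -> 'nslocal', the usual RDF property URI."""
    ns, local = tag[1:].split("}", 1)
    return ns + local


def triples_from_rdfxml(text):
    """Yield (subject, predicate, object, kind) for simple rdf:Description blocks."""
    root = ET.fromstring(text)
    for desc in root.findall(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        for prop in desc:
            resource = prop.get(f"{{{RDF}}}resource")
            if resource is not None:
                yield subject, clark_to_uri(prop.tag), resource, "URI"      # points at a resource
            else:
                yield subject, clark_to_uri(prop.tag), prop.text, "literal"  # a character string


for t in triples_from_rdfxml(doc):
    print(t)
```

Both predicates come out as full URIs built from the zx namespace, which is what lets triples from different sites be matched against each other.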

SPARQL
SPARQL is a recursive acronym for SPARQL Protocol And RDF Query Language, and it is pronounced "sparkle". It is a language that can be used to traverse graphs that consist of RDF triples chained together into an object network.
    PREFIX website1: <…>
    SELECT ?x WHERE { <http://awebsite.org> website1:was-created-by ?x }
This query finds the creators of http://awebsite.org. It searches through all of the triples, finds the ones of interest to us, and then plucks off the names of the creators. These triples could be distributed all around the Web.
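At its core, a SPARQL WHERE clause is a triple pattern with variables that gets matched against every triple. The Python sketch below is a rough analogue of the query above, not a SPARQL engine: it handles one pattern, variables are strings starting with "?", and the two "sites" and their triples are hypothetical.

```python
# Triples gathered from two hypothetical sites; on the real Semantic Web
# these could live on different servers.
site_a = [("http://awebsite.org", "was-created-by", "http://www.anotherurl.org/buzz")]
site_b = [
    ("http://awebsite.org", "was-created-by", "http://www.yetanotherurl.org/professor"),
    ("http://other.example", "was-created-by", "http://people.example/kim"),
]


def is_var(term):
    return term.startswith("?")


def match(pattern, triples):
    """Return one binding dict per triple that matches a single triple pattern."""
    results = []
    for triple in triples:
        binding = {}
        for p_term, t_term in zip(pattern, triple):
            if is_var(p_term):
                # A variable used twice must bind to the same value both times.
                if binding.get(p_term, t_term) != t_term:
                    break
                binding[p_term] = t_term
            elif p_term != t_term:
                break
        else:
            results.append(binding)
    return results


# Rough analogue of: SELECT ?x WHERE { <http://awebsite.org> website1:was-created-by ?x }
creators = [b["?x"] for b in match(("http://awebsite.org", "was-created-by", "?x"), site_a + site_b)]
print(creators)  # ['http://www.anotherurl.org/buzz', 'http://www.yetanotherurl.org/professor']
```

A full query joins several such patterns on shared variables, which is where traversal cost grows when the triples are spread across many sites.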

The Semantic Web, continued
Main tools:
  Namespaces posted on the web and shared
  XML
  Ontologies of assertions, e.g., "Tall people play basketball" and "Joe is tall" (note that there are both schema-based and instance-based assertions)
  Walking paths linked by assertions with languages like SPARQL, forming inferences from assertions along the way
  XML extensions to accommodate complex data, non-string data, and querying of large datasets:
    Support for pointers to namespaces
    Support for complex, non-textual documents, along with object IDs, keys, and foreign keys
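The basketball example mixes a schema-level assertion with an instance-level one. One way to model it, and it is a modeling assumption, is to treat "tall people play basketball" as a subclass link and apply two RDFS-style rules until nothing new appears (forward chaining). The class names TallPerson and BasketballPlayer are hypothetical; rdf:type and rdfs:subClassOf are the standard RDF/RDFS property names.

```python
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"

facts = {
    ("TallPerson", SUBCLASS, "BasketballPlayer"),  # schema: tall people play basketball
    ("Joe", TYPE, "TallPerson"),                   # instance: Joe is tall
}


def forward_chain(facts):
    """Apply two RDFS-style rules until no new triples appear:
    (x type A), (A subClassOf B)         => (x type B)
    (A subClassOf B), (B subClassOf C)   => (A subClassOf C)"""
    facts = set(facts)
    while True:
        new = set()
        for s, p, o in facts:
            for s2, p2, o2 in facts:
                if p in (TYPE, SUBCLASS) and p2 == SUBCLASS and s2 == o:
                    new.add((s, p, o2))
        if new <= facts:  # fixpoint: nothing new was derived
            return facts
        facts |= new


print(sorted(t for t in forward_chain(facts) if t[1] == TYPE))
# [('Joe', 'rdf:type', 'BasketballPlayer'), ('Joe', 'rdf:type', 'TallPerson')]
```

Each pass may unlock further inferences, so longer chains of assertions need more passes; this is one reason inference over large assertion networks gets expensive.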

Continued

Accommodating complex data
Schemas: initially DTDs, later XML Schema
  Save schema fragments and import them
Non-string data types
Keys and FKs
Type constructors:
  Primitive: integer, float, boolean, date, ID
  Simple: list, union
  Complex: groups of elements
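XML Schema can declare keys and foreign keys (xs:key and xs:keyref; DTDs offer the weaker ID/IDREF), and a validating parser then rejects documents that break them. Python's standard library has no XML Schema validator, so this sketch checks the same two constraints by hand on a hypothetical document; the element and attribute names are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: each Course has a key (code); each Enrollment refers to one.
doc = """<Registrar>
  <Course code="MAT123"/>
  <Course code="CS305"/>
  <Enrollment student="s1" course="MAT123"/>
  <Enrollment student="s2" course="PHY101"/>
</Registrar>"""


def check_keys(xml_text):
    """Return (duplicate keys, dangling foreign keys), roughly what xs:key/xs:keyref enforce."""
    root = ET.fromstring(xml_text)
    seen, duplicates = set(), []
    for course in root.iter("Course"):
        code = course.get("code")
        if code in seen:
            duplicates.append(code)  # key must be unique
        seen.add(code)
    # Foreign key: every Enrollment/@course must match some Course/@code.
    dangling = [e.get("course") for e in root.iter("Enrollment") if e.get("course") not in seen]
    return duplicates, dangling


print(check_keys(doc))  # ([], ['PHY101'])
```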

Data types in XML Schema

Continued

XML schema and namespaces

XPath for searching XML documents hierarchically
An XPath expression takes a document tree as input and returns a multiset of nodes of the tree.
Absolute path expressions: expressions that start with / are absolute path expressions.
  The expression / returns the root node of the XPath tree.
  /Students/Student returns all Student elements that are children of Students elements, which in turn must be children of the root.
  /Student returns the empty set (there are no such children at the root).
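Python's xml.etree.ElementTree supports a limited XPath subset, evaluated relative to an element; it has no separate document (root) node. This sketch emulates simple absolute child paths by checking the first step against the top-level element itself. The sample Students document and the helper name absolute are hypothetical, and the helper handles only child steps (at least one step).

```python
import xml.etree.ElementTree as ET

doc = """<Students>
  <Student StudentId="111"><Name><First>Joe</First><Last>Public</Last></Name></Student>
  <Student StudentId="987"><Name><First>Bart</First><Last>Simpson</Last></Name></Student>
</Students>"""

root = ET.fromstring(doc)  # the top-level element: the only element child of XPath's root node


def absolute(root, path):
    """Evaluate a simple absolute child path such as /Students/Student."""
    steps = path.strip("/").split("/")
    if steps[0] != root.tag:
        return []  # e.g. /Student: no such child at the root
    rest = "/".join(steps[1:])
    return root.findall(rest) if rest else [root]


print([s.get("StudentId") for s in absolute(root, "/Students/Student")])  # ['111', '987']
print(absolute(root, "/Student"))  # []
```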

XPath continued
Current context: the current (or context) node exists during the evaluation of XPath expressions (and in other XML query languages).
  . denotes the current node; .. denotes the parent.
  foo/bar returns all bar elements that are children of foo nodes, which in turn are children of the current node.
  ./foo/bar: same.
  ../abc/cde: all cde e-children (element children) of abc e-children of the parent of the current node.
Relative path expressions: expressions that don't start with / are relative (to the current node).
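Relative paths are evaluated from a context node. In ElementTree, calling findall on an element makes that element the context node; because elements do not point to their parents, ".." from the context node needs a parent map built once. The document and the top and here element names are hypothetical; foo, bar, abc, and cde follow the slide.

```python
import xml.etree.ElementTree as ET

doc = """<top>
  <abc><cde>1</cde><cde>2</cde></abc>
  <here><foo><bar>a</bar><bar>b</bar></foo></here>
</top>"""

root = ET.fromstring(doc)
# ElementTree elements do not know their parents, so build the ".." map once.
parent = {child: p for p in root.iter() for child in p}

context = root.find("here")  # the current (context) node

print([b.text for b in context.findall("foo/bar")])    # relative: ['a', 'b']
print([b.text for b in context.findall("./foo/bar")])  # same, with an explicit '.'
up = parent[context]                                   # '..' from the context node
print([c.text for c in up.findall("abc/cde")])         # ../abc/cde: ['1', '2']
```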

Attributes, text, etc.
  /Students/Student/@StudentId returns all StudentId a-children (attribute children) of Student, which are e-children of Students, which are children of the root.
  /Students/Student/Name/Last/text() returns all t-children (text children) of Last e-children of Name e-children of Student e-children of Students, which are children of the root.
  /comment() returns comment nodes under the root.
XPath provides means to select other document components as well.
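In ElementTree, attribute children are read with get, text children through .text, and comments are kept only if the parser is asked to keep them (TreeBuilder's insert_comments, Python 3.8+). ElementTree only retains comments inside the top-level element, so the comment below corresponds to /Students/comment() rather than /comment(). The sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

doc = """<Students>
  <!-- sample roster -->
  <Student StudentId="111"><Name><First>Joe</First><Last>Public</Last></Name></Student>
  <Student StudentId="987"><Name><First>Bart</First><Last>Simpson</Last></Name></Student>
</Students>"""

# Keep comments in the tree (they are dropped by default).
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
root = ET.fromstring(doc, parser=parser)

# /Students/Student/@StudentId: attribute children
ids = [s.get("StudentId") for s in root.findall("Student")]
# /Students/Student/Name/Last/text(): text children
lasts = [last.text for last in root.findall("Student/Name/Last")]
# /Students/comment(): comment children of the Students element
comments = [c.text for c in root if c.tag is ET.Comment]

print(ids, lasts, comments)  # ['111', '987'] ['Public', 'Simpson'] [' sample roster ']
```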

XQuery
General structure: for <variable declarations> where <condition> return <document>
Example:
    (: students who took MAT123 :)
    for $t in doc("http://xyz.edu/transcript.xml")//Transcript
    where $t/CrsTaken/@CrsCode = "MAT123"
    return $t/Student
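Running XQuery itself needs an XQuery processor, so here is a Python rendering of what each clause does, over a hypothetical inline stand-in for transcript.xml (nothing is fetched from xyz.edu, and the student data is invented). One detail worth copying: in XQuery, = between a path and a value is existential, true if any CrsTaken has that code, hence the any() below.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for transcript.xml.
doc = """<Transcripts>
  <Transcript>
    <Student StudId="111" Name="John Doe"/>
    <CrsTaken CrsCode="MAT123" Semester="F1997"/>
  </Transcript>
  <Transcript>
    <Student StudId="987" Name="Bart Simpson"/>
    <CrsTaken CrsCode="CS305" Semester="F1995"/>
  </Transcript>
</Transcripts>"""


def students_who_took(root, code):
    results = []
    for t in root.iter("Transcript"):                                     # for $t in //Transcript
        if any(c.get("CrsCode") == code for c in t.findall("CrsTaken")):  # where ... = code
            results.extend(t.findall("Student"))                          # return $t/Student
    return results


root = ET.fromstring(doc)
print([s.get("Name") for s in students_who_took(root, "MAT123")])  # ['John Doe']
```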

XML and Web X.0: Flash Builder


Semantic Web big problems
A massive reengineering effort is needed to make use of Semantic Web technology.
Assertions that span nodes can be extremely time-consuming to traverse.
Making media accessible:
  It is easy enough to generate low-level assertions automatically.
  It is very time-consuming for experts to add assertions manually.
  Our main tools are tagging and image/sound processing packages, which are very complex and very heuristic-driven.
XML Schema, the big XML extension, is unwieldy.

Web X.0 big problems
We are not just trying to search relational databases.
Graphics are often used in a gratuitous, non-useful, even distracting fashion, and they eat up download time and computation time.
We still cannot manipulate, search, or interpret media.

Comparison with NoSQL DBs
Key-document and key-value databases are a way of organizing document, value (blob), and continuous databases so that they can be searched quickly by next-generation web applications, as well as by programs that automatically search the web.
Graph databases are a way of dynamically extending assertions between objects, but they don't play well with large networks.
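The appeal of a key-document store is that a lookup goes straight to the key, with no joins and no fixed schema. A minimal in-memory sketch, assuming JSON documents; the class KeyDocumentStore and the "video:42" key are hypothetical, and real stores add persistence, distribution, and secondary indexes.

```python
import json


class KeyDocumentStore:
    """Hypothetical minimal key-document store: each key maps to one JSON document."""

    def __init__(self):
        self._data = {}  # a Python dict, so a lookup is a single hash-table probe

    def put(self, key, document):
        # Store a serialized copy so later edits to the caller's dict don't leak in.
        self._data[key] = json.dumps(document)

    def get(self, key):
        raw = self._data.get(key)
        return None if raw is None else json.loads(raw)

    def delete(self, key):
        self._data.pop(key, None)


store = KeyDocumentStore()
store.put("video:42", {"title": "Lion clip", "tags": ["lion", "savanna"]})
print(store.get("video:42"))  # {'title': 'Lion clip', 'tags': ['lion', 'savanna']}
```

Documents under different keys need not share a structure, which is why this model fits tagged media metadata so easily.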

Nice things about NoSQL DBs, the Semantic Web, and Web X.0
NoSQL DBs are minimalistic in just the right way: they are much easier to plug in than complex XML Schema front ends to databases, and they can work with existing relational DBs.
Documents are natural to both efforts.
Media blobs are natural to both efforts.
Graphs are natural to the Semantic Web.

Web services
Support non-interactive database access
Use XML, HTTP, etc.
Examples are the web services offered by Google and Amazon.
Universal Description, Discovery, and Integration (UDDI), for creating distributed registries of web services
Web Services Description Language (WSDL), for describing the operations a web service offers
Simple Object Access Protocol (SOAP), an XML-based protocol that allows apps to send messages to each other over the Internet
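A SOAP message is an XML Envelope with a Body that carries the call. This sketch builds and then reads back a SOAP 1.1 request using the standard envelope namespace; the service namespace urn:example:catalog, the GetPrice operation, and its itemId parameter are hypothetical, and sending the message (typically an HTTP POST) is not shown.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SVC = "urn:example:catalog"  # hypothetical namespace for the service's own elements

ET.register_namespace("soap", SOAP_ENV)
ET.register_namespace("svc", SVC)


def _local(tag):
    """Strip the '{namespace}' part ElementTree puts in front of tag names."""
    return tag.split("}", 1)[1]


def build_request(operation, **params):
    """Client side: wrap one operation call in a SOAP Envelope/Body."""
    env = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(env, f"{{{SOAP_ENV}}}Body")
    op = ET.SubElement(body, f"{{{SVC}}}{operation}")
    for name, value in params.items():
        ET.SubElement(op, f"{{{SVC}}}{name}").text = str(value)
    return ET.tostring(env, encoding="unicode")


def read_request(xml_text):
    """Server side: recover the operation name and its parameters."""
    body = ET.fromstring(xml_text).find(f"{{{SOAP_ENV}}}Body")
    op = body[0]
    return _local(op.tag), {_local(p.tag): p.text for p in op}


msg = build_request("GetPrice", itemId="B0001")
print(msg)
print(read_request(msg))  # ('GetPrice', {'itemId': 'B0001'})
```

In a full setup, a WSDL file would describe GetPrice and its parameters so that clients can generate this message without hand-writing it.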

Security
The complexity of server-side technology, along with its heterogeneity
The need to allow dynamic web page support, email, FTP, etc.
The need to support services
Access to databases from multiple sources on either side of the firewall

continued
The tendency to loosen firewalls when things don't work
Email attachments
The rapid rate of change of software, content, and services
The use of open-source and legacy DBs that are poorly understood

Another security issue
Web and database servers are used to support newer sorts of data and service access:
  Warehousing data (usually, but not always, inside the firewall)
  Mining data, which is often outside the firewall
  Specialized document retrieval systems
  Specialized advanced media retrieval systems
  Integration of heterogeneous data
  Sharing of namespaces, schema fragments, and query code (often in XML technologies)

continued
All of these can be layered and span multiple sites, such as:
  Hierarchical data marts
  Mediator-based integration hierarchies
A wide class of people, inside and outside of the organization, must have access to data (such as content taggers).

Data privacy
HIPAA
Authorization of users and applications:
  Passwords
  Two-factor (e.g., a password or code plus a code from a physical device)
  Mediated (using a third party)
Encryption:
  Storage
  Transmission
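For password-based authorization, the usual practice is to store not the password (encrypted or otherwise) but a salted, deliberately slow hash, and to compare hashes at login. A minimal sketch using PBKDF2 from Python's standard library; the iteration count is an illustrative assumption and should be tuned to current guidance and hardware.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor, not a recommendation


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) to store instead of the plain password."""
    salt = os.urandom(16)  # a fresh random salt per user defeats precomputed tables
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)  # constant-time comparison


salt, digest = hash_password("correct horse")
print(verify_password("correct horse", salt, digest))  # True
print(verify_password("wrong guess", salt, digest))    # False
```

A second factor would be checked in addition to this, so a stolen password alone is not enough to log in.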
