San Diego Supercomputer Center, National Partnership for Advanced Computational Infrastructure
Information Integration with XML, Part II
Chaitan Baru, Richard Marciano
{baru,marciano}@sdsc.edu
Data Intensive Computing Group, San Diego Supercomputer Center
PART II
• Storing XML documents
• Querying XML documents
• XML and GIS
• Technical Issues
• Projects at SDSC
Storing XML documents
• Pure XML data servers
  • Documents are stored in native XML form
  • XML-based query languages are used to retrieve data
• Relational DBMSs
  • Documents are stored as BLOBs
  • Or, XML elements are mapped to columns in tables
  • SQL is used to retrieve data
Storing in “pure” XML data servers
• eXcelon, from eXcelon Corp. (ex ODI)
• Dynamic Application Platform
  • Data Server
  • Toolbox
  • Xconnects
• B2B Integration Services
  • B2B Translator
  • Business Process Workflow Engine
  • Enterprise Connectivity
  • Business Module eXtensions
eXcelon
• Stores XML and non-XML (blob) data
• Supports queries and indexes on stored XML data
• Uses a file system metaphor
• Supports the use of (server-side) XSL stylesheets
• Provides visual tools (Studio, Explorer, Manager, Stylus)
• Provides Web & COM client interfaces
• Provides Java & COM APIs to extend data server
• Supports DOM for data access on the server
• Can distribute XML data access across caches
• Connects to 70 sources using ADBC / ADO
eXcelon Tool Box
Studio:
• define XML schemas (DCD)
• generate XML-based pages
Explorer:
• browser to view, import, organize, modify, query and set security on data
• XPath / XQL query wizard
Manager:
• administer & configure
• set server properties
• set load balancing parameters
Stylus:
• Build Web pages using XML & XSL
• Transforms XML to HTML
eXcelon Xconnects
Connect to any data source:
• Cobol
• dBaseIII
• Act
• etc.
Programming against eXcelon
• Out-of-the-box tools: “no writing code”
  • create / update / delete / query XML
• Programming:
  • In eXcelon server extensions
    • COM / Java & DOM to manipulate XML contained in eXcelon XMLStores
  • In Web server
    • Active Server application that uses the eXcelon COM client API and ships HTML to the browser. XSL can be applied in the context of the Web server
  • In Browser
    • DHTML (VBScript, JavaScript, Visual Basic) or Java applet that manipulates XML. Apply XSL stylesheet in the browser
Storing XML documents in RDBMSs
• Store documents as BLOBs
• Map document elements into a set of relational tables
  • Need a DTD or schema for documents
  • Need to map the XML DTD or schema into a relational schema
  • Relational schema will capture the hierarchical “containment” relationship among elements as 1-1 or 1-many relationships
Example of an XML document and DTD
[Figure: document tree. Publication (attribute Pub_ID) contains Title ("XML Tutorial"), AuthorName ("Richard Marciano"), AuthorName ("Chaitan Baru"), Abstract, and Section; Section contains Heading ("Intro"), Para, Para.]
DTD
<!ELEMENT Publications (Publication)*>
<!ELEMENT Publication (Title, AuthorName+, Abstract, Section*)>
<!ELEMENT Section (Heading, Paragraph*)>
<!ATTLIST Publication Pub_ID ID #REQUIRED>
Store data as BLOBs in RDBMS
• Store XML document as BLOB, with text/path indexes
[Figure: an XML document (<title></title>, <abstract></abstract>) is stored in the RDBMS as a text BLOB with a text index.]
Provide indexing of XML documents
• Store specified elements as columns in a table
[Figure: an XML document (<title></title>, <abstract></abstract>) is stored in the RDBMS as a text BLOB with a text index, next to a Title column with a column index.]
Map DTD to a relational schema
• Un-nest the DTD hierarchy
• Stop at a point where it is “sufficient” to represent an element as a single compound value, rather than a hierarchy (e.g. Address)
Publication (Pub_ID, Title, Abstract)
Author (Auth_ID, Pub_ID, AuthName)
Section (Sec_Num, Pub_ID, Heading)
Paragraph (Sec_Num, Pub_ID, Para_Num, Text)
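The un-nesting into the Publication, Author, Section, and Paragraph tables can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the standard-library sqlite3 and ElementTree modules rather than any product named in these slides, the sample document content is made up, and the column types and the `shred` helper name are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Made-up sample document that follows the Publication DTD from the slides.
SAMPLE = """<Publications>
  <Publication Pub_ID="p1">
    <Title>XML Tutorial</Title>
    <AuthorName>Richard Marciano</AuthorName>
    <AuthorName>Chaitan Baru</AuthorName>
    <Abstract>An overview.</Abstract>
    <Section>
      <Heading>Intro</Heading>
      <Paragraph>First paragraph.</Paragraph>
      <Paragraph>Second paragraph.</Paragraph>
    </Section>
  </Publication>
</Publications>"""


def shred(xml_text, conn):
    """Un-nest the Publication hierarchy into four flat tables (hypothetical helper)."""
    conn.executescript("""
        CREATE TABLE Publication (Pub_ID TEXT PRIMARY KEY, Title TEXT, Abstract TEXT);
        CREATE TABLE Author (Auth_ID INTEGER PRIMARY KEY, Pub_ID TEXT, AuthName TEXT);
        CREATE TABLE Section (Sec_Num INTEGER, Pub_ID TEXT, Heading TEXT);
        CREATE TABLE Paragraph (Sec_Num INTEGER, Pub_ID TEXT, Para_Num INTEGER, Text TEXT);
    """)
    for pub in ET.fromstring(xml_text).findall("Publication"):
        pub_id = pub.get("Pub_ID")
        conn.execute("INSERT INTO Publication VALUES (?, ?, ?)",
                     (pub_id, pub.findtext("Title"), pub.findtext("Abstract")))
        # 1-many containment: each AuthorName becomes a row keyed by Pub_ID.
        for author in pub.findall("AuthorName"):
            conn.execute("INSERT INTO Author (Pub_ID, AuthName) VALUES (?, ?)",
                         (pub_id, author.text))
        # Document order is kept in Sec_Num and Para_Num.
        for sec_num, section in enumerate(pub.findall("Section"), start=1):
            conn.execute("INSERT INTO Section VALUES (?, ?, ?)",
                         (sec_num, pub_id, section.findtext("Heading")))
            for para_num, para in enumerate(section.findall("Paragraph"), start=1):
                conn.execute("INSERT INTO Paragraph VALUES (?, ?, ?, ?)",
                             (sec_num, pub_id, para_num, para.text))


conn = sqlite3.connect(":memory:")
shred(SAMPLE, conn)
authors = [row[0] for row in conn.execute("SELECT AuthName FROM Author ORDER BY Auth_ID")]
print(authors)
```

Sec_Num and Para_Num carry the document order that a flat table would otherwise lose, which is why they appear in the keys of the Section and Paragraph tables.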
Storing un-nested DTD hierarchies
• Store document elements across multiple tables
  • (Not yet available in COTS products)
[Figure: an XML document (<title></title>, <author></author>, <author></author>, <abstract></abstract>) is split in the RDBMS across the tables Publication (Pub_ID, Title, Abstract) and Author (Auth_ID, Pub_ID, AuthName).]
Retrieving XML data from DBMS
• Retrieving from pure XML data servers
  • Use XML query languages, e.g. XQL
• Retrieving from RDBMS
  • Use SQL to query data from database tables
  • “Wrap” output of SQL query as an XML document
  • Define XML views over relational schemas (Xviews)
    • Use SQL statement(s) to create XML output
Retrieving from “pure” XML servers
• XML Query Language (XQL), supported by eXcelon
• Reference
  • http://www.w3.org/TandS/QL/QL98/pp/xql.html
• Example: Publication DTD
<!ELEMENT Publications (Publication)*>
<!ELEMENT Publication (Title, AuthorName+, Abstract, Section*)>
<!ELEMENT Section (Heading, Paragraph*)>
[Figure: element hierarchy. Publications > Publication (Title, AuthorName+, Abstract, Section* (Heading, Paragraph*))]
XQL
• Example queries:
• Output all section headings of all publications
/Publications/Publication/Section/Heading
• Output all documents that have a section called “Conclusion”
/Publications/Publication[Section/Heading="Conclusion"]
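For readers without an XQL engine, the two queries can be approximated with Python's standard-library ElementTree. This is a sketch, not XQL itself: ElementTree supports only a limited path subset, so the nested-path filter of the second query is written as an explicit Python test, and the sample data is invented.

```python
import xml.etree.ElementTree as ET

# Invented sample data shaped like the Publication DTD.
DOC = """<Publications>
  <Publication>
    <Title>XML Tutorial</Title>
    <Section><Heading>Intro</Heading></Section>
    <Section><Heading>Conclusion</Heading></Section>
  </Publication>
  <Publication>
    <Title>GIS Notes</Title>
    <Section><Heading>Background</Heading></Section>
  </Publication>
</Publications>"""

root = ET.fromstring(DOC)  # root is the <Publications> element

# /Publications/Publication/Section/Heading
headings = [h.text for h in root.findall("Publication/Section/Heading")]

# /Publications/Publication[Section/Heading="Conclusion"]
with_conclusion = [
    pub for pub in root.findall("Publication")
    if any(h.text == "Conclusion" for h in pub.findall("Section/Heading"))
]
titles = [pub.findtext("Title") for pub in with_conclusion]

print(headings)
print(titles)
```

Note the difference in what each query returns: the first yields Heading elements, while the second yields whole Publication elements, because the bracketed condition filters without changing the element being selected.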
XML-QL
• XML Query Language (XML-QL)
  • http://www.w3.org/TR/1998/NOTE-xml-ql-19980819/
• XML-QL example
WHERE <Publications>
        <Publication>
          <Title> XML Tutorial </Title>
          <Section> $S </Section>
          <AuthorName> $A </AuthorName>
        </Publication>
      </Publications> IN "www.sdsc.edu/publications/pubs.xml"
CONSTRUCT $A
• Meaning: list all authors of all publications with title="XML Tutorial" that have at least one section and one author
XMAS
• XML Matching And Structuring (XMAS)
CONSTRUCT <my_authors>
            <my_author> $A </my_author>
          </my_authors>
WHERE <Publications>
        <Publication>
          <Title> $T </Title>
          <Section> $S </Section>
          <AuthorName> $A </AuthorName>
        </Publication>
      </Publications> IN "http://www.sdsc.edu/publications/pubs.xml"
  AND substr("XML Tutorial", $T)
Retrieving XML from RDBMS servers
• Wrapping SQL output in XML
• Example query:
SELECT Title, AuthName
FROM Publication, Author
WHERE Publication.Pub_ID = Author.Pub_ID
• Result:
<result>
  <row>
    <title> XML Tutorial </title>
    <author>Marciano</author>
  </row>
  <row>
    <title> XML Tutorial </title>
    <author>Baru</author>
  </row>
</result>
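A minimal sketch of this wrapping step, assuming Python with the standard-library sqlite3 and ElementTree modules in place of a production RDBMS. The `wrap_rows` helper and the table contents are hypothetical; only the query and the result/row/title/author element names come from the slide.

```python
import sqlite3
import xml.etree.ElementTree as ET


def wrap_rows(cursor, column_tags):
    """Wrap every row of an executed query in <row>, under one <result> root."""
    result = ET.Element("result")
    for record in cursor:
        row = ET.SubElement(result, "row")
        for tag, value in zip(column_tags, record):
            ET.SubElement(row, tag).text = str(value)
    return result


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Publication (Pub_ID INTEGER, Title TEXT);
    CREATE TABLE Author (Auth_ID INTEGER, Pub_ID INTEGER, AuthName TEXT);
    INSERT INTO Publication VALUES (1, 'XML Tutorial');
    INSERT INTO Author VALUES (1, 1, 'Marciano');
    INSERT INTO Author VALUES (2, 1, 'Baru');
""")
cursor = conn.execute("""
    SELECT Title, AuthName
    FROM Publication, Author
    WHERE Publication.Pub_ID = Author.Pub_ID
    ORDER BY Auth_ID
""")
xml_text = ET.tostring(wrap_rows(cursor, ["title", "author"]), encoding="unicode")
print(xml_text)
```

The output is as flat as the relational result: the title repeats once per author, because a plain wrapper has no notion of containment.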
Retrieving from RDBMS Servers - 2
• Define XML views over relational schemas
  • How to interpret relational data as XML documents
  • Relational schemas are “flat”, XML documents are hierarchical
[Figure: the relational hierarchy Database > Relations > Tuples > Attributes, illustrated by PublicationsDB > Publication > t1, t2, t3 > Title, Author, Abstract.]
The Xviews concept
• Derive a DTD by “identifying” a “containment” relationship among the set of tables
• Example: the “canonical” data warehouse schema
[Figure: star schema with Lineitem linked to Region, Product, and Customer.]
• Candidate containment: Lineitem contains Customer, Product, Region
• DTD
<!ELEMENT Lineitem (Customer, Product, Region)>
The Xviews concept
• Alternative containment: Customer contains Lineitem; Lineitem contains Product, Region
• DTD
<!ELEMENT Customer (Lineitem*)>
<!ELEMENT Lineitem (Product, Region)>
• Note: outer joins are needed in order to output customers who have no lineitems
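The outer-join note can be illustrated with a small Python sketch using the standard-library sqlite3 and ElementTree modules. It is not the Xviews implementation: the column names (Cust_ID, Name, Item_ID), the sample rows, the name attribute, and the Customers wrapper element are all assumptions made for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (Cust_ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Lineitem (Item_ID INTEGER PRIMARY KEY, Cust_ID INTEGER,
                           Product TEXT, Region TEXT);
    INSERT INTO Customer VALUES (1, 'Acme');
    INSERT INTO Customer VALUES (2, 'Globex');
    INSERT INTO Lineitem VALUES (10, 1, 'Widget', 'West');
    INSERT INTO Lineitem VALUES (11, 1, 'Gadget', 'East');
""")

# LEFT OUTER JOIN keeps Globex, which has no lineitems; an inner join would drop it.
rows = conn.execute("""
    SELECT c.Cust_ID, c.Name, l.Item_ID, l.Product, l.Region
    FROM Customer c LEFT OUTER JOIN Lineitem l ON l.Cust_ID = c.Cust_ID
    ORDER BY c.Cust_ID, l.Item_ID
""")

root = ET.Element("Customers")  # assumed wrapper element
elements = {}
for cust_id, name, item_id, product, region in rows:
    if cust_id not in elements:
        elements[cust_id] = ET.SubElement(root, "Customer", name=name)
    if item_id is not None:  # NULL lineitem columns mean "customer without lineitems"
        item = ET.SubElement(elements[cust_id], "Lineitem")
        ET.SubElement(item, "Product").text = product
        ET.SubElement(item, "Region").text = region

counts = {c.get("name"): len(c.findall("Lineitem")) for c in root}
print(counts)
```

Under the Customer (Lineitem*) content model, the empty Customer element for Globex is still valid, which is the case the outer join preserves.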
RDBMS product support
[Figure: SQL queries run against database tables in the RDBMS; the query output is packaged into an XML document.]
Oracle: XML SQL and XSQL Utility
• Retrieving data as XML
  • Generates an XML Document from SQL queries
  • Outputs text or Document Object Model from a SQL query string or a JDBC ResultSet object
• Inserting XML data into tables
  • Writes data from an XML document into a (single) database table or (updateable) view
XSQL Utility/Servlet
[Figure: XSQL architecture. Components: Web browser, Web server, XSQL Servlet, XSLT processor, XML SQL utility, Oracle 8i. Labels: XML-formatted SQL queries (.xsql); {xsql filename, params, XSL stylesheet}; query result in XML, or transformed into HTML by XSL.]
XSQL Example
Example XSQL file:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="query1.xsl"?>
<query connection="PublicationDB"
       doc-element="Publications"
       row-element="Publication">
  SELECT title, abstract, authorname
  FROM publication p, author a
  WHERE p.Pub_ID = a.Pub_ID
</query>
Sample XML output:
<Publications>
  <Publication>
    <title>XML Tutorial</title>
    <abstract>...</abstract>
    <authorname>Marciano</authorname>
  </Publication>
  <Publication>
    <title>XML Tutorial</title>
    <abstract>...</abstract>
    <authorname>Baru</authorname>
  </Publication>
  ..... more rows...
</Publications>
DB2: XML Extender
• XML Extender in UDB Version 6.1
• XML_Column type and Document Access Definitions (DADs)
• Insertion into a column of type XML_Column triggers extraction of elements specified in DADs
[Figure: an XML document (<title></title>, <abstract></abstract>) is inserted into an XML column in the RDBMS; the DAD maps Title to a separate column next to the XML BLOB.]
GIS and XML
• Represent GIS metadata in XML
• Represent spatial features in XML
GIS & XML: 1st experiment (the data)
GIS & XML: 1st experiment (XML wrapping)
Exporting GIS data in XML
Work in Progress: Spatial XML Markup Languages
• Geography Markup Language (GML) 1.0
  • OGC Working Draft 17-Jan-2000
  • Web Mapping Testbed (WMT): NIMA, USACoE, FGDC, NASA, USDA, USGS ...
  • Digital Earth (www.digitalearth.gov)
• AXL (ArcXML) pre-release
  • part of ESRI ArcIMS 31-Jan-2000
GML
• GML: XML specification to encode geographic information, for both data storage and data transport
• Initial release deals with OGC Simple Features:
  • vector geodata: e.g. digital map info (streets, population, land use zones, property lines, watersheds, etc.)
• GML is not concerned with the visualization of geographic features (drawing of maps)
[Figure: GML in XML can be (1) rendered directly to a graphic format, (2) transformed into a vector graphics rendering format (SVG, VML, VRML), or (3) routed directly, without visualization, to a numerical model.]
GML
• Goal: enable organizations to share geographic information and to enable linked geographic datasets
• When GML data is exchanged over the Internet, it is transmitted in a “feature collection”
• GML Simple features:
  • geometry classes: Point, LineString, Polygon
  • geometry properties: coordinate lists, spatial reference system name
    • pointproperty
    • linestringproperty
    • polygonproperty
    • multipointproperty
    • multilineproperty
    • multipolygonproperty
GML Example
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE FeatureCollection SYSTEM "FeatureCollection.dtd" [
  <!-- Description : Illustration of an area feature using the polygon property.
       Author : Ron Lake -->
]>
<FeatureCollection xmlns:ogcgml="http://www.opengis.org/gml#">
  <BoundingBox>
    <coordinates>0.0,0.0 3.0,4.0</coordinates>
  </BoundingBox>
  <Feature typeName="http://www.usgs.org/tp#Building" ID="1">
    <Description>Hotel Vancouver</Description>
    <Property typeName="http://www.usgs.org/tp#Number of Rooms" type="int">4</Property>
    <polygonproperty parseType="Resource" roleName="http://www.usgs.org/tp#extent"
                     srsName="http://www.opengis.org/srs/epsg:26751">
      <type resource="http://www.opengis.org/gml#Polygon" />
      <boundary parseType="Resource">
        <type resource="http://www.opengis.org/gml#LineString" />
        <coordinates>0.0,0.0 1.123,1.56 2.34,4.5 0.0,0.0</coordinates>
      </boundary>
    </polygonproperty>
  </Feature>
</FeatureCollection>
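Because GML carries geometry as plain coordinate text, a consumer has to parse that text before it can render or route the feature. The sketch below does this in Python with the standard-library ElementTree, assuming the format seen in the example: pairs separated by whitespace, values separated by commas. The trimmed feature fragment and the `parse_coordinates` helper are illustrative, not part of any GML toolkit.

```python
import xml.etree.ElementTree as ET

# Fragment trimmed from the slide's example; attributes not needed here are omitted.
FEATURE = """<Feature typeName="http://www.usgs.org/tp#Building" ID="1">
  <Description>Hotel Vancouver</Description>
  <polygonproperty>
    <boundary>
      <coordinates>0.0,0.0 1.123,1.56 2.34,4.5 0.0,0.0</coordinates>
    </boundary>
  </polygonproperty>
</Feature>"""


def parse_coordinates(text):
    """Split 'x,y x,y ...' into a list of (x, y) float pairs."""
    return [tuple(float(v) for v in pair.split(",")) for pair in text.split()]


feature = ET.fromstring(FEATURE)
ring = parse_coordinates(feature.findtext("polygonproperty/boundary/coordinates"))

# A polygon boundary should start and end at the same point.
is_closed = ring[0] == ring[-1]
print(ring, is_closed)
```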
ArcXML (AXL)
• Being developed by ESRI, available in ArcIMS 3.0
• Format for data exchange within ArcIMS 3.0
• Provides tags for:
  • Request / Response between Client, Middleware, and Server
  • MapService Configuration
  • Viewer Configuration
AXL
• Config tags:
  • Properties (properties, extent, background, imagesize, output, featurecoordsys, etc.)
  • Workspaces (workspaces, SDEworkspaces, shapeworkspaces, imageworkspaces, etc.)
  • Layers (layer, dataset, query, coordsys)
  • Renderers (simple, group, scaledependent, valuemap, simplelabel, valuemap, etc.)
  • Symbols (simplemarker, rastermarker, simpleline, hashline, simplefill, simplepolygon, rasterfill, gradientfill, text, etc.)
  • Acetate layer objects (object, point, line, polygon, text, scalebar, northarrow)
• Admin tags: (admin, addservice, changeservice, removeservice, image)
• Request tags:
  • (request, get_service_info, get_map, get_features, get_extract, get_geocode)
  • Feature Server Request Tags (layer, query, spatialquery, spatialfilter, envelope)
• Response tags: (response, error)
  • serviceinfo (serviceinfo, layerinfo, fclass, field)
  • featureserver
  • queryserver (features, feature)
  • imageserver (map, output, legend)
Sample Request/Response AXL
• Example—Get_Map
<ARCXML VERSION="1.0">
<REQUEST>
<GET_MAP>
<PROPERTIES>
<EXTENT MINX="-180" MINY="-90" MAXX="180" MAXY="90" />
</PROPERTIES>
</GET_MAP>
</REQUEST>
</ARCXML>
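A client would normally generate such a request rather than type it. The Python sketch below builds the same GET_MAP document with the standard-library ElementTree. It only produces the XML text: the `build_get_map` helper is hypothetical, and how the request is sent to an ArcIMS server is outside what these slides describe.

```python
import xml.etree.ElementTree as ET


def build_get_map(minx, miny, maxx, maxy):
    """Build a GET_MAP request for the given extent (hypothetical helper)."""
    arcxml = ET.Element("ARCXML", VERSION="1.0")
    request = ET.SubElement(arcxml, "REQUEST")
    get_map = ET.SubElement(request, "GET_MAP")
    properties = ET.SubElement(get_map, "PROPERTIES")
    ET.SubElement(properties, "EXTENT",
                  MINX=str(minx), MINY=str(miny), MAXX=str(maxx), MAXY=str(maxy))
    return ET.tostring(arcxml, encoding="unicode")


request_xml = build_get_map(-180, -90, 180, 90)
print(request_xml)
```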
Sample MapService AXL
• Example
<WORKSPACES>
  <SHAPEWORKSPACE name="shp_ws-0"
                  directory="D:\Data\ESRIDATA\USA" />
  <SDEWORKSPACE name="sde_ws-0"
                server="ims" instance="esri_sde"
                user="gdt" password="gdt" />
</WORKSPACES>
Sample Viewer AXL
• Example
<WORKSPACES>
  <MAPPERWORKSPACE name="mapper_ws-0"
                   url="http://mammoth" service="baseimage" />
</WORKSPACES>
<LAYER type="image" name="baseimage" visible="false"
       minscale="0.0" maxscale="1.7976931348623157E308">
  <DATASET name="baseimage" type="image"
           workspace="mapper_ws-0" />
</LAYER>
Open Technical Issues
• DTD inference
• DTD evolution
• Specifying access controls on XML documents
• Specifying, enforcing intr-document constraints
DTD Inference
• Document collections without DTDs
• “Tight” vs. “loose” DTDs
• Document 1:
<title> XML Tutorial </title>
<author> Richard Marciano </author>
• Possible DTD
<!ELEMENT document (title, author)>
DTD Inference
• Document 2:
<title> XML Tutorial </title>
<author> Richard Marciano </author>
<author> Chaitan Baru </author>
• Document DTD 1
<!ELEMENT document (title, (author1 | (author1, author2)))>
• Document DTD 2
<!ELEMENT document (title, author+)>
DTD Inference
• Alternative DTD’s: introduce an extra level (authors) in the tree
    <!ELEMENT document (title, authors)>
    <!ELEMENT authors (author | (author, author))>
  OR
    <!ELEMENT document (title, authors)>
    <!ELEMENT authors (author+)>
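The tight and loose alternatives above can be illustrated with a small sketch. This is not the inference method used in the SDSC work; it is a minimal illustration that assumes each document is wrapped in a `document` root element and that all documents agree on element order. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

def child_sequences(docs):
    """Return one list of child tags per document."""
    return [[child.tag for child in ET.fromstring(d)] for d in docs]

def collapse_runs(seq):
    """Collapse repeats: [title, author, author] -> [[title, 1], [author, 2]]."""
    runs = []
    for tag in seq:
        if runs and runs[-1][0] == tag:
            runs[-1][1] += 1
        else:
            runs.append([tag, 1])
    return runs

def infer_models(docs):
    """Infer a tight and a loose content model (assumes a shared tag order)."""
    all_runs = [collapse_runs(s) for s in child_sequences(docs)]
    tags = [tag for tag, _ in all_runs[0]]
    if any([t for t, _ in r] != tags for r in all_runs):
        raise ValueError("documents disagree on element order")
    tight, loose = [], []
    for i, tag in enumerate(tags):
        counts = sorted({r[i][1] for r in all_runs})
        # Tight: enumerate exactly the repetition counts that were observed.
        alts = ["(" + ", ".join([tag] * n) + ")" if n > 1 else tag for n in counts]
        tight.append(alts[0] if len(alts) == 1 else "(" + " | ".join(alts) + ")")
        # Loose: generalize any observed repetition to "one or more".
        loose.append(tag + "+" if max(counts) > 1 else tag)
    return ", ".join(tight), ", ".join(loose)

# Documents 1 and 2 from the slides, wrapped in an assumed <document> root.
doc1 = "<document><title>XML Tutorial</title><author>Richard Marciano</author></document>"
doc2 = ("<document><title>XML Tutorial</title><author>Richard Marciano</author>"
        "<author>Chaitan Baru</author></document>")

tight, loose = infer_models([doc1, doc2])
print(f"<!ELEMENT document ({tight})>")
print(f"<!ELEMENT document ({loose})>")
```

The tight model accepts only what has been seen so far, so it is more likely to need revision when a new document arrives; the loose model generalizes at the cost of accepting documents nobody has observed.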
DTD Evolution
• Example, Document 3:
    <title> XML Primer </title>
    <author> Richard Marciano </author>
    <author> Chaitan Baru </author>
    <keywords> XML, XSL, Schema </keywords>
• Document does not satisfy the Document DTD
• Report an error
• Record as exception and store the document
• Evolve the DTD
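The three options above can be sketched as follows. This is only an illustration under stated assumptions: a regular expression over the sequence of child tags stands in for the DTD content model `(title, author+)`, the policy names (`error`, `exception`, `evolve`) are hypothetical, and "evolving" is reduced to appending unseen trailing elements as optional.

```python
import re
import xml.etree.ElementTree as ET

# Stand-in for (title, author+): a regex over space-separated child tags.
MODEL = r"title( author)+"

def child_tags(xml_text):
    return " ".join(child.tag for child in ET.fromstring(xml_text))

def handle(xml_text, model, policy, exceptions):
    """Apply one of the three policies; return the (possibly evolved) model."""
    tags = child_tags(xml_text)
    if re.fullmatch(model, tags):
        return model
    if policy == "error":
        raise ValueError(f"document does not satisfy the DTD: {tags}")
    if policy == "exception":
        exceptions.append(xml_text)  # store the document, leave the DTD alone
        return model
    if policy == "evolve":
        # Simplistic evolution: accept unseen trailing elements as optional.
        known = set(re.findall(r"\w+", model))
        extra = [t for t in dict.fromkeys(tags.split()) if t not in known]
        evolved = model + "".join(f"( {t})?" for t in extra)
        if not re.fullmatch(evolved, tags):
            raise ValueError("cannot evolve the model for this document")
        return evolved
    raise ValueError(f"unknown policy: {policy}")

# Document 3 from the slide, wrapped in an assumed <document> root.
doc3 = ("<document><title>XML Primer</title><author>Richard Marciano</author>"
        "<author>Chaitan Baru</author><keywords>XML, XSL, Schema</keywords></document>")

stored = []
print(handle(doc3, MODEL, "exception", stored), len(stored))
print(handle(doc3, MODEL, "evolve", []))
```

The evolved pattern corresponds to the content model `(title, author+, keywords?)`, which still accepts Documents 1 and 2.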
Specifying access controls
• User is associated with a level of access
• Document elements are assigned levels of access
• Example
    <abstract level="unclassified">….</abstract>
    <section level="classified"><heading>Introduction</heading></section>
    <section level="top secret"><heading>Architecture</heading></section>
• Stylesheet processor matches authorization level of user with auth level of the document element
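The matching step can be sketched outside a stylesheet processor. The example below uses Python's `xml.etree.ElementTree` rather than a stylesheet language. It assumes the levels are ordered unclassified < classified < top secret and that elements without a `level` attribute are unclassified; neither assumption is stated on the slide. The `filter_by_level` function and the `document` root are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed ordering of levels, lowest to highest.
LEVELS = {"unclassified": 0, "classified": 1, "top secret": 2}

def filter_by_level(root, user_level):
    """Return a copy of the tree without elements above the user's level."""
    allowed = LEVELS[user_level]

    def visible(elem):
        # Assumption: elements without a level attribute are unclassified.
        return LEVELS[elem.get("level", "unclassified")] <= allowed

    def copy(elem):
        out = ET.Element(elem.tag, elem.attrib)
        out.text, out.tail = elem.text, elem.tail
        for child in elem:
            if visible(child):
                out.append(copy(child))
        return out

    return copy(root)

# "Summary" is placeholder text; the slide elides the abstract's content.
doc = ET.fromstring(
    '<document>'
    '<abstract level="unclassified">Summary</abstract>'
    '<section level="classified"><heading>Introduction</heading></section>'
    '<section level="top secret"><heading>Architecture</heading></section>'
    '</document>'
)

view = filter_by_level(doc, "classified")
print([h.text for h in view.iter("heading")])
```

A user cleared for "classified" sees the abstract and the Introduction section, while the Architecture section is dropped from the copy before anything is rendered.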
Specifying access controls
• Do access control processing in the stylesheet language
• Useful for content-dependent access control
• Example
    If title contains “nuclear” then show only abstract
    Else show the full document
• Access control processing should be done on the server side in a secure fashion
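The rule in the example can be sketched as a server-side filter. This is a plain Python illustration, not a stylesheet; the element names `document`, `title`, `abstract`, and `section`, and the function name `server_side_view`, are assumptions.

```python
import xml.etree.ElementTree as ET

SENSITIVE_WORDS = ("nuclear",)  # taken from the slide's example rule

def server_side_view(xml_text):
    """Return the XML a client is allowed to receive."""
    root = ET.fromstring(xml_text)
    title = (root.findtext("title") or "").lower()
    if any(word in title for word in SENSITIVE_WORDS):
        # Content-dependent rule: expose only the abstract.
        restricted = ET.Element(root.tag)
        abstract = root.find("abstract")
        if abstract is not None:
            restricted.append(abstract)
        return ET.tostring(restricted, encoding="unicode")
    return ET.tostring(root, encoding="unicode")

# Sample document with made-up content, for illustration only.
sample = ("<document><title>Nuclear Reactor Design</title>"
          "<abstract>Overview</abstract><section>Details</section></document>")
print(server_side_view(sample))
```

Because the filtering happens before the response is serialized, the restricted sections never reach the client, which is the point of the last bullet.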
Specifying constraints
• Enforcing intra-document constraints
• Constraints on structure
• Example: A short paper may contain only one section, but long papers must have at least two.
    <!ELEMENT Publication (Title, AuthorName*, Section*)>
    <!ATTLIST Publication Type CDATA #REQUIRED>
• Specify type of document in Type attribute. Use that to check if document satisfies the constraint
• Constraints on value
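The DTD alone cannot express the structural constraint above, so the check has to be done separately. A minimal sketch follows, assuming the `Type` attribute takes the values `short` and `long` (the slide does not name the values) and reading "only one section" as "at most one".

```python
import xml.etree.ElementTree as ET

def satisfies_structure(publication):
    """Check the section-count rule implied by the Type attribute."""
    sections = len(publication.findall("Section"))
    kind = publication.get("Type")
    if kind == "short":   # assumed attribute value
        return sections <= 1
    if kind == "long":    # assumed attribute value
        return sections >= 2
    raise ValueError(f"unknown publication type: {kind!r}")

short_ok = ET.fromstring(
    '<Publication Type="short"><Title>T</Title><Section>S</Section></Publication>')
long_bad = ET.fromstring(
    '<Publication Type="long"><Title>T</Title><Section>S</Section></Publication>')

print(satisfies_structure(short_ok))  # True
print(satisfies_structure(long_bad))  # False
```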
Specifying constraints
• Example of value constraint
    <!ELEMENT Publication (Title, AuthorName*, Section*)>
    <!ATTLIST Publication NumSecs CDATA #REQUIRED>
    <Publication NumSecs="3">
      <Title>…</Title>
      <AuthorName>…</AuthorName>
      <Section>……</Section>
      <Section>……</Section>
      <Section>……</Section>
    </Publication>
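A DTD validator accepts any string for `NumSecs`, so the agreement between the attribute value and the number of `Section` elements has to be checked by the application. A minimal sketch, with a hypothetical function name and placeholder element content:

```python
import xml.etree.ElementTree as ET

def numsecs_matches(publication):
    """True if NumSecs is an integer equal to the number of Section children."""
    declared = publication.get("NumSecs")
    try:
        return int(declared) == len(publication.findall("Section"))
    except (TypeError, ValueError):
        # Missing or non-numeric attribute violates the constraint.
        return False

# Element content is placeholder text; the slide elides it.
pub = ET.fromstring(
    '<Publication NumSecs="3"><Title>T</Title><AuthorName>A</AuthorName>'
    '<Section>1</Section><Section>2</Section><Section>3</Section></Publication>')

print(numsecs_matches(pub))  # True
```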
Projects at SDSC
• National Archives and Records Administration, NARA
• Persistent Archives and Electronic Records
• NHPRC
• NPACI Neuroscience
• Federation of multiple brain image databases
• I2T: An Information Integration Testbed for Digital Government
• Funded by NSF
• Spatial mediation, wrapping of “unstructured” text
Projects at SDSC
• InterLib and California Digital Library
• Funded by the NSF DLI-2 program
• Implemented the Art Museum Image Consortium (AMICO) Digital Library at SDSC
• Community of Science, Inc. (www.cos.com)
• Specifying XML standards for Current Research Information Systems (CRIS)
• Enable creation of warehouse of research information and enable e-commerce
Projects at SDSC
• ESRI
• Developers of ArcInfo, ArcView, ArcIMS products
• Evaluate ArcXML (AXL) standard
• Keep AXL developments in sync with activities in OpenGIS Consortium, e.g. the evolving Geography Markup Language (GML) standard
• Connect AXL with other XML Web standards such as WAP (Wireless Application Protocol)
Projects at SDSC
• NEES
• Proposal to NSF’s Networked Earthquake Engineering Simulation (NEES) program
• Develop NeesML, an XML-based standard for representing earthquake engineering simulation metadata and data
• NeesML will facilitate the creation of a NEES Curated Database, a warehouse of earthquake engineering simulation information