2003 April 15 1
Data Centres: Connecting to the Real World
Clive Page
Aim
To provide Web Services interfaces to a few data centres, so that the current iteration can work with realistic data.
How many data centres? Maybe three…
XML Formats for Output
• Images: nothing yet
• Time series: nothing yet
• Spectra: nothing yet
• Tables: VOTable
Conclusion:
In this iteration, handle only tabular datasets.
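Since VOTable is the one output format available for tables, a data-centre wrapper mainly has to turn query results into that structure. The Python sketch below builds a minimal VOTable-style document with the standard library: FIELD elements describe the columns (including their UCDs) and TABLEDATA holds the rows. It is deliberately simplified: it omits the namespace, version attribute and most optional attributes a real VOTable service would emit, and the function name and example columns are illustrative only.

```python
import xml.etree.ElementTree as ET

def rows_to_votable(columns, rows):
    """Build a minimal VOTable-style document from tabular data.

    columns: list of (name, datatype, ucd) tuples; rows: list of tuples.
    Simplified sketch: no namespace, version or unit attributes.
    """
    votable = ET.Element("VOTABLE")
    table = ET.SubElement(ET.SubElement(votable, "RESOURCE"), "TABLE")
    # One FIELD element per column records its name, type and UCD;
    # FIELDs are added before DATA, as the VOTable layout requires.
    for name, datatype, ucd in columns:
        ET.SubElement(table, "FIELD", name=name, datatype=datatype, ucd=ucd)
    tabledata = ET.SubElement(ET.SubElement(table, "DATA"), "TABLEDATA")
    # Each row becomes a TR element holding one TD per cell.
    for row in rows:
        tr = ET.SubElement(tabledata, "TR")
        for value in row:
            ET.SubElement(tr, "TD").text = str(value)
    return ET.tostring(votable, encoding="unicode")
```

For example, `rows_to_votable([("RA", "double", "POS_EQ_RA_MAIN")], [(10.68,)])` returns a one-column, one-row table. A production service would more likely stream rows out than build the whole tree in memory, since catalogue query results can be large.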
XML Formats for Input
• AQL (Astronomical Query Language): much discussion, not much progress.
• ASU (Astronomical Server URL): ad-hoc, pre-XML definition of CGI parameters by CDS, used since 1996 by a number of archives.
• SIAP (Simple Image Access Protocol): CGI parameters defined by NVO for their prototypes in 2002.
• XPath: draft standard for querying XML documents, designed for tree-structured data.
• XQuery: based on XPath, but includes a WHERE section similar to SQL, so more suitable for tabular data.
• SQL: the standard for RDBMSs, but used by only some astronomical archives.
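ASU and SIAP both express a query as CGI parameters appended to a URL. As an illustration of that style, the Python sketch below builds an SIAP-style positional query. It assumes the POS ("ra,dec") and SIZE parameters, both in decimal degrees; the base URL in the example is a hypothetical placeholder, not a real service.

```python
from urllib.parse import urlencode

def siap_query_url(base_url, ra, dec, size):
    """Build an SIAP-style image query URL (sketch).

    POS is "ra,dec" and SIZE the angular size, both in decimal degrees.
    base_url is whatever endpoint the archive publishes (placeholder here).
    """
    params = {"POS": f"{ra},{dec}", "SIZE": size}
    # urlencode percent-encodes reserved characters such as the comma.
    return base_url + "?" + urlencode(params)
```

`siap_query_url("http://archive.example.org/siap", 10.68, 41.27, 0.2)` gives `http://archive.example.org/siap?POS=10.68%2C41.27&SIZE=0.2`; the comma is percent-encoded, and a CGI parameter parser decodes it back to `10.68,41.27`.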
Query Language: Proposal for an Ad-hoc Solution
Use a simplified form of SQL in an XML wrapper:
SELECT <list of columns>
FROM <list of tables>
WHERE <selection expression>
• <list of columns> may include UCDs, so generic queries are possible
• <list of tables> allows the same query to be sent to more than one archive
• <selection expression> may include column/UCD names and the usual syntax of relational expressions
• Need special provision for cone searches in the selection, e.g.
– Boolean pseudo-function CONE(RA, DEC, RADIUS)
• No joins, no sub-selects, no sorting or grouping, at present.
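The CONE pseudo-function reduces to an angular-distance test on each row. A minimal Python sketch follows, assuming the three CONE arguments are the centre RA, centre Dec and radius, all in decimal degrees, and that the row's own position comes from its RA/Dec columns (for instance located via their UCDs). The function name and argument order are illustrative. It uses the haversine formula, which stays well-conditioned for the small radii typical of cone searches and handles the RA 0/360 wrap-around.

```python
import math

def cone(ra, dec, centre_ra, centre_dec, radius):
    """Boolean cone test: True if (ra, dec) lies within radius of the centre.

    Assumption: all angles are in decimal degrees; (ra, dec) is the row's
    position and the last three arguments are those of CONE(RA, DEC, RADIUS).
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra, dec, centre_ra, centre_dec))
    # Haversine formula for the angular separation of two points on a sphere.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # min() guards against rounding pushing h slightly above 1.
    separation = math.degrees(2 * math.asin(math.sqrt(min(1.0, h))))
    return separation <= radius
```

In a local implementation this exact test would normally be applied only after a cheap indexed box cut on Dec (and RA), so that most rows are rejected without any trigonometry.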
Possible Datasets
Dataset           | Location          | System/DBMS        | Input | Output
APM catalogue     | Cambridge         | Solaris/Sybase     | CGI   | VOTable
6dF Galaxy Survey | Edinburgh         | Win-NT/SQL Server  | ?     | VOTable (nearly done)
USNO-B            | Leicester         | Linux/DB2          | ?     | XML
STP datasets      | RAL               | Various/home-grown | ?     | VOTable
USNO-B            | Leicester (LEDAS) | Solaris/WCSTools   | CGI   | VOTable
SuperCOSMOS       | Edinburgh         | Win-NT/SQL Server  | ?     | VOTable (in progress)
VizieR collection | Leicester         | Linux/Sybase       | ASU   | VOTable
Problem Areas
• All current services are synchronous: the user waits while HTML is generated and streamed to the browser.
– How can an asynchronous service be set up, so that results appear later and are sent to MySpace or elsewhere?
• How is the query generated in XML format?
• How is the pseudo-SQL query parsed into the CGI parameters or SQL that the local DBMS needs?
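For the parsing question, one possible approach is sketched below in Python. The pseudo-SQL is carried in a small XML wrapper; the SELECT and FROM lists are split on commas, and CONE(...) is recognised with a regular expression and turned into positional CGI parameters, one parameter set per table so the same query can go to several archives. All element names (query, select, from, where) and output parameter names (TABLE, COLUMNS, RA, DEC, SR) are hypothetical. Any other selection terms are rejected rather than silently ignored, since a real translator would need a proper expression parser.

```python
import re
import xml.etree.ElementTree as ET

# Matches CONE(ra, dec, radius) with three numeric arguments.
CONE_RE = re.compile(
    r"CONE\(\s*([-+\d.eE]+)\s*,\s*([-+\d.eE]+)\s*,\s*([-+\d.eE]+)\s*\)",
    re.IGNORECASE)

def query_to_cgi_params(query_xml):
    """Translate an XML-wrapped pseudo-SQL query into CGI parameter sets.

    Hypothetical wrapper: <query><select/><from/><where/></query>.
    Hypothetical output names: TABLE, COLUMNS, RA, DEC, SR.
    """
    root = ET.fromstring(query_xml)
    columns = [c.strip() for c in root.findtext("select", "").split(",") if c.strip()]
    tables = [t.strip() for t in root.findtext("from", "").split(",") if t.strip()]
    where = root.findtext("where", "")
    match = CONE_RE.search(where)
    if match is None:
        raise ValueError("only CONE(...) selections are handled in this sketch")
    # Anything besides the CONE call is rejected, not silently dropped.
    rest = (where[:match.start()] + where[match.end():]).strip()
    if rest:
        raise ValueError(f"unsupported selection terms: {rest!r}")
    ra, dec, radius = (float(g) for g in match.groups())
    # One parameter set per table, so the query can fan out to several archives.
    return [{"TABLE": table, "COLUMNS": ",".join(columns),
             "RA": ra, "DEC": dec, "SR": radius} for table in tables]
```

Each returned dictionary could then be URL-encoded for a CGI service, or mapped onto a WHERE clause for a site whose DBMS accepts SQL directly.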
Metadata Problems
• How does the query system know which column names exist, or how to translate UCDs into columns?
– It gets the information from the Registry.
• How does the Registry get its information on the columns and UCDs in each table? Answer: either
– it gets the information from the Data Service,
– or it is filled in laboriously by hand.
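If the Data Service describes its tables in a machine-readable form, the Registry can fill in column and UCD information automatically. The Python sketch below assumes, hypothetically, that the metadata arrives as a VOTable-style document whose FIELD elements carry name and ucd attributes. It harvests those pairs and then uses them to check column names and translate UCD terms in a SELECT list. The "UCD:" prefix marking a UCD term is an invented convention for this example, and XML namespaces are ignored.

```python
import xml.etree.ElementTree as ET

def harvest_columns(metadata_xml):
    """Return (name, ucd) pairs from the FIELD elements of a metadata document.

    Assumes a VOTable-style layout without XML namespaces; ucd is None
    for columns that have no ucd attribute.
    """
    root = ET.fromstring(metadata_xml)
    return [(field.get("name"), field.get("ucd")) for field in root.iter("FIELD")]

def resolve_select(items, columns):
    """Check column names and translate UCD terms into this table's columns."""
    names = {name for name, _ in columns}
    # If several columns share a UCD, the last one wins in this sketch.
    by_ucd = {ucd: name for name, ucd in columns if ucd}
    resolved = []
    for item in items:
        if item.startswith("UCD:"):  # hypothetical marker for a UCD term
            ucd = item[len("UCD:"):]
            if ucd not in by_ucd:
                raise KeyError(f"no column with UCD {ucd}")
            resolved.append(by_ucd[ucd])
        elif item in names:
            resolved.append(item)
        else:
            raise KeyError(f"unknown column {item}")
    return resolved
```

Because the same generic query can name `UCD:POS_EQ_RA_MAIN` for every archive, each site's own column name (`ra2000`, `RAJ2000` and so on) only has to be looked up once it reaches that site's entry in the Registry.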
Conclusion
• Data centres must implement a Web Service that responds to queries about their metadata.
• AQL must be extended to handle these metadata queries.