2003 April 15 1 Data Centres: Connecting to the Real World Clive Page


Page 1

Data Centres: Connecting to the Real World

Clive Page

2003 April 15

Page 2

Aim

To provide Web Services interfaces to a few data centres, so that realistic data are available for the current iteration.

How many data centres? Maybe three…

Page 3

XML Formats for Output

• Images: nothing yet

• Time series: nothing yet

• Spectra: nothing yet

• Tables: VOTable

Conclusion: in this iteration just handle tabular datasets.
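To make the tabular case concrete, here is a minimal sketch of writing a few rows as a VOTable-style document with Python's standard library. The nesting of VOTABLE, RESOURCE, TABLE, FIELD, DATA, TABLEDATA, TR and TD follows the general VOTable layout. The function name, column names, UCDs and values are invented for illustration, and no particular VOTable version's required attributes are checked.

```python
import xml.etree.ElementTree as ET


def rows_to_votable(fields, rows):
    """Serialise rows as a minimal VOTable-style XML string.

    fields: list of (name, datatype, ucd) tuples describing the columns.
    rows:   list of tuples, one value per field.
    """
    votable = ET.Element("VOTABLE")
    resource = ET.SubElement(votable, "RESOURCE")
    table = ET.SubElement(resource, "TABLE")
    # One FIELD element per column carries the column metadata.
    for name, datatype, ucd in fields:
        ET.SubElement(table, "FIELD", name=name, datatype=datatype, ucd=ucd)
    tabledata = ET.SubElement(ET.SubElement(table, "DATA"), "TABLEDATA")
    # One TR per row, one TD per cell, in the same order as the FIELDs.
    for row in rows:
        tr = ET.SubElement(tabledata, "TR")
        for value in row:
            ET.SubElement(tr, "TD").text = str(value)
    return ET.tostring(votable, encoding="unicode")


# Illustrative columns and values only; not taken from any real catalogue.
FIELDS = [
    ("RA", "double", "POS_EQ_RA_MAIN"),
    ("DEC", "double", "POS_EQ_DEC_MAIN"),
]
ROWS = [(10.5, -3.25), (11.0, -3.5)]

if __name__ == "__main__":
    print(rows_to_votable(FIELDS, ROWS))
```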

Page 4

XML Formats for Input

• AQL (Astronomical Query Language): much discussion, not much progress.

• ASU (Astronomical Server URL): ad-hoc pre-XML definition of CGI parameters by CDS, used since 1996 by a number of archives.

• SIAP (Simple Image Access Protocol) – CGI parameters defined by NVO for their prototypes in 2002.

• XPath: draft standard for querying XML documents – designed for tree-structured data.

• XQuery: based on XPath, includes WHERE section similar to SQL, more suitable for tabular data.

• SQL: standard for RDBMS but only used by some astronomical archives.

Page 5

Query Language: proposal for ad-hoc solution

Use simplified form of SQL in an XML wrapper:

SELECT <list of columns>

FROM <list of tables>

WHERE <selection expression>

• <list of columns> includes UCDs so generic queries possible

• <list of tables> allows same query to be sent to >1 archive

• <selection expression> may include column/UCD names and the usual syntax of relational expressions

• Need special provision for cone-search in selection, e.g.

– Boolean pseudo-function CONE(RA, DEC, RADIUS)

• No joins, no sub-selects, no sorting or grouping, at present.
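As an illustration of what the CONE pseudo-function would have to compute for each row, here is a minimal sketch. It assumes all angles are in degrees and uses the haversine formula for angular separation. The slide's CONE takes only the search centre and radius, so the two extra arguments for the row's own position, the function names and the sample rows are assumptions made for this example.

```python
import math


def cone(ra, dec, ra0, dec0, radius):
    """Return True if (ra, dec) lies within `radius` of (ra0, dec0).

    All arguments are in degrees. The haversine formula is used because it
    stays accurate for the small separations typical of cone searches.
    """
    ra, dec, ra0, dec0 = map(math.radians, (ra, dec, ra0, dec0))
    h = (math.sin((dec - dec0) / 2) ** 2
         + math.cos(dec) * math.cos(dec0) * math.sin((ra - ra0) / 2) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above the valid range.
    separation = 2 * math.asin(min(1.0, math.sqrt(h)))
    return math.degrees(separation) <= radius


def cone_search(rows, ra0, dec0, radius):
    """Keep only the rows whose position falls inside the cone."""
    return [r for r in rows if cone(r["ra"], r["dec"], ra0, dec0, radius)]


# Invented positions, for illustration only.
ROWS = [
    {"name": "a", "ra": 180.00, "dec": 0.00},
    {"name": "b", "ra": 180.05, "dec": 0.05},
    {"name": "c", "ra": 181.00, "dec": 1.00},
]

if __name__ == "__main__":
    for row in cone_search(ROWS, 180.0, 0.0, 0.1):
        print(row["name"])
```

An archive with a real DBMS would normally push this test into an indexed query rather than filter row by row; the sketch only pins down the semantics of the predicate.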

Page 6

Possible Datasets

Dataset            Location           System/DBMS         Input  Output
APM catalogue      Cambridge          Solaris/Sybase      CGI    VOTable
6dF galaxy survey  Edinburgh          Win-NT/SQL Server   ?      VOTable (nearly done)
USNO-B             Leicester          Linux/DB2           ?      XML
STP datasets       RAL                Various/home grown  ?      VOTable
USNO-B             Leicester (LEDAS)  Solaris/WCStools    CGI    VOTable
SuperCOSMOS        Edinburgh          Win-NT/SQL Server   ?      VOTable (in progress)
Vizier collection  Leicester          Linux/Sybase        ASU    VOTable

Page 7

Problem Areas

• All current services are synchronous: user waits while HTML is generated and streamed to the browser.

– How to set up an asynchronous service where results appear later, and are sent to MySpace or elsewhere?

• How is the query generated in XML format?

• How is the query in pseudo-SQL parsed into the CGI parameters or SQL the local DBMS needs?
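One possible shape for the last two questions is sketched here. It wraps a simplified SELECT/FROM/WHERE query in XML and then translates it into one CGI query string per table. Everything specific is an assumption: the element names (query, select, from, where, cone), the function names, and the CGI parameter names (table, columns, RA, DEC, SR) are placeholders rather than an agreed schema or any archive's real interface, and only a cone constraint is handled.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def build_query(columns, tables, cone):
    """Wrap a simplified SELECT/FROM/WHERE query in XML.

    Element and attribute names are hypothetical, not an agreed schema.
    cone is a (ra, dec, radius) tuple in degrees.
    """
    query = ET.Element("query")
    select = ET.SubElement(query, "select")
    for col in columns:
        ET.SubElement(select, "column").text = col
    from_ = ET.SubElement(query, "from")
    for tab in tables:
        ET.SubElement(from_, "table").text = tab
    ra, dec, radius = cone
    where = ET.SubElement(query, "where")
    ET.SubElement(where, "cone", ra=str(ra), dec=str(dec), radius=str(radius))
    return ET.tostring(query, encoding="unicode")


def to_cgi(xml_text):
    """Translate the XML query into one CGI query string per table.

    The parameter names are placeholders; each archive's real CGI
    interface would need its own mapping.
    """
    query = ET.fromstring(xml_text)
    columns = [c.text for c in query.findall("select/column")]
    cone = query.find("where/cone")
    urls = {}
    for table in query.findall("from/table"):
        params = {
            "table": table.text,
            "columns": ",".join(columns),
            "RA": cone.get("ra"),
            "DEC": cone.get("dec"),
            "SR": cone.get("radius"),
        }
        urls[table.text] = urlencode(params)
    return urls


if __name__ == "__main__":
    xml_text = build_query(["RA", "DEC"], ["usno_b", "apm"], (180.0, 0.0, 0.1))
    print(xml_text)
    for table, query_string in to_cgi(xml_text).items():
        print(table, query_string)
```

Sending the same XML query to more than one archive then reduces to looping over the tables, with a per-archive translator taking the place of `to_cgi`.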

Page 8

Metadata Problems

• How does the query system know which column names exist, or how to translate UCDs to columns?

– It gets the information from the Registry

• How does the Registry get its information on columns and UCDs in each table? Answer: either

– It gets the information from the Data Service

– OR it gets filled in laboriously by hand.
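The lookup the query system needs can be sketched as a simple mapping, assuming the Registry can supply, for each table, a dictionary from UCD to local column name. The registry contents, table names, column names and function name below are hypothetical and serve only to show the resolution step.

```python
# Hypothetical registry contents: table name -> {UCD: local column name}.
REGISTRY = {
    "usno_b": {"POS_EQ_RA_MAIN": "RAJ2000", "POS_EQ_DEC_MAIN": "DEJ2000"},
    "apm": {"POS_EQ_RA_MAIN": "ra", "POS_EQ_DEC_MAIN": "dec"},
}


def resolve_columns(registry, table, requested):
    """Map each requested name to a local column of `table`.

    A requested name may be a UCD (looked up in the registry) or already a
    local column name (passed through). Unknown names raise KeyError.
    """
    ucd_to_column = registry[table]
    local_columns = set(ucd_to_column.values())
    resolved = []
    for name in requested:
        if name in ucd_to_column:
            resolved.append(ucd_to_column[name])
        elif name in local_columns:
            resolved.append(name)
        else:
            raise KeyError(f"{table}: no column for {name!r}")
    return resolved


if __name__ == "__main__":
    generic = ["POS_EQ_RA_MAIN", "POS_EQ_DEC_MAIN"]
    for table in REGISTRY:
        print(table, resolve_columns(REGISTRY, table, generic))
```

The same generic column list resolves to different local names in each archive, which is what allows one query to be sent to more than one table.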

Conclusion

• Data centres must implement a Web Service which responds to queries about their metadata

• AQL must be extended to deal with these queries.