1
Experience Building The World Wide Telescope aka: The Virtual Observatory
Jim Gray
Alex Szalay
2
The Evolution of Science
• Observational Science
– Scientist gathers data by direct observation
– Scientist analyzes data
• Analytical Science
– Scientist builds analytical model
– Makes predictions
• Computational Science
– Simulate analytical model
– Validate model and make predictions
• Data Exploration Science
– Data captured by instruments or generated by simulator
– Processed by software
– Placed in a database / files
– Scientist analyzes database / files
3
Information Avalanche
• In science, industry, government, …
– better observational instruments and
– better simulations are producing a data avalanche
• Examples
– BaBar: grows 1 TB/day (2/3 simulation information, 1/3 observational information)
– CERN: LHC will generate 1 GB/s, ~10 PB/y
– VLBA (NRAO) generates 1 GB/s today
– Pixar: 100 TB/movie
• New emphasis on informatics:
– Capturing, Organizing, Summarizing, Analyzing, Visualizing
Image courtesy C. Meneveau & A. Szalay @ JHU
BaBar, Stanford
Space Telescope
P&E Gene Sequencer, from http://www.genome.uci.edu/
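The rates quoted on this slide can be sanity-checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope check: the figure of about 1e7 seconds of data taking per year is an assumption (it is not on the slide), chosen because it reconciles 1 GB/s with ~10 PB/y. Running continuously at 1 GB/s would instead give roughly 31.5 PB/y.

```python
# Decimal (SI) byte units
GB = 1e9
TB = 1e12
PB = 1e15

SECONDS_PER_YEAR = 365 * 24 * 3600

# Assumption, not stated on the slide: the accelerator records data for
# about 1e7 seconds per year (roughly a third of the calendar year).
ASSUMED_LIVE_SECONDS = 1e7


def yearly_volume(bytes_per_second, seconds):
    """Bytes accumulated at a constant rate over the given number of seconds."""
    return bytes_per_second * seconds


# LHC at 1 GB/s: assumed live time versus running around the clock
lhc_live_pb = yearly_volume(1 * GB, ASSUMED_LIVE_SECONDS) / PB
lhc_continuous_pb = yearly_volume(1 * GB, SECONDS_PER_YEAR) / PB

# BaBar at 1 TB/day, split 2/3 simulation and 1/3 observation as on the slide
babar_tb_per_year = 1 * 365
babar_simulation_tb = babar_tb_per_year * 2 / 3
babar_observation_tb = babar_tb_per_year / 3

print(f"LHC, assumed live time: {lhc_live_pb:.1f} PB/y")
print(f"LHC, continuous:        {lhc_continuous_pb:.1f} PB/y")
print(f"BaBar: {babar_tb_per_year} TB/y "
      f"({babar_simulation_tb:.0f} simulated, {babar_observation_tb:.0f} observed)")
```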
4
World Wide Telescope / Virtual Observatory
http://www.ivoa.net/
• Premise: Most data is (or could be) online
• The Internet is the world’s best telescope:
– It has data on every part of the sky
– In every measured spectral band: optical, x-ray, radio, …
– As deep as the best instruments (2 years ago).
– It is up when you are up. The “seeing” is always great (no working at night, no clouds, no moons, no …).
– It’s a smart telescope: links objects and data to literature on them.
5
The WWT Components
• Data Sources
– Literature
– Archives
• Unified Definitions
– Units,
– Semantics/Concepts/Metrics, Representations,
– Provenance
• Object model
• Classes and methods
• Portals
6
Data Sources
• Literature online and cross indexed
– Simbad, ADS, NED: http://simbad.u-strasbg.fr/Simbad, http://adswww.harvard.edu/, http://nedwww.ipac.caltech.edu/
• Many curated archives online
– FIRST, DPOSS, 2MASS, USNO, IRAS, SDSS, VizieR, …
– Typically files with English meta-data and some programs
• Groups, Researchers, Amateurs Publish
– Datasets online in various formats
– Documentation varies
– Publications are ephemeral
– Unknown provenance
7
Unified Definitions
• Unified Content Descriptors (UCDs)
http://vizier.u-strasbg.fr/doc/UCD.htx
– Collated all table heads from all the literature
– 100,000 terms reduced to ~1,500
– Rough consensus that this is the right thing.
– Refinement in progress as people use UCDs
• Defines
– Units:
• gram, radian, second, ...
– Semantic Concepts / Metrics
• Std error, Chi2 fit, magnitude, flux @ passband, velocity, ...
8
Provenance
• Most data will be derived.
• To do science, need to trace derived data back to source.
• So programs and inputs must be registered.
• Must be able to re-run them.
• Example: Space Telescope Calibrated Data
– Run on demand
– Can specify software version (to get old answers)
• Scientific Data Provenance and Curation are largely unsolved problems (some ideas but no science).
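One way to picture "programs and inputs must be registered" and "must be able to re-run them" is a small registry keyed by program name and version, with a digest of inputs and outputs stored alongside each derived result. This is a minimal sketch under invented names (PROGRAMS, derive, rerun, and the toy "calibrate" program are all hypothetical); it is not how the Space Telescope pipeline works.

```python
import hashlib
import json

# Registered programs, keyed by (name, version). Keeping old versions around
# is what makes it possible to "get old answers" later.
PROGRAMS = {}


def register(name, version, func):
    PROGRAMS[(name, version)] = func


def digest(values):
    """Stable fingerprint of a list of numbers."""
    return hashlib.sha256(json.dumps(values).encode("utf-8")).hexdigest()


def derive(name, version, inputs):
    """Run a registered program and return (output, provenance record)."""
    output = PROGRAMS[(name, version)](inputs)
    record = {
        "program": name,
        "version": version,
        "input_digest": digest(inputs),
        "output_digest": digest(output),
    }
    return output, record


def rerun(record, inputs):
    """Re-execute the recorded program version and verify the result."""
    if digest(inputs) != record["input_digest"]:
        raise ValueError("inputs differ from the registered source data")
    output = PROGRAMS[(record["program"], record["version"])](inputs)
    if digest(output) != record["output_digest"]:
        raise ValueError("re-run did not reproduce the recorded output")
    return output


# Two versions of a toy calibration step (gains are made-up numbers).
register("calibrate", "1.0", lambda raw: [x * 2.0 for x in raw])
register("calibrate", "2.0", lambda raw: [x * 2.5 for x in raw])

raw_counts = [1.0, 2.0, 3.0]
old_result, old_record = derive("calibrate", "1.0", raw_counts)
new_result, new_record = derive("calibrate", "2.0", raw_counts)

# Tracing derived data back to source: the old answer is still reproducible.
reproduced = rerun(old_record, raw_counts)
```

The record deliberately stores digests rather than the data itself, so the check is cheap; a real system would also need to register the software environment, which this sketch ignores.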
9
Object Model
• General acceptance of XML
• Recent acceptance of XML Schema (XSD over DTD)
• Wait-and-See about SOAP/WSDL/…
– “Web Services are just CORBA with angle brackets.”
– FTP is good enough for me.
• Personal opinion:
– Web Services are much more than “CORBA + <>”
– Huge focus on interop
– Huge focus on integrated tools
• But the community says “Show me!”
– Many technologists sold, but not the astronomers
10
Classes and Methods
• First Class: VOTable
http://www.us-vo.org/VOTable/VOTable-1-0.htm
– Represents an answer set in XML
• Defined by an XML Schema (XSD)
• Metadata (in terms of UCDs)
• Data representation (numbers and text)
– First method
• Cone Search: Get objects in this cone
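The two ideas on this slide, a cone search and an XML answer set, can be sketched together. The code below is an illustration only: the catalog rows are invented, the search is a brute-force scan rather than an indexed query, and the XML is a stripped-down VOTable-like document (VOTABLE/RESOURCE/TABLE/FIELD/DATA/TABLEDATA) that has not been validated against the schema. The UCD strings on the position fields are given as examples of the metadata the slide refers to.

```python
import math
import xml.etree.ElementTree as ET


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle distance in degrees between two sky positions (haversine)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2)
    sin_dra = math.sin((ra2 - ra1) / 2)
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def cone_search(catalog, ra, dec, sr):
    """Return the rows lying within `sr` degrees of the position (ra, dec)."""
    return [row for row in catalog
            if angular_separation_deg(ra, dec, row["ra"], row["dec"]) <= sr]


# (column name, VOTable datatype, UCD or None)
FIELDS = [
    ("id", "char", None),
    ("ra", "double", "POS_EQ_RA_MAIN"),
    ("dec", "double", "POS_EQ_DEC_MAIN"),
]


def to_votable(rows):
    """Serialize rows as a simplified VOTable-like XML string."""
    root = ET.Element("VOTABLE")
    table = ET.SubElement(ET.SubElement(root, "RESOURCE"), "TABLE")
    for name, datatype, ucd in FIELDS:
        attrs = {"name": name, "datatype": datatype}
        if datatype == "char":
            attrs["arraysize"] = "*"  # variable-length string
        if ucd:
            attrs["ucd"] = ucd
        ET.SubElement(table, "FIELD", attrs)
    tabledata = ET.SubElement(ET.SubElement(table, "DATA"), "TABLEDATA")
    for row in rows:
        tr = ET.SubElement(tabledata, "TR")
        for name, _, _ in FIELDS:
            ET.SubElement(tr, "TD").text = str(row[name])
    return ET.tostring(root, encoding="unicode")


# Invented catalog rows, positions in decimal degrees.
catalog = [
    {"id": "a", "ra": 180.00, "dec": 0.0},
    {"id": "b", "ra": 180.05, "dec": 0.0},
    {"id": "c", "ra": 181.00, "dec": 0.5},
]

hits = cone_search(catalog, ra=180.0, dec=0.0, sr=0.1)
answer_xml = to_votable(hits)
print(answer_xml)
```

The haversine form is used because it stays well behaved for small separations and across the RA = 0/360 wrap, which a naive difference of coordinates does not.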
11
Other Classes
• Space-Time class
– http://hea-www.harvard.edu/~arots/nvometa/STCdoc.pdf
• Image Class (returns pixels)
– SdssCutout
– Simple Image Access Protocol
http://bill.cacr.caltech.edu/cfdocs/usvo-pubs/files/ACF8DE.pdf
– HyperAtlas
http://bill.cacr.caltech.edu/usvo-pubs/files/hyperatlas.pdf
• Spectral
– Simple Spectral Access Protocol
– 500K spectra available at http://voservices.net/wave
• Query Services
– ADQL and SkyNode http://skyservice.pha.jhu.edu/develop/vo/adql/
• Registry:
– see below
12
The Registry
• UDDI seemed inappropriate
– Complex
– Irrelevant questions
– Relevant questions missing
• Evolved Dublin Core
– Represent Datasets, Services, Portals
– Needs to be machine readable
– Federation (DNS model)
– Push & Pull: register then harvest
• http://www.ivoa.net/twiki/bin/view/IVOA/IvoaResReg
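The "push & pull: register then harvest" pattern can be sketched with two in-memory registries: publishers push records into a local registry, and a second registry pulls whatever changed since its last visit. Everything here is hypothetical (the class names, the logical clock, the identifiers, and the small set of Dublin Core style fields); it illustrates the pattern, not the IVOA registry interface.

```python
from dataclasses import dataclass


@dataclass
class ResourceRecord:
    # A few Dublin Core style fields; the real metadata set is larger.
    identifier: str
    title: str
    resource_type: str  # "Dataset", "Service" or "Portal"
    creator: str
    modified: int       # logical clock tick of the last change


class Registry:
    def __init__(self, name):
        self.name = name
        self.records = {}
        self.clock = 0

    def register(self, identifier, title, resource_type, creator):
        """Push: a publisher adds or updates a record in this registry."""
        self.clock += 1
        record = ResourceRecord(identifier, title, resource_type, creator, self.clock)
        self.records[identifier] = record
        return record

    def changed_since(self, tick):
        """Records added or updated after the given clock tick."""
        return [r for r in self.records.values() if r.modified > tick]

    def harvest(self, other, since=0):
        """Pull: copy records changed in `other` since the last harvest.

        Returns the other registry's clock, to be passed as `since` next time.
        """
        for r in other.changed_since(since):
            self.register(r.identifier, r.title, r.resource_type, r.creator)
        return other.clock


# Hypothetical identifiers and titles, for illustration only.
local = Registry("local")
full = Registry("full")

local.register("example/sdss", "SDSS catalog", "Dataset", "example archive")
local.register("example/cone", "Cone search", "Service", "example archive")
mark = full.harvest(local)               # first harvest copies both records

local.register("example/portal", "Query portal", "Portal", "example archive")
mark = full.harvest(local, since=mark)   # incremental harvest copies only the new one
```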
13
SkyQuery: A Prototype WWT
• Started with SDSS data and schema
• Imported about 9 other datasets into that spine schema.
• Unified them with a portal
• Implicit spatial join among the datasets.
• All built on Web Services
– Pure XML
– Pure SOAP
– Used .NET toolkit
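A spatial join between two archives can be illustrated with a simple positional cross-match: for each object in one catalog, keep the nearest object in the other catalog if it lies within a tolerance. This is a deliberately simplified stand-in, not SkyQuery's own xMatch operator, and the catalogs, identifiers, and the 2 arcsecond tolerance are invented for the example.

```python
import math


def separation_arcsec(a, b):
    """Great-circle distance in arcseconds between two rows with ra/dec in degrees."""
    ra1, dec1 = math.radians(a["ra"]), math.radians(a["dec"])
    ra2, dec2 = math.radians(b["ra"]), math.radians(b["dec"])
    sin_ddec = math.sin((dec2 - dec1) / 2)
    sin_dra = math.sin((ra2 - ra1) / 2)
    h = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h)))) * 3600


def cross_match(left, right, tolerance_arcsec):
    """Nearest-neighbour spatial join: (left id, right id, distance) triples."""
    matches = []
    for l in left:
        best = None
        for r in right:
            d = separation_arcsec(l, r)
            if d <= tolerance_arcsec and (best is None or d < best[1]):
                best = (r, d)
        if best is not None:
            matches.append((l["id"], best[0]["id"], best[1]))
    return matches


# Two invented catalogs standing in for two archives.
optical = [
    {"id": "o1", "ra": 185.0, "dec": 2.0},
    {"id": "o2", "ra": 185.1, "dec": 2.1},
]
radio = [
    {"id": "r1", "ra": 185.0002, "dec": 2.0001},
    {"id": "r2", "ra": 190.0, "dec": -5.0},
]

pairs = cross_match(optical, radio, tolerance_arcsec=2.0)
for left_id, right_id, dist in pairs:
    print(f"{left_id} <-> {right_id}: {dist:.2f} arcsec")
```

The double loop is quadratic in catalog size, which is fine for a sketch; at archive scale the join needs a spatial index on both sides.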
14
Demo
• SkyServer:
– Navigator: showing cutout web service
– List: showing many calls and variant use.
• SkyQuery:
– Show integration of various archives.
– Explain spatial join xMatch operator.
15
MyDB
• Portal allows federation of data but…
• Intermediate results may be large.
• Intermediate results feed into next analysis step.
• Sending them back-and-forth to client is costly and sometimes infeasible.
• Solution: create a working DB for client at Portal: MyDB
16
MyDB
• Anyone can create a personal DB at SkyServer portal.
– It is about 100 MB
– It is private
• Simple queries done immediately
• Complex queries done by batch scheduler
• All queries can create/read/write MyDB tables
• Very popular with “serious” users.
• MyDB will be sharable by a group.
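The point of MyDB, keeping intermediate results next to the data instead of shipping them to the client, can be shown with SQLite standing in for the portal. In this sketch the archive and the personal database are two in-memory databases on one connection; the table names, columns, and magnitude cut are invented, and the real portal is not SQLite.

```python
import sqlite3

# One connection plays the portal; "main" is the shared archive and
# "mydb" is the user's private working database on the server side.
portal = sqlite3.connect(":memory:")
portal.execute("ATTACH DATABASE ':memory:' AS mydb")

portal.execute(
    "CREATE TABLE main.photo_obj "
    "(obj_id INTEGER PRIMARY KEY, ra_deg REAL, dec_deg REAL, r_mag REAL)"
)
portal.executemany(
    "INSERT INTO main.photo_obj VALUES (?, ?, ?, ?)",
    [
        (1, 10.0, 1.0, 17.2),
        (2, 10.1, 1.1, 19.5),
        (3, 10.2, 0.9, 16.8),
        (4, 10.3, 1.2, 21.0),
    ],
)

# Step 1: the query writes its (possibly large) result into a MyDB table.
portal.execute(
    "CREATE TABLE mydb.bright AS "
    "SELECT obj_id, ra_deg, dec_deg, r_mag FROM main.photo_obj WHERE r_mag < 18.0"
)

# Step 2: the next analysis step reads the intermediate table in place;
# only the small summary row travels back to the client.
count, mean_mag = portal.execute(
    "SELECT COUNT(*), AVG(r_mag) FROM mydb.bright"
).fetchone()

print(f"{count} bright objects, mean r magnitude {mean_mag:.2f}")
```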
17
Open SkyQuery
• SkyQuery being adopted by AstroGrid as reference implementation for OGSA-DAI (Open Grid Services Architecture, Data Access and Integration).
• SkyNode basic archive object
http://www.ivoa.net/twiki/bin/view/IVOA/SkyNode
• SkyQuery Language (VoQL) is evolving.
http://www.ivoa.net/twiki/bin/view/IVOA/IvoaVOQL
18
The WWT Components: Outline
• Data Sources
– Literature
– Archives
• Unified Definitions
– Units,
– Semantics/Concepts/Metrics, Representations,
– Provenance
• Object model
• Classes and methods
• Portals
• WWT is a poster child for the Data Grid.
What we learned
• Astro is a community of 10,000
• Homogeneous & Cooperative
• If you can’t do it for Astro, do not bother with 3M bio-info.
• Agreement
– Takes time
– Takes endless meetings
• Big problems are non-technical
– Legacy is a big problem.
• Plumbing and tools are there, but…
– What is the object model?
– What do you want to save?
– How to document provenance?