Large Scientific Databases

Upload: meryl-little

Post on 01-Jan-2016

Page 1: Large Scientific Databases. Large scientific datasets are those which are systematically collected and organized and which stretch the technical capabilities

Large Scientific Databases

Large scientific datasets are those which are systematically collected and organized and which stretch the technical capabilities of our species to store, manipulate, and distribute data for scientific investigation, and hence limit that investigation.

What is a “small” dataset?

“Only a few hundred gigabytes.” - Alex Szalay

What about non-scientific databases?

• Why not Google?

Fields producing these datasets

• Observational data

– Earth and space sciences

• Astronomy and Astrophysics

• Space Physics

• Atmospheric Science

• Geoscience

• Ocean Science

• Experimental Laboratory Data

– CERN

• [From Preserving Scientific Data on Our Physical Universe (Washington, National Academy Press: 1995)]

Observations

• The datasets these fields are collecting are huge and will grow.

• These datasets stretch the technical capabilities of what our species can do with computer applications and hardware, and thus limit what we can learn.

• There are bottlenecks in storage, in manipulation, and in distribution.

• There is not enough bandwidth to make scientific use of datasets at the sizes that now exist.
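
The bandwidth point can be made concrete with some back-of-the-envelope arithmetic. The sketch below is illustrative only: the 300 GB figure stands in for "a few hundred gigabytes", the link speeds are hypothetical, and the function name `transfer_time_seconds` is invented for this example. It assumes an idealized link with no protocol overhead or contention, so real transfers would take longer.

```python
def transfer_time_seconds(size_bytes: float, bandwidth_bits_per_s: float) -> float:
    """Idealized time to move size_bytes over a link (no overhead, no contention)."""
    if bandwidth_bits_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return size_bytes * 8 / bandwidth_bits_per_s


GB = 10**9  # decimal gigabyte

# Hypothetical figures, chosen only for illustration.
dataset_bytes = 300 * GB  # stands in for "a few hundred gigabytes"
links = {
    "10 Mbit/s": 10e6,
    "100 Mbit/s": 100e6,
    "1 Gbit/s": 1e9,
}

for name, bits_per_s in links.items():
    hours = transfer_time_seconds(dataset_bytes, bits_per_s) / 3600
    print(f"{name}: about {hours:.1f} hours")
```

Under these assumptions even a "small" dataset takes hours to days to move once, which is one way to read the distribution bottleneck noted above.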

More observations

• Other disciplines may already have solutions to some of the problems that scientists working with large datasets are wrestling with.

– Library & Information Science

– Graphics

– Hardware and software vendors

• They shouldn't all have to reinvent everything separately

Is there a field?

• Connections between scientists working on large datasets appear to be informal

• Bringing together scientists who work with large datasets would be useful, because different groups may already have solved different problems or may have useful insights to share

• There is an extensive literature, but it is technical and largely not self-aware

Is there a field? (2)

• On a broader scale, if these datasets can be put on a desktop computer in 10 years, scientists will by then be out gathering even bigger datasets. It is what humans do.

• Can principles be derived from current experience that will help us deal with those future, larger limits?

• Can we focus on this aspect of science?

Ancillary issues

• Policy

• Characteristics of the data

• etc.

What next?

• Conference?

– Gather

• The scientists

• Vendors

• Disciplines that might help the scientists

• Literature review