Journal of Atmospheric and Terrestrial Physics, Vol. 56, No. 7, pp. 865-870, 1994 Elsevier Science Ltd
Printed in Great Britain 0021-9169/94 $6.00 + 0.00
World Data Centres - past, present and future
STANLEY RUTTENBERG¹* and HENRY RISHBETH²,³†
1. National Center for Atmospheric Research, P.O.Box 3000, Boulder CO 80307, U.S.A.;
2. Rutherford Appleton Laboratory, Chilton, Didcot, Oxfordshire OX11 0QX, U.K.;
3. Department of Physics, University of Southampton, Southampton SO9 5NH, U.K.
Chairman* and Secretary†, ICSU Panel on World Data Centres
(Received in final form 26 November 1993; accepted 26 November 1993)
Abstract - This review of the ICSU World Data Centre system is offered as a tribute to Sir Granville Beynon
and his colleagues, whose vision led to the setting up of World Data Centres for the International Geophysical
Year of 1957-1958. The article reviews the development of the WDC system, its place in the current scientific
scene, and some of the issues that it faces.
1. INTRODUCTION
Scientific data gathering has a long history. Information about solar and auroral activity in past millennia was chronicled by the Chinese and other peoples. In the Western world, systematic geophysical measurements extend back for centuries, but mechanisms for data distribution and exchange are more recent. In the 18th and 19th centuries, data were exchanged from the early geomagnetic and seismic observatories largely through publications of annual station books. Oceanographic and geological expeditions were recorded in expedition reports. Although there were no convenient ways to copy the original records, our knowledge of the geomagnetic field, plate tectonics and ocean currents owes much to these records.
Benjamin Franklin in the 1770s and Matthew Maury a century later collected oceanographic data from the Atlantic directly from ship captains, which led to the first synoptic oceanographic study - of the Gulf Stream. From about 1860 the telegraph made it possible to use data from meteorological stations for weather forecasting, incidentally validating Franklin’s hypothesis that weather moves
from West to East. Maury advocated international collaboration in data gathering, which helped to stimulate the International Polar Years of 1882-1883 and 1932-1933, and eventually led to the International Geophysical Year of 1957-1958.
To serve the IGY, the World Data Centre system was established through the work of Granville Beynon and his many colleagues. He maintained his connection with the WDCs in varied ways, latterly through many years of his leadership of the international Panel on World Data Centres. This brief account of the WDC system therefore fits well into this journal issue that honours Granville Beynon’s life and work.
2. THE IGY AND ITS AFTERMATH
Planning of the IGY was coordinated by CSAGI, the Special Committee for the IGY set up by the International Council of Scientific Unions (ICSU). At the 1955 Brussels meeting of CSAGI, Granville Beynon and his colleagues reasoned that traditional records were not enough for modern research. They decided that the IGY data collections, both from routine monitoring instruments and from special experiments, should be preserved in permanent data centres for future use. The fashioning of the WDC system is well documented in the Annals of the IGY. Explicit data management plans were developed for each IGY discipline, stating in detail the types of data, and their time schedules and formats, that were to be submitted to the WDCs for distribution and exchange, and archived for future research. These specifications were published in a series of Guides to Data Exchange which, updated over the years, remain as the standards for data exchange. The IGY planners were remarkably prescient: the 1955 recommendation mentioned that data centres should be prepared to handle data in machine-readable form, which at that time meant punched cards and punched tape.
National IGY Committees were invited to establish and operate World Data Centres in one or more disciplines, at national expense, abiding by rules promulgated by CSAGI. Accordingly, the USA and USSR offered to establish complex centres embracing all disciplines (respectively known as WDC-A and WDC-B). In most disciplines there was a third or even a fourth centre, known as WDC-C1 if in Western Europe and WDC-C2 if in Asia or Australia (the European centres being known simply as WDC-C if there is no corresponding WDC-C2). Multiple centres were deemed advisable to guard against catastrophic loss and for the convenience of the senders and users of the observational data.
The IGY programmes were limited and the WDC system did not cover all fields of interest. It did not include hydrology or geology (though some geological data were acquired by WDCs); exchange of glaciological data was limited to bibliographies of published papers; data exchange in meteorology was limited; and the monitoring of nuclear radiations became defunct (some monitoring systems were revived following the Chernobyl incident). Although the WDCs for Rockets and Satellites were established, and received information on launches and orbits of spacecraft, the satellite-based scientific data were not systematically exchanged in the IGY, partly because of experimenters’ privileges. In time, however, many of these data did reach WDCs.
ICSU has set up other organizations that deal with data. The Federation of Astronomical and Geophysical Services (FAGS), formalized in 1956, covers similar areas of science to those of the WDC system, but the functions are different. FAGS centres process large amounts of data to derive indices or summaries characterizing the dynamics of the Earth system. It is not their prime responsibility to archive and distribute the raw data. There is, however, some overlap with the activities of WDCs and national centres, and some FAGS centres are also WDCs. The task of CODATA, the ICSU Committee on Data for Science and Technology formed in 1966, has centred on compiling and reviewing data on physical and chemical constants and on the properties of chemical and biological substances and materials.
3. POST-IGY DEVELOPMENTS
Because of its success, the WDC system was made permanent and used for post-IGY data. New programmes evolved, based on the IGY structure as a general framework, such as the International Quiet Sun Year of 1964-1965, the International Magnetospheric Study of 1976-1979, the Solar Maximum Year of 1979-1981 and the Middle Atmosphere Programme of 1982-1985. Most of the sponsoring national bodies agreed to continue the WDCs to serve these descendant programmes and, thanks to good planning, the data collections have remained accessible to users.
During the 1960s and 1970s several developments took place that had implications for the WDC system. The discipline of solar-terrestrial physics (STP) emerged, embracing many IGY subjects: the Sun, the solar wind and interplanetary magnetic field, cosmic rays, geomagnetism, ionosphere, aurora and airglow. Several data centres were reorganized to create new combined STP data centres, and some ground-based observational networks were replaced by satellites. Some IGY centres closed, generally because of the loss of a significant user community in the host country, and in some cases because their scientific discipline became obsolete or inactive. New centres were created, for example to serve solid earth and marine geophysics.
Some national scientific agencies developed extensive National Data Centres (NDCs), few of which existed during the IGY. Many WDCs are housed within these NDCs, and the operational and financial boundaries between them may be blurred. Through their WDCs, the NDCs maintain their commitments to international exchange and service to any scientist who needs data. The NDC/WDC combination is a powerful one that maintains national systems to serve national needs, in which the WDC components can continue to ensure unrestricted availability of data.
The Intergovernmental Oceanographic Commission (IOC) was established by UNESCO to coordinate and sponsor oceanographic programmes, mainly for operational use. IOC developed an extensive data centre system and guides for the exchange of oceanographic data, now merged with the ICSU WDC system for oceanography. The World Meteorological Organization (WMO) established a meteorological data centre system, supporting the Global Atmospheric Research Programme which spanned the period 1967-1979, and the current World Climate Research Programme.
Since the IGY, the gathering and exchange of data have been transformed by immense technological advances. These advances have included the replacement of analogue with digital instruments, the networking of digital instruments to simplify the collection and exchange of their data, and automatic observatories that operate unattended, sometimes for months. Examples are provided by ionospheric, geomagnetic, seismic, and meteorological stations, and upper air soundings. Personal computers, more powerful than the supercomputers of the 1970s, are now ubiquitous, together with compact disk readers. Many WDCs are now publishing collections of digital data sets on compact disks for cheap and easy distribution. Digital communication networks have made it possible to transfer large data and program files by electronic mail.
4. THE WORLD DATA CENTRE SYSTEM
Today the WDC system is healthy and viable. The 44 centres are mostly maintaining their funding, though not without struggle. In 1988 a complete set of new centres was established in China as WDC-D. Some of the WDC-B in Russia and the WDC-C2 in Japan and India have been reorganized. In recognition of the increasing interest in environmental data in general, and the needs of the IGBP (International Geosphere-Biosphere Programme) in particular, a new WDC-C for soil science has been established in the Netherlands and three new centres in the USA: WDC-A for Trace Gases at the Carbon Dioxide Information Analysis Centre at Oak Ridge; WDC-A for Remotely Sensed Land Data at the EROS Data Centre, Sioux Falls; and WDC-A for Palaeoclimatology at the National Geophysical Data Centre in Boulder (see Table 1).
The present duties and activities of WDCs (which apply also to many national data centres) may be listed as follows:
(a) Collecting and cataloguing data and information.
(b) Compiling data sets for a wide variety of small-scale, regional and global geophysical research.
(c) Maintaining the data (whether stored on paper, film, magnetic tape or modern media) in good condition.
(d) Making data accessible through copying and distributing data (for WDC operations, at minimum costs of copying).
(e) Preserving important old data sets by converting them from tabular to digital form.
(f) Making data sets available on such media as compact disks, giving users the ability to search large data collections and recompile the data sets in their home laboratory.
(g) Compiling ‘data products’, e.g. by combining data from many sources to derive geophysical indices.
(h) Compiling numerical models to describe the time-varying and space-varying geophysical environment (e.g. the geomagnetic field and the upper atmosphere).
(i) Maintaining and updating on-line information services related to the above activities.
In pursuit of these objectives:
(i) The WDC system has initiated a ‘data rescue’ programme - finding older data sets at risk because of physical deterioration, changes in policy, or reorganization of the institutions that hold the data, and taking steps to safeguard such data.
(ii) Collaborative projects are under way between WDCs-A, -B, -C and -D to achieve another kind of ‘data rescue’, the digitization of (for example) old geomagnetic and ionospheric datasets.
(iii) In particular the WDC Panel, WDC-A and WDC-D have initiated a project to bring data from the large territory of China into the global change database.
(iv) Visitor programmes to bring scientists into close contact with the data holdings and the professional staff of the WDCs are being expanded.
(v) WDCs are working with data originators to improve data documentation to enable future use of the data.
(vi) Related data sets are being compiled into databases, often in common data structures, to facilitate multidisciplinary research and multivariate analyses in models, and published on diskettes and compact disks.
The above considerations imply questions of priority, and there are other specific issues and problems, such as:
* Most scientists are unwilling to give priority to data management, especially when data projects compete with what they perceive as “real” science. Nevertheless, investigators generally react positively to offers by WDCs to assist with documenting and archiving datasets for placing in the public domain.
* There is increasing difficulty - by no means confined to developing countries with their special problems - in maintaining data flow from the regular monitoring networks (e.g. geomagnetic, ionospheric, cosmic ray). This applies less to (e.g. meteorological) networks that serve short-term operational requirements, but difficulties may arise in acquiring and preserving their data for longer-term research.
* Biospheric and human-activity data needed for global change studies, such as mapping data and information that may be non-numerical and non-continuous (e.g. soil types, vegetation types, land-use) are hard to handle.
* Technical issues of ageing, error growth and ultimate lifetimes of new data storage media (such as compact disks) need to be assessed.
WDCs cost money. Data services are expensive, though some of them have a long-term effect of transferring large data sets from the WDCs to the users, thereby relieving the WDCs of some routine duties and enabling them to undertake further innovative developments. In general the WDCs and
NDCs are funded at levels insufficient to do all the data management work needed by the community, but they have nevertheless kept up with many aspects of rapidly evolving technology.
Since many WDCs are associated with a national data centre (NDC), it is difficult to estimate how much the WDC system costs to run. Based on the experience of WDC-A, it may be estimated that WDC costs are about 5-10% of the total costs of the national centres and their data handling activities, currently of order $50M/year in the USA (compared to the order of $1M/year in the IGY) and perhaps $150M/year worldwide. This implies that the national bodies spend around $10M/year to maintain their WDCs and the related international obligations, whereas ICSU spends only $15K/year on its coordinating and promotional role. Though very rough, these numbers show that the incremental costs of an efficient WDC system are small compared with the expenditure on national data centres.
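The arithmetic behind this estimate can be checked with a short sketch. It assumes the figures are read as roughly $150M/year for national data centres worldwide and a WDC share of about 5-10%; the function name `wdc_cost_range` is hypothetical and introduced only for illustration.

```python
# Back-of-envelope check of the cost estimate quoted in the text.
# All amounts are in US dollars per year and are order-of-magnitude figures only.

def wdc_cost_range(total_cost, low_percent, high_percent):
    """Return the (low, high) WDC cost implied by a percentage share of total_cost.

    Integer arithmetic is used so the result is exact for whole-dollar inputs.
    """
    return (total_cost * low_percent // 100, total_cost * high_percent // 100)

# Assumed reading of the text: about $150M/year worldwide, WDC share about 5-10%.
WORLDWIDE_TOTAL = 150_000_000
low, high = wdc_cost_range(WORLDWIDE_TOTAL, 5, 10)

print(f"Implied WDC cost: ${low:,} to ${high:,} per year")
```

Under these assumptions the implied range brackets the figure of around $10M/year given for the national bodies' spending on their WDCs.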
5. CONTEMPORARY OPPORTUNITIES AND ISSUES
In the 1990s, the WDC system has to serve international scientific research programmes that aim to describe the complex, non-linear and interactive Earth system, with an ultimate goal of predicting its evolution and future state. The major ICSU programmes are the Solar-Terrestrial Energy Programme (STEP, 1990-1997), the International Geosphere-Biosphere Programme (IGBP, 1991-2000), and the International Decade for Natural Disaster Reduction (IDNDR, started in 1990). Of these, STEP is in a field long served by WDCs, but the others embrace disciplines and types of data not hitherto familiar to the WDC system and present new challenges. The overall aim requires at least the following:
First, the unrestricted exchange of environmental data, which is a sine qua non of any research programme to understand the Earth and its variability. The data are needed (i) to describe the boundary conditions that define the present state of the Earth’s climate and biospheric systems, (ii) to understand the workings of myriad individual physical and biospheric processes involved in the global system, and (iii) to monitor the progressive effects of those processes.
Second, updating the historical records, beginning with the modern instrumental era which began some 200 years ago, to provide the longer-term context within which to study the present variability of the Earth system. This necessitates the removal of data artifacts caused by changes in instruments, location, local environment and analysis techniques and, where feasible, conversion of the records to digital form. Examples are the long time series of sea-surface temperatures, starting from the 1770s; historical records (e.g. crop yields, and population and tax records) from which to infer the climate record for 2~-4~ years; 4000 years of Chinese and Korean observations of sunspots; tree-ring data from many regions; and isotopic analysis of ice cores from polar regions and high glaciers.
Third, providing easy-to-use directories which tell users which data sets are available, their contents, coverage, format, and how they may be obtained. WMO’s INFOCLIMA is a good example. For global change research, new directories are being assembled by various national space and environmental agencies. Some are available online to any scientist with communication links and on computer-readable media for users without such links.
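As a purely illustrative sketch of what one entry in such a directory might record, the following assumes a flat record with the fields named above (contents, coverage, format, how to obtain). The class `DirectoryEntry`, the function `find_entries` and the two sample entries are hypothetical and do not describe INFOCLIMA or any actual WDC directory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DirectoryEntry:
    """One hypothetical directory record describing a data set."""
    name: str
    contents: str
    coverage: str
    data_format: str
    how_to_obtain: str


def find_entries(entries, keyword):
    """Return entries whose name or contents mention keyword (case-insensitive)."""
    key = keyword.lower()
    return [e for e in entries
            if key in e.name.lower() or key in e.contents.lower()]


# Invented entries, for illustration only.
EXAMPLE_DIRECTORY = [
    DirectoryEntry(
        name="Example geomagnetic hourly values",
        contents="hourly magnetic field components",
        coverage="selected observatories, 1957-1993",
        data_format="digital, compact disk",
        how_to_obtain="request from the holding centre",
    ),
    DirectoryEntry(
        name="Example sea-surface temperatures",
        contents="monthly mean temperatures",
        coverage="Atlantic, long time series",
        data_format="digital, magnetic tape",
        how_to_obtain="request from the holding centre",
    ),
]

matches = find_entries(EXAMPLE_DIRECTORY, "geomagnetic")
print([entry.name for entry in matches])
```

A simple keyword search is used here only to show how a user without specialist knowledge might locate a data set; a real directory would need richer indexing by discipline, period and region.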
We turn to the question of the availability of data. In its Agenda 21, the 1992 United Nations Conference on Environment and Development held in Rio de Janeiro called strongly for international collaboration in data exchange. The USA tabled its official policy of unrestricted access to environmental data at minimum cost to users, which is consistent with the principles of the ICSU World Data Centre system and serves as a model for other nations to follow. Some agencies, however, espouse the idea that data have monetary value, which hinders the exchange of geophysical and environmental data and obstructs research. This ‘data market’ trend is dangerous, and may cut off data from those who try to sell their own data (in any case, the sale of data seldom covers more than a small fraction of the real cost of acquisition). Natural systems are transnational; no nation can hope to understand its own environmental situation in anything less than a continental or global context.
A major challenge for the WDC system is to define the WDCs’ role in handling the huge data
streams of the major new projects, the Global Ocean Observing System (GOOS) and the Global Climate Observing System (GCOS). The data and derived products from GOOS and GCOS are designed for operational use - weather, ocean state, etc. - but they have enormous potential for long-term research.
A word of caution is necessary. Years may pass before modern facilities are available to scientific communities everywhere, and each new development may bring a danger of enhancing the divisions between the scientific “haves” and “have-nots”. The present-day WDC system sees it as an important part of its task to promote access to modern information technology for scientists in developing countries.
6. CONCLUSION
In addressing these new questions, which are also opportunities, we need the guidance which such wise colleagues as Granville Beynon gave during IGY and the subsequent explosion of geophysical research and data. An illuminating story that Granville was fond of telling comes to mind here:
Two Welshmen were playing cards. Taffy dealt a hand, and smiled satisfactorily to himself. However, when his partner Dai played his first card, Taffy stood up, scowled, threw his cards down on the tables, and roared:
‘There is nothing I think worse than to play cards with a cheat! You are not playing the cards I dealt you!’
The very wise colleagues who planned and designed the IGY gave scientists interested in data management very good cards indeed, and they were played well to build a data centre system which has grown and thrived through major changes in international research programmes. Data management is now recognized in its own right as an important branch of science and technology. Times are changing fast, and we must find the right new cards to deal to our colleagues who have to fight for the resources to provide the best data services for our future global research programmes.
Table 1 - WORLD DATA CENTRES 1993

Centre    Discipline                          Location          Country
WDC-A     Atmospheric Trace Gases             Oak Ridge, TN     USA
WDC-A     Glaciology                          Boulder, CO       USA
WDC-A     Marine Geology & Geophysics         Boulder, CO       USA
WDC-A     Meteorology                         Asheville, NC     USA
WDC-A     Oceanography                        Washington, DC    USA
WDC-A     Palaeoclimatology                   Boulder, CO       USA
WDC-A     Remotely Sensed Land Data           Sioux Falls, SD   USA
WDC-A     Rockets & Satellites                Greenbelt, MD     USA
WDC-A     Rotation of the Earth               Washington, DC    USA
WDC-A     Seismology                          Denver, CO        USA
WDC-A     Solar-Terrestrial Physics           Boulder, CO       USA
WDC-A     Solid Earth Geophysics              Boulder, CO       USA
WDC-B1    Meteorology                         Obninsk           Russia
WDC-B1    Oceanography                        Obninsk           Russia
WDC-B2    Solar-Terrestrial Physics           Moscow            Russia
WDC-B2    Solid Earth Geophysics              Moscow            Russia
WDC-B     Marine Geology & Geophysics         Gelendzhik        Russia
WDC-C     Earth Tides                         Brussels          Belgium
WDC-C1    Geomagnetism                        Copenhagen        Denmark
WDC-C1    Geomagnetism                        Edinburgh         United Kingdom
WDC-C     Glaciology                          Cambridge         United Kingdom
WDC-C     Recent Crustal Motions              Prague            Czech Republic
WDC-C     Soil Geography & Classification     Wageningen        Netherlands
WDC-C     Solar Activity                      Meudon            France
WDC-C1    Solar-Terrestrial Physics           Chilton           United Kingdom
WDC-C     Sunspot Index                       Brussels          Belgium
WDC-C2    Airglow                             Tokyo             Japan
WDC-C2    Aurora                              Tokyo             Japan
WDC-C2    Cosmic Rays                         Toyokawa          Japan
WDC-C2    Geomagnetism                        Bombay            India
WDC-C2    Geomagnetism                        Kyoto             Japan
WDC-C2    Ionosphere                          Tokyo             Japan
WDC-C2    Nuclear Radiation                   Tokyo             Japan
WDC-C2    Solar Radio Emissions               Nobeyama          Japan
WDC-C2    Solar-Terrestrial Activity          Sagamihara        Japan
WDC-D     Astronomy (Solar)                   Beijing           China
WDC-D     Geology                             Beijing           China
WDC-D     Geophysics                          Beijing           China
WDC-D     Glaciology & Geocryology            Lanzhou           China
WDC-D     Meteorology                         Beijing           China
WDC-D     Oceanography                        Tianjin           China
WDC-D     Renewable Resources & Environment   Beijing           China
WDC-D     Seismology                          Beijing           China
WDC-D     Space Sciences                      Beijing           China