LOC, 13 June 2003. NSSDC Role and OAIS Implementation: Brief Overview. Don Sawyer
TRANSCRIPT
NSSDC Roles
NSSDC is the NASA Office of Space Science (OSS) permanent archive
— Astronomy, Solar & Space Plasma Physics, Planetary & Lunar data
— Digital and film data spanning 1958-2002 from >1300 instruments flown on >375 spacecraft
— Distinguished from OSS Active Archives (AA)
Interacts in a timely manner with all distributed OSS active archives in space physics, solar physics, astrophysics, and planetary science disciplines to acquire the OSS data and supporting metadata needed for long-term preservation and understanding
— Interacts directly with projects when mediated by an active archive
— Interacts with PIs and related individuals when they have data needing long-term preservation
OSS Archive Relationships
[Diagram. Boxes: Planetary AAs, Solar AAs, SEC AAs, Astrophysics AAs; Various OSS S/C Projects; NSSDC Permanent Archive (DLTs, Tapes, CD/DVDs, Film, Paper); Anonymous FTP; OSS Researchers, Non-OSS Researchers, Education Community, General Public. Label: PDS and SEC data on media.]
NSSDC Roles (concl’d)
NASA's lead for Consultative Committee for Space Data Systems (CCSDS) Archiving and Data Packaging/Registry Working Groups (on-ground data management)
— Led development of CCSDS/ISO Open Archival Information System reference model standard
Comprehensive information base about all launched spacecraft (~6000)
Host of World Data Center for Satellite Information
— Part of worldwide World Data Center infrastructure established ~1958
NSSDC’s Permanent Archive Environment - Legacy View
~20 TB in ~2,300 digital data sets on ~40,000 offline media
— Most on tape
— Most newly arriving media are CDs or DVDs
"Data set" is all data from a given source (e.g., instrument on a spacecraft) at a given "processing level."
Wide range of data characteristics (e.g., documented binaries specific to now-obsolete computers)
Also, ~2,000 data sets on large number of film media of various form factors.
— Gradually being digitized into TIFF via scanning.
Initial Drivers for OAIS Re-engineering
Needed to solve a migration problem
— Remove dependencies of VAX VMS files on the operating system
— Include record-defining attributes in a standard form to accompany the data file content
— Result was package of data/metadata
Had software, based on CCSDS/ISO packaging standard, that could be augmented
OAIS reference model provided an architectural view
Created Archival Information Package
Single file (binary/ASCII content)
Uses CCSDS/ISO packaging (SFDU) to hold multiple data objects
— NSSDC-defined attribute object expressed in CCSDS/ISO Parameter Value Language (PVL)
— NSSDC data file content in one of four canonical forms
• Two flavors each of binary and ASCII
— 20-byte SFDU ASCII labels to separate data objects
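Illustrative sketch (not NSSDC software): the packaging idea above, several data objects in a single file separated by 20-byte ASCII labels, can be mimicked in a few lines of Python. The label layout used here (a 12-character type identifier followed by an 8-digit decimal length) is an assumption made for this example and should not be read as the SFDU field layout defined by the CCSDS/ISO standard. The function names and the identifiers `ATTRIBUTEOBJ` and `SENSORDATAOB` are hypothetical.

```python
from typing import List, Tuple

LABEL_SIZE = 20  # the slides give 20 bytes; the field split below is assumed


def make_label(type_id: str, length: int) -> bytes:
    """Build a 20-byte ASCII label: 12-char type id + 8-digit length."""
    if len(type_id) != 12 or not type_id.isascii():
        raise ValueError("type_id must be exactly 12 ASCII characters")
    if not 0 <= length <= 99_999_999:
        raise ValueError("length must fit in 8 decimal digits")
    return f"{type_id}{length:08d}".encode("ascii")


def pack(objects: List[Tuple[str, bytes]]) -> bytes:
    """Concatenate (type_id, payload) pairs, each preceded by its label."""
    return b"".join(make_label(t, len(p)) + p for t, p in objects)


def unpack(package: bytes) -> List[Tuple[str, bytes]]:
    """Walk the package label by label and return (type_id, payload) pairs."""
    objects = []
    pos = 0
    while pos < len(package):
        label = package[pos:pos + LABEL_SIZE]
        if len(label) != LABEL_SIZE:
            raise ValueError("truncated label")
        type_id = label[:12].decode("ascii")
        length = int(label[12:].decode("ascii"))
        start = pos + LABEL_SIZE
        payload = package[start:start + length]
        if len(payload) != length:
            raise ValueError("truncated payload")
        objects.append((type_id, payload))
        pos = start + length
    return objects
```

Calling `pack` with an attribute object and a data object yields one byte string that `unpack` splits back into the same pairs, which is the property a single-file package needs to remain readable without help from the operating system.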
NSSDC Attribute Object
— Object identification and version
— Archival Storage Id (unique)
— Collection Id
— Checksum over rest of attribute object
— Attributes for original data stream
• Date/time created, operating system, size in bytes, record format, binary/ascii flag, file name, checksum, etc.
— Attributes for canonical form of data stream
• Date/time created, operating system, size in bytes, record format, binary/ascii flag, file name, checksum, processing report, format identifier (ADID), etc.
— Order applied encodings (e.g., tar, gzip)
— Start date/time of data observations
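Illustrative sketch (not the NSSDC attribute object definition): a few of the attributes listed above can be rendered as `NAME = value;` statements in the general style of PVL, with a leading checksum computed over the rest of the object. The attribute names, the CRC-32 checksum, the timestamp format, and the function names are all assumptions made for this example. The slides do not specify any of them.

```python
import zlib
from datetime import datetime, timezone


def pvl_statement(name: str, value) -> str:
    """Render one 'NAME = value;' statement; strings are double-quoted."""
    if isinstance(value, str):
        value = '"' + value.replace('"', "'") + '"'
    return f"{name} = {value};"


def build_attribute_object(file_name: str, content: bytes,
                           is_binary: bool, created: datetime) -> str:
    """Describe one data file; attribute names here are hypothetical."""
    statements = [
        pvl_statement("FILE_NAME", file_name),
        pvl_statement("SIZE_IN_BYTES", len(content)),
        pvl_statement("BINARY_ASCII_FLAG", "BINARY" if is_binary else "ASCII"),
        pvl_statement("DATE_TIME_CREATED",
                      created.strftime("%Y-%m-%dT%H:%M:%SZ")),
        # Checksum of the data file itself (CRC-32 is an assumed choice).
        pvl_statement("CHECKSUM", f"{zlib.crc32(content):08X}"),
    ]
    body = "\r\n".join(statements) + "\r\n"
    # Checksum over the rest of the attribute object, placed first.
    header = pvl_statement("AO_CHECKSUM",
                           f"{zlib.crc32(body.encode('utf-8')):08X}")
    return header + "\r\n" + body


if __name__ == "__main__":
    when = datetime(2003, 6, 13, 12, 0, 0, tzinfo=timezone.utc)
    print(build_attribute_object("example.dat", b"abc", True, when))
```

Keeping the object as plain text statements means it stays readable with no special software, and the leading checksum lets a reader detect damage to the attributes independently of the data file.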
NSSDC Permanent Archive - New Direction
Bundle data files (objects) with data_file-descriptive attribute file (object) and pointers to further documentation into OAIS "Archival Information Package (AIP)"
— Write to Digital Linear Tape (DLT)-based jukebox in unix environment
— Write data files and attribute files to RAID disk for ftp-based access by external customers
AIP Structure
[Diagram: three labels, an Attribute Object (AO), and a Sensor Data Object (SDO). Callouts: CCSDS/ISO Label for Packaging; CCSDS/ISO Label for Attribute Object; CCSDS/ISO Label for Sensor Data Object; Globally Unique Registry Identifiers; Expressed using CCSDS/ISO language.]
Migrating Data into AIPs
Have created AIPs for data previously on NSSDC's newly retired 12" WORM data dissemination jukebox
— VMS-based, so some attributes placed in attribute objects compensate for loss of VMS/Files-11 support
— Modified data files in cases of variable-length records, and introduced "CR/LF" for appropriate ASCII data
Now creating multi-data-file AIP and upgrading software to accommodate data migrating from legacy offline tapes
— Will start ingest from tape imminently
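Illustrative sketch (not the NSSDC migration software): the step above that rewrites variable-length records and introduces CR/LF for ASCII data might look like the following. It assumes each record is stored as a 2-byte little-endian byte count followed by the record bytes, with one pad byte after odd-length records. That layout is an assumption for this example, and real VMS files have further cases (for instance end-of-block markers and records with carriage-control attributes) that are not handled here. The function names are hypothetical.

```python
import struct
from typing import List


def split_records(raw: bytes) -> List[bytes]:
    """Split length-prefixed records (assumed layout, see lead-in)."""
    records = []
    pos = 0
    # Stop when fewer than 2 bytes remain; a lone trailing byte is ignored.
    while pos + 2 <= len(raw):
        (length,) = struct.unpack_from("<H", raw, pos)
        pos += 2
        record = raw[pos:pos + length]
        if len(record) != length:
            raise ValueError("record runs past end of input")
        records.append(record)
        pos += length + (length % 2)  # skip pad byte after odd-length records
    return records


def to_crlf_text(raw: bytes) -> bytes:
    """Emit each record as one line terminated by CR/LF."""
    return b"".join(record + b"\r\n" for record in split_records(raw))
```

After this conversion the record boundaries live in the file content itself, so they no longer depend on file attributes kept by the original operating system.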
Facilitating Archiving via Data Supplier Support
NSSDC has provided software to the IMAGE spacecraft project
— Generates attribute objects and bundles these with data files into Archival Information Packages (AIPs)
— IMAGE script transmits these to NSSDC
Looking for other opportunities to support NASA spacecraft projects equivalently
— Cost-effective data ingest
[Diagram: at the IMAGE Science Operations Centre, data files and configuration information feed the NSSDC Package Generator, which produces AIPs; the IMAGE script sends them by ftp to the National Space Science Data Center.]
NSSDC Architecture Summary
For the system architecture:
— Compliant with the OAIS functional model: separates different functions (ingest, archival storage, data management, access)
— Compliant with the OAIS information model: defines an Archival Information Package (AIP) for preservation in Archival Storage
Data are being migrated into Archival Information Packages for long-term storage on DLTs
New data arrive as AIPs (e.g., from the IMAGE project) or are put into AIPs during the Ingest process
Current Activities
Developing a better integration of our metadata databases
— Many have grown up over the years
— Taking advantage of Java and web capabilities
Developing an Archival Information Package type that allows multiple ‘canonical data files’ in a single package file.
— Needed for the migration of legacy data on magnetic tape
— Needed to put small files together for ease of management
Planning a better overall integration of our architecture
— E.g., tighter coupling between AIPs and other information bases
Archive Challenges
Making the most cost-benefit-favorable judgements on modernization of low-access-potential older data sets.
— Convert vendor-specific binaries to IEEE-binary? Via EAST? Convert to ASCII?
Implement efficient production process for migrating data from ~10,000 tapes through AIP-creation software to nearline DLT-based permanent archive
Define post-DLT permanent archive environment
Ensuring existence of all material needed to make data correctly and independently usable
— Couple such material to the data being supported
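Illustrative sketch: the question above about converting vendor-specific binaries to IEEE binary can be made concrete with one case, VAX F_floating single precision. The slides do not say which formats are involved beyond mentioning VAX VMS, so this choice is an assumption. An F_floating value is stored as two 16-bit little-endian words: the first holds the sign, an 8-bit excess-128 exponent, and the high 7 fraction bits, and the second holds the low 16 fraction bits. The value is sign × 0.1f (binary) × 2^(exponent − 128), and an exponent of zero means zero (or a reserved operand if the sign bit is set). The function name is hypothetical.

```python
import math
import struct


def vax_f_to_float(raw: bytes) -> float:
    """Decode 4 bytes of VAX F_floating into a Python float."""
    if len(raw) != 4:
        raise ValueError("F_floating values are 4 bytes long")
    word0, word1 = struct.unpack("<HH", raw)
    sign = word0 >> 15
    exponent = (word0 >> 7) & 0xFF
    if exponent == 0:
        if sign:
            raise ValueError("reserved operand")
        return 0.0
    fraction = ((word0 & 0x7F) << 16) | word1
    mantissa = (1 << 23) | fraction  # restore the hidden leading bit
    # mantissa / 2**24 lies in [0.5, 1); scale by 2**(exponent - 128).
    value = math.ldexp(mantissa, exponent - 128 - 24)
    return -value if sign else value


def vax_f_to_ieee_bytes(raw: bytes) -> bytes:
    """Re-encode the value as big-endian IEEE 754 single precision."""
    return struct.pack(">f", vax_f_to_float(raw))
```

Decoding field by field, instead of reinterpreting the bits as an IEEE number and rescaling, keeps the largest VAX exponent from being mistaken for an IEEE infinity or NaN. The smallest VAX magnitudes fall below the IEEE single-precision normal range and come out as subnormals, the kind of detail a cost-benefit judgement on conversion would need to weigh.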
NSSDC Metadata Environment
Information base (JEDS) about
— All launched spacecraft
— Instruments on space science spacecraft
— NSSDC-held data sets therefrom
— Underlies "NSSDC Master Catalog" interface
Information base (DIOnAS) about data files
— Written to new nearline permanent archive
— Written to anonymous nssdcftp/spacecraft_data/
Attribute objects with technical information about data files
Information base (JIN) about data media
NSSDC Metadata Environment (concl’d)
Information base (CAOIS) of CCSDS-registered data set-descriptive information (e.g., formats)
— Assigns globally-unique registry identifiers
— Relevant to growing fraction of NSSDC data plus other data
Array of "data set catalogs" with detailed information on NSSDC-held legacy data sets
— Presently on CD's as TIFF and PDF images
Other special purpose information bases and metadata collections
NSSDC data set ID's are primary mechanism currently linking these "metadata modules"
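Illustrative sketch: using the data set ID as the linking key between metadata modules amounts to a lookup of the same key in each module. The module names below come from the slides, but the record contents, the ID `DS-0001`, and the function name are hypothetical, and the real information bases are databases, not in-memory dictionaries.

```python
from typing import Any, Dict

# Hypothetical extracts of three metadata modules, keyed by data set ID.
MODULES: Dict[str, Dict[str, Any]] = {
    "JEDS": {"DS-0001": {"spacecraft": "example-sc", "instrument": "example-inst"}},
    "DIOnAS": {"DS-0001": {"files": ["file_a.dat", "file_b.dat"]}},
    "JIN": {},  # no media recorded for this data set in the example
}


def link_by_data_set_id(modules: Dict[str, Dict[str, Any]],
                        data_set_id: str) -> Dict[str, Any]:
    """Collect what each module holds for one data set ID."""
    return {
        name: records[data_set_id]
        for name, records in modules.items()
        if data_set_id in records
    }
```

A module with no entry for the ID is left out of the result, so the caller can see which modules still need to be populated for a given data set.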
NSSDC’s Metadata Challenges
To ensure flow to NSSDC of material needed for the correct and independent use of data along with the flow of data to NSSDC
To optimally integrate metadata modules to support:
— Users' finding, retrieval and use of data
— NSSDC staffers' archive management activities
To ensure that all relevant supporting material is visible to and readily retrievable by NSSDC's data-accessing customers.
Software
NSSDC has growing amount of low-processing-level (lpl) data
— Started archiving such data only in past decade
NSSDC has very little data set-specific READ/PROCESS software
— This greatly limits usability of lpl data
Lpl data handled by systems/formats like SDDAS/IDFS and IMAGE_Archive/UDF
Major need for software standards/approaches to accompany lpl data into archives
— Ensure long-term usability of such data
— Archiving of relevant software source code a minimal requirement