The DigiTool to FDA Program
Lydia Motyka
Florida Center for Library Automation
What is the DigiTool to FDA Program?
A program developed by FCLA that converts exported DigiTool entities into Submission Information Packages (SIPs) for archiving in the FDA repository.
Archiving DigiTool objects
Archiving DigiTool objects is a four-step process:
– Step 1: Affiliates flag DigiTool objects for export.
– Step 2: DigiTool objects flagged for export by Affiliates are exported using the “Export Digital Entities” job.
– Step 3: The DigiTool to FDA program (D2F) aggregates DigiTool objects into Intellectual Entities and creates Submission Information Packages (SIPs) and descriptors in the format required by the FDA.
– Step 4: The standard FDA Ingest process and program are used to archive the SIPs in the FDA repository.
DigiTool to Preservation Archive Workflow
[Workflow diagram: ETD information (<title>, etc.) is flagged in DigiTool → the flag causes export of metadata and files → the program creates a Submission Information Package (SIP) → the SIP is ingested into the Florida Digital Archive preservation repository]
Definitions
DigiTool Digital Entity:
Digital entities contain the following components:
– A persistent DigiTool internal ID (PID)
– Metadata of various types that describe the object
– A stream_ref section that points to an object
Definitions
Submission Information Package (SIP):
An FDA Submission Information Package (SIP) is a set of files intended for ingest into the Florida Digital Archive. (It is recommended practice that a single SIP should include only those files that comprise a single Intellectual Entity.)
Definitions
Intellectual entity:
“An Intellectual Entity is defined as something that can be reasonably described and used as a unit, and corresponds roughly to what might be described by a bibliographic record: a book, a sound recording, a photograph. (In the case of serial publications, it is recommended that a SIP include only a single issue, not a volume or set of volumes.)”
FCLA Digital Archive (FDA) SIP Specification, Version 1.0
Selecting DigiTool entities for export to the FDA
• Only those objects with filestreams in formats suitable for long-term preservation should be selected for archiving. (Format information can be found on the FDA website.) Examples:
– ETDs containing PDFs
– Institutional Repository materials
– Masters of scanned images when TIFF files have been loaded into DigiTool
• Complex objects can be exported to the FDA but care must be taken in flagging them
Flagging DigiTool entities for export
• DigiTool entities must have the following Control Fields in order to be exported for archiving in the FDA:
– Pres. Level = “Preservation Master”
– Partition C must contain a valid FDA Account and Project code, separated by a comma
[Screenshot labels: Pres. Level, Partition C]
• Note that each DigiTool object desired for archiving must be flagged with Pres. Level = “Preservation Master”.
• Related objects not flagged as “Preservation Master” will not be exported for archiving.
• Objects without proper Partition C content will not be archived.
• Note that Usage Type = “Archive” is irrelevant to the DigiTool to FDA process.
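The two flagging rules above can be expressed as a small pre-check. This is only a sketch: the function name `is_exportable` and the dictionary keys `pres_level` and `partition_c` are hypothetical, not DigiTool field identifiers, and the check covers only the shape of Partition C (two non-empty, comma-separated codes), not whether the Account and Project codes are actually valid in the FDA.

```python
def is_exportable(control_fields):
    """Return True if the Control Fields satisfy both flagging rules.

    `control_fields` is a hypothetical dict of Control Field values.
    Usage Type is deliberately not consulted: it is irrelevant here.
    """
    # Rule 1: Pres. Level must be exactly "Preservation Master".
    if control_fields.get("pres_level") != "Preservation Master":
        return False

    # Rule 2: Partition C must hold two non-empty codes separated by a comma.
    # Whether the codes are valid FDA codes cannot be checked locally.
    partition_c = control_fields.get("partition_c") or ""
    parts = [part.strip() for part in partition_c.split(",")]
    return len(parts) == 2 and all(parts)
```

A check like this could be run over a batch of entities before the nightly export to catch blank or malformed Partition C values early.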
Flagging DigiTool objects
Example – manifestations
3 manifestations
View Main (primary manifestation)
Do NOT flag THUMBNAIL or INDEX for archiving
The Export Process
• FCLA will run the DigiTool “Export Digital Entities” job nightly to extract all flagged DigiTool entities and their filestreams and metadata.
• Only those objects flagged with Pres. Level = “Preservation Master” will be exported. Related objects (manifestations, parent/children) not flagged as Preservation Masters will not be exported.
• The objects output by this program are copied to a special workspace where the DigiTool to FDA (D2F) program uses them as input.
The DigiTool to FDA conversion process
• Step 1: exported objects (metadata and filestreams) are aggregated into packages, one for each Intellectual Entity
• Step 2: metadata is extracted from the exported objects and a SIP descriptor file is created for the package
• Step 3: filestreams are listed as content files in the SIP descriptor
Aggregation into Intellectual Entities
• An Intellectual Entity (e.g. book) in DigiTool can consist of a number of digital entities linked by “Manifestation”, “Includes” and “Part of” relationship links
• The “Export Digital Entities” job exports each flagged digital object separately
• After export, DigiTool to FDA uses relationship links to aggregate the exported objects into SIPs that include all of the filestreams that constitute the Intellectual Entity
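As a rough illustration of link-based aggregation (not the actual D2F code), the sketch below groups exported objects into packages by treating relationship links as edges and collecting connected groups. The function name `aggregate` and the input shape (a dict mapping each exported PID to the PIDs it links to) are assumptions made for the example; the three link types are not distinguished, and links to objects that were not exported are ignored.

```python
from collections import defaultdict


def aggregate(exported):
    """Group exported PIDs into packages by following relationship links.

    `exported` maps each exported PID to a list of linked PIDs
    ("Manifestation", "Includes" or "Part of" links, treated alike here).
    Returns a list of packages, each a sorted list of PIDs.
    """
    # Build an undirected graph restricted to objects that were exported.
    neighbours = defaultdict(set)
    for pid, links in exported.items():
        for other in links:
            if other in exported:
                neighbours[pid].add(other)
                neighbours[other].add(pid)

    seen = set()
    packages = []
    for pid in exported:
        if pid in seen:
            continue
        # Depth-first walk collects everything reachable from this PID.
        stack, group = [pid], set()
        while stack:
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            stack.extend(neighbours[current] - group)
        seen |= group
        packages.append(sorted(group))
    return packages
```

Under these assumptions, a flagged parent with two flagged children yields one package, while two flagged children whose parent was not exported yield two separate packages.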
Rules to Remember
• If you wish to archive multiple manifestations, make sure that one of the manifestations is flagged Usage Type = “View Main”
• If you have a complex object (a parent and child objects) make sure to flag the parent for export
Example of Aggregation/Flagging in DigiTool: Single Master (ETD)
PID 111 (manifestation)
– Dublin Core descriptive metadata
– Filestream: PDF
– Pres. Level = Pres. Master
– Partition C = Account,Project
PID 222 (manifestation)
– Filestream: thumbnail
– Pres. Level = blank
– Partition C = blank
Example of Aggregation – Export: Single Master (ETD)
“Export Digital Entities” Query: Select Pres. Level = Pres. Master and Date=today
In DigiTool: PID 111, PID 222
In the Export Workspace: PID 111
Example of Aggregation – D2F: Single Master
Export Workspace: PID 111
D2F Workspace: SIP 111
• Descriptor (descriptive metadata)
• PDF content file
Example of Aggregation/Flagging in DigiTool: Manifestations
PID 111 (manifestation)
– Dublin Core descriptive metadata
– Usage Type = View (primary)
– Filestream: TIFF
– Pres. Level = Pres. Master
– Partition C = Account,Project
PID 222 (manifestation)
– Filestream: TIFF
– Pres. Level = Pres. Master
– Partition C = Account,Project
PID 333 (manifestation)
– Filestream: thumbnail
– Pres. Level = blank
– Partition C = blank
Example of Aggregation – Export: Manifestations
“Export Digital Entities” Query: Select Pres. Level = Pres. Master and Date=today
In DigiTool: PID 111, PID 222, PID 333
In the Export Workspace: PID 111, PID 222
Example of Aggregation – D2F: Manifestations
Export Workspace: PID 111 (View Primary), PID 222
D2F Workspace: SIP 111
• Descriptor (descriptive metadata)
• TIFF content file
• TIFF content file
The D2F program creates one SIP from the two exported objects, based on “Manifestation” links.
Example of Aggregation/Flagging in DigiTool: Complex Object
PID 111 (parent and manifestation)
– Dublin Core descriptive metadata
– No filestream
– Pres. Level = Pres. Master
– Partition C = Account,Project
PID 222 (child and manifestation)
– Filestream: TIFF
– Pres. Level = Pres. Master
– Partition C = Account,Project
PID 333 (manifestation)
– Filestream: thumbnail
– Pres. Level = blank
– Partition C = blank
PID 444 (child and manifestation)
– Filestream: JP2
– Pres. Level = Pres. Master
– Partition C = Account,Project
PID 555 (child and manifestation)
– Filestream: _*index.html
– Pres. Level = blank
– Partition C = blank
Example of Aggregation – Export: Complex Object
“Export Digital Entities” Query: Select Pres. Level = Pres. Master and Date=today
In DigiTool: PID 111, PID 222, PID 333, PID 444, PID 555
In the Export Workspace: PID 111, PID 222, PID 444
Example of Aggregation – D2F: Complex Object
Export Workspace: PID 111 (parent), PID 222, PID 444
D2F Workspace: SIP 111
• Descriptor (descriptive metadata)
• TIFF content file
• JP2 content file
The D2F program creates one SIP from the three exported objects, based on “Part of” and “Includes” links.
Creation of metadata in SIP descriptor
• Descriptive metadata is copied from the parent entity or main manifestation into the SIP descriptor (dmdSec).
• A checksum is generated for every file in the SIP and stored in the SIP descriptor.
• Other technical metadata is not copied from DigiTool into the SIP descriptor because the FDA generates its own.
• Administrative metadata (change history) is not copied into the SIP descriptor at this time. It may be added as Phase 2.
• Access restrictions are not copied into the SIP descriptor because the information is local to DigiTool.
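To make the checksum step concrete, here is a minimal sketch of computing a checksum for every content file in a package. The slides do not say which algorithm D2F uses or how the value is written into the descriptor, so the SHA-256 default, the function names `file_checksum` and `checksums_for_sip`, and the returned dict are all assumptions for illustration.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of one file, read in chunks to limit memory use.

    The algorithm is an assumption; the source does not name one.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksums_for_sip(paths, algorithm="sha256"):
    """Map each content file path to its checksum, one entry per file."""
    return {path: file_checksum(path, algorithm) for path in paths}
```

Reading in fixed-size chunks keeps memory use flat even for large TIFF masters.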
Descriptive metadata in DigiTool
DigiTool supports the following descriptive metadata formats:
– MARC21
– MODS
– Dublin Core
The FDA currently loads title information into its database only from MODS and Dublin Core metadata, although all MARC21 metadata is archived in the descriptor file. (MARC21 title information will be included in DAITSS 2.)
Step 4: Archiving converted SIPs
• SIPs created by D2F are sent to the FDA Ingest queue and processed by the standard FDA programs like all other SIPs.
• A successful ingest of a D2F-created SIP will result in an Ingest report being sent to the usual Affiliate reports address.
• Any D2F-created SIPs rejected by the FDA will result in Error reports being sent to the usual Affiliate reports address.
Why would the FDA reject D2F SIPs?
Even though D2F creates SIPs according to FDA specifications, the SIPs can be rejected for the following reasons:
– The FDA Account and Project codes in Partition C are invalid or are not comma-separated.
– The SIP contains no content files (DigiTool filestreams).
Problems that won’t be reported:
The FDA ingest program does not recognize the following conditions as errors:
• If you flag a parent for export to the FDA but do not flag all of the appropriate children, critical portions of the Intellectual Entity won’t be archived.
• If you flag children but do not flag the parent, each child will create a separate SIP.
• If you don’t flag all manifestations appropriate for archiving, critical portions of the Intellectual Entity won’t be archived.
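Because these conditions are not reported at ingest, a local check before export could surface them. The sketch below is one hypothetical way to do that: the function name `flagging_warnings` and the input shape (a dict mapping each PID to a `flagged` boolean and an optional `parent` PID) are assumptions, and an unflagged child is only reported for review, since thumbnails and index files are legitimately left unflagged.

```python
def flagging_warnings(entities):
    """Return human-readable warnings about parent/child flagging mismatches.

    `entities` maps PID -> {"flagged": bool, "parent": PID or None}.
    This shape is hypothetical and not a DigiTool data structure.
    """
    warnings = []
    for pid, entity in entities.items():
        parent = entity.get("parent")
        if parent is None:
            continue  # top-level object, nothing to compare against
        parent_flagged = entities.get(parent, {}).get("flagged", False)
        if entity["flagged"] and not parent_flagged:
            # A flagged child without a flagged parent becomes its own SIP.
            warnings.append(
                f"{pid}: flagged, but parent {parent} is not; "
                "it would become a separate SIP"
            )
        elif parent_flagged and not entity["flagged"]:
            # May be intentional (thumbnail, index), so ask for confirmation.
            warnings.append(
                f"{pid}: not flagged, although parent {parent} is; "
                "confirm it should be left out"
            )
    return warnings
```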
What to do after D2F SIPs are archived
FCLA recommends that you record the FDA IEID (Intellectual Entity ID) in the Note Control Field of the DigiTool entity.
FDA IEID (from Ingest Report)
Next Steps?
• Beta testing, DigiTool workflow by DigiTool workflow
• Volunteers needed for beta testing
End