The Mellon-Funded Fedora Project
A Briefing for the Los Alamos National Laboratory
August 26, 2002
Sandy Payette
Cornell Information Science

Motivation
The Problem of Complex Content

Digital Library Content: not just documents ...
Some familiar objects
Complex, compound, dynamic objects

Key Research Questions
• How can clients interact with heterogeneous collections of complex objects in a simple and interoperable manner?
• How can complex objects be designed to be both generic and genre-specific at the same time?
• How can we hide the complexity of an object’s underlying data structures and relationships from clients?
• How can we associate services and tools with objects to provide different presentations or transformations of the object content?
• How can we associate specialized, fine-grained access control policies with specific objects, or with groups of objects?

The Flexible Extensible Digital Object Repository Architecture (FEDORA)
• Developed as a DARPA and NSF-funded research project at Cornell (1997-present)
– CORBA-based reference implementation
– Extensive interoperability testing
– Policy enforcement
• Interpreted and re-implemented at University of Virginia (1999)
– Simple web-oriented implementation, focused on access to collections
– Java servlet and relational database
• Virginia prototype supported testbed of 10,000,000 digital objects with very good results (1999-2001)
• Andrew W. Mellon Foundation granted Virginia and Cornell $1,000,000 to develop a full-featured production FEDORA system that is web-based (2002+)

FEDORA Original Research Goals
• Flexibility – object model that fits many different contexts
• Management – of distributed digital content and services
• Access – stable interfaces to digital objects; behavior-centric
• Interoperability – among digital objects and repositories
• Extensibility – easy evolution of object behaviors
• Security – rights management and access control
• Preservation – of content, plus “look and feel”

Model for Collaboration: Digital Library Research and Real Library Requirements
• University of Virginia developing extensive digital collections since 1992
• Virginia Digital Library R&D Group chartered with finding solution for integration
– Formal requirements analysis
– Search for commercial products
– Discovery: Cornell research parallels stated requirements

Virginia Requirements: Heterogeneous Digital Collections
Books, Rare Books, Multimedia, Music, E-texts, Maps, Photographs, Statistics, Video, Art, Manuscripts, Data, Images, 3-D Objects, Journals, Sound Effects

Virginia Requirements: Managing the Collections
• Scalability to support hundreds of millions of objects
• Persistent unique names for all resources without respect to machine address
• Support inter-relationships among objects
• Manage the digital resources and metadata, as well as computer programs, services and tools that support them
• Enforce appropriate policies for use of Library resources
• Provide a high level of security
• Support preservation activities appropriately

Virginia Requirements: Delivering the Collections
• Well-architected, flexible relationships between services/tools and digital content
• Digital objects themselves have the ability to provide users with an appropriate launch-pad or tool to use the object content
• Every resource can be used in any number of contexts
• Move towards a digital library that is configurable by an “aware” user
• Provide resource discovery (searching) across the full collection
• Deep searching in particular collections

Shortcomings of commercial digital library products
• Narrow focus on specific media formats (e.g. image databases, document management)
• Fail to effectively address interrelationships among digital entities
• Fail to address interoperability; no open interfaces to facilitate sharing of services; no standard protocols for cross-system interoperability
• Fail to provide facilities for managing programs and tools that are integral to delivering digital content
• Not extensible; do not enable easy integration of new tools and services

The Fedora Architecture
Overview of Basic Model

FEDORA Basic Architectural Abstractions
• Digital Object
– Container for aggregating any digital content
– Content disseminations based on behavior definitions
– Extensibility of behavior mechanisms
• Repository
– Service layer for “contained” Digital Objects
– Object lifecycle management
– Access management

FEDORA Digital Object
• Persistent ID (PID): globally unique persistent id
• Disseminators: public view; access methods for obtaining “disseminations” of digital object content
• System Metadata: internal view; metadata necessary to manage the object
• Datastreams: protected view; content that makes up the “basis” of the object
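
The four parts of a digital object can be sketched as a small data structure. This is a minimal illustration in Python, not Fedora's actual implementation; the class, field, and method names (`DigitalObject`, `Disseminator`, `disseminate`) are hypothetical and chosen only to mirror the slide.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Disseminator:
    """Public view: named access methods that produce disseminations."""
    name: str
    # Hypothetical shape: each method receives the object's datastreams.
    methods: Dict[str, Callable[[Dict[str, bytes]], bytes]]


@dataclass
class DigitalObject:
    pid: str  # globally unique persistent id
    system_metadata: Dict[str, str] = field(default_factory=dict)  # internal view
    datastreams: Dict[str, bytes] = field(default_factory=dict)    # protected view
    disseminators: Dict[str, Disseminator] = field(default_factory=dict)

    def disseminate(self, disseminator: str, method: str) -> bytes:
        """Clients reach content only through a disseminator method."""
        return self.disseminators[disseminator].methods[method](self.datastreams)


obj = DigitalObject(
    pid="uva-lib:1225",
    datastreams={"thumb": b"<thumbnail bytes>"},
    disseminators={
        "web_image": Disseminator("web_image", {"get_thumb": lambda ds: ds["thumb"]}),
    },
)
print(obj.disseminate("web_image", "get_thumb"))
```

The client never touches `datastreams` directly; it names a disseminator and a method, which is the separation between public and protected views that the slide describes.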

FEDORA Digital Object Architecture
[Figure: three kinds of digital object and their components]
• Behavior Definition Object: Persistent ID (PID), Service Definition Metadata, System Metadata, Datastreams
• Behavior Mechanism Object: Persistent ID (PID), Service Binding Metadata, System Metadata, Datastreams
• Data Object: Persistent ID (PID), Disseminators, System Metadata, Datastreams

Data Object Association to External Behavior Service
[Figure: inside a Fedora Repository, a Data Object (PID = uva-lib:1225) has a Watermarker disseminator, System Metadata, and three datastreams (watermark file, med res. image file, high res. image file). The disseminator refers by <BMech-PID> to a Behavior Mechanism Object (PID = bmech-img:12), which has a Bootstrap disseminator, System Metadata, and datastreams holding WSDL definitions, a Datastream BindSpec, and User Documentation. The mechanism object points to a Remote Watermark Service.]

Digital Object Interoperability: Common Behaviors for Variable Content
[Figure: a Web-Image Behavior Definition (GetThumbnail, GetLowResolution, GetMedResolution, GetHighResolution) is shared by two digital objects. Digital Object A has a Web-image disseminator over four image-file datastreams (thumbnail, med res., high res., max res.). Digital Object B has a Web-image disseminator over a single wavelet datastream (MrSID encoded file). The two objects are functionally equivalent.]
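
The idea of one behavior definition served by different mechanisms can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the file names, and the MrSID "level" numbers are hypothetical, and the returned strings stand in for real image bytes.

```python
from typing import Callable, Dict, Sequence

# The method names come from the Web-Image Behavior Definition on the slide.
WEB_IMAGE_BDEF = ("GetThumbnail", "GetLowResolution", "GetMedResolution", "GetHighResolution")


def static_files_mechanism(files: Sequence[str]) -> Dict[str, Callable[[], str]]:
    """Object A: four stored image files, matched to the methods in order."""
    return {name: (lambda f=f: f"file:{f}") for name, f in zip(WEB_IMAGE_BDEF, files)}


def wavelet_mechanism(mrsid_file: str) -> Dict[str, Callable[[], str]]:
    """Object B: every resolution is derived from one MrSID file."""
    levels = dict(zip(WEB_IMAGE_BDEF, (4, 3, 2, 1)))  # hypothetical reduction levels
    return {name: (lambda lvl=lvl: f"decode:{mrsid_file}@level{lvl}")
            for name, lvl in levels.items()}


object_a = static_files_mechanism(["thumb.jpg", "med.jpg", "high.jpg", "max.jpg"])
object_b = wavelet_mechanism("image.sid")

# A client calls the same method on both objects without knowing the storage.
for obj in (object_a, object_b):
    print(obj["GetThumbnail"]())
```

Both objects expose exactly the same method names, which is what the slide calls functional equivalency.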

Digital Object Extensibility: Adding New Behaviors
The same underlying content can be operated on in novel ways to create new disseminations not originally conceived of.
[Figure: a digital object with a Web-book disseminator and datastreams (TEI file; page 1, page 2, and page 3 image files) is presented as a Book. Adding a Photo-seek disseminator over the same datastreams presents it as a Photo Collection.]

Virginia Prototype
Content Models and Fedora Demos

General Image Content Model (Mycenae image example)
• Persistent ID (PID)
• Disseminators (disseminator / behavior definition / behavior mechanism; each method is followed by its implementation)
– web_image1 / web_image / web_image1: get_thumb (HTTP GET), get_med (imagedisplay.java), get_high (HTTP GET), get_veryhigh (HTTP GET)
– web_default_image / web_default / web_default_image: get_as_page (imagedisplay.java), get_in_context (HTTP GET (thumb))
• System Metadata
– admin: Administrative metadata
– desc: Descriptive metadata
• Datastreams
– basis1: pointer to thumbnail size image
– basis2: pointer to medium resolution image
– basis3: pointer to high resolution image
– basis4: pointer to highest resolution image

MrSID Image Content Model (Pavilion III image example)
• Persistent ID (PID)
• Disseminators (disseminator / behavior definition / behavior mechanism; each method is followed by its implementation)
– web_image_mrsid / web_image / web_image_mrsid: get_thumb (get_image.pl), get_med (get_image.pl), get_high (get_image.pl), get_veryhigh (get_image.pl)
– web_default_image / web_default / web_default_image: get_as_page (get_image.pl), get_in_context (get_image.pl)
• System Metadata
– admin: Administrative metadata
– desc: Descriptive metadata
• Datastreams
– basis1: pointer to MrSID formatted image

Finding Aid Content Model (Finding Aid example)
• Persistent ID (PID)
• Disseminators (disseminator / behavior definition / behavior mechanism; each method is followed by its implementation)
– web_ead1 / web_ead / web_ead1: get_web_default (eaddoc.java), get_tp (tp.xsl), get_admin (admin.xsl), get_summary (summary.xsl), get_scopecontent (scopecontent.xsl), get_bioghist (bioghist.xsl), get_component (component.xsl), get_arrangement (arrangement.xsl), get_organization (organization.xsl), get_document (document.xsl), get_menu (menu.xsl)
– web_default_ead1 / web_default / web_default_ead1: get_as_page (eaddoc.java), get_in_context (document.xsl)
• System Metadata
– admin: Administrative metadata
– desc: Descriptive metadata
• Datastreams
– basis1: pointer to XML Finding Aid source

TEI Letter Content Model (TEI letter example)
• Persistent ID (PID)
• Disseminators (disseminator / behavior definition / behavior mechanism; each method is followed by its implementation)
– web_teiletter1 / web_teiletter / web_teiletter1: get_teiletter_default (teiletterdoc.pl), get_original (letter.header.xsl), get_modern (modern.xsl), get_teiheader (teiheader.xsl), get_pageimages (pageimages.xsl)
– web_default_teiletter / web_default / web_default_teiletter: get_as_page (teiletterdoc.pl), get_in_context (letter.header.xsl)
• System Metadata
– admin: Administrative metadata
– desc: Descriptive metadata
• Datastream(s)
– basis1: pointer to XML TEI letter source

TEI Book Content Model (TEI book example)
• Persistent ID (PID)
• Disseminators (disseminator / behavior definition / behavior mechanism; each method is followed by its implementation)
– web_teibook1 / web_teibook / web_teibook1: get_web_default (teidoc.java), get_teiheader (admin.xsl), get_toc (contents.xsl), get_menu_teibook (menu.xsl), get_tp_teibook (tp.xsl), get_id (id.xsl)
– web_default_teibook / web_default / web_default_teibook: get_as_page (teidoc.java), get_in_context (contents.xsl)
• System Metadata
– admin: Administrative metadata
– desc: Descriptive metadata
• Datastreams
– basis1: pointer to XML TEI book source

GDMS Content Model (Mycenae example; lawn example)
• Persistent ID (PID)
• Disseminators (disseminator / behavior definition / behavior mechanism; each method is followed by its implementation)
– web_gdms2 / web_gdms / web_gdms2: get_web_default (imagedef.java), get_gdmswalk (gdmswalk.xsl), get_menu (imagemenu.xsl)
– web_default_gdms / web_default / web_default_gdms: get_as_page (imagedef.java), get_in_context (HTTP GET)
• System Metadata
– admin: Administrative metadata
– desc: Descriptive metadata
• Datastream
– basis1: pointer to XML GDMS source file

Numerical Data Content Model (ICPSR survey example)
• Persistent ID (PID)
• Disseminators (disseminator / behavior definition / behavior mechanism; each method is followed by its implementation)
– web_icpsr1 / web_icpsr / web_icpsr1: get_web_default (loader.pl), get_abstract (abstract.xsl), get_citation (citation.xsl), get_details (technical.xsl), get_question (variables.xsl), get_subset (codebook.pl), get_study (ftpstudy.pl)
– web_default_icpsr1 / web_default / web_default_icpsr1: get_as_page (loader.pl), get_in_context (abstract.xsl)
• System Metadata
– admin: Administrative metadata
– desc: Descriptive metadata
• Basis Datastream(s)
– basis1: XML Codebook source
– basis2 (TBD): pointer to SQL Database containing data

The New FEDORA
Technical Specifications – Part I

Background Material
Overview of Web Service Technologies

What is a Web Service?
• A distributed application that runs over the internet.
• An addressable network endpoint which receives structured messages and returns structured responses.
• A web application that publishes an open interface through which clients can send requests and receive responses.

How is this different from plain old web applications?
• Formally defined API (application programming interface) defines a set of abstract operations for a web service
• Published bindings for client to run operations
• Standard protocol for invoking operations on the service
• XML as standard means of encoding service requests and responses

Why are Web Services important?
• Interoperability
– Web applications can interact and build upon each other
– Data is transferred in an interoperable manner (e.g., over HTTP)
– Data is encoded in an interoperable format (XML)
• Works in decentralized, distributed, operating-system independent environment
• Standards-oriented
• Means to expose complex operations with rich data typing (via XML Schema language typing)
• Ease of integrating distributed systems via the Web
• W3C effort to develop this service architecture

How are Web Services Implemented?
• The Simple Object Access Protocol (SOAP) Approach
– SOAP is a messaging protocol that can run over different transport protocols (e.g., HTTP, SMTP)
– Operation oriented (send a request to an end point)
– Like CORBA, RMI, DCOM ... but for the Web and simpler
– Application APIs can be defined and published using the Web Services Description Language (WSDL)
– Requests and responses sent as XML messages
– Supports simple and complex data typing in requests and responses
– Supports transmission of binary data within request or response packages

How are Web Services Implemented?
• The REST (Representational State Transfer) Approach
– URI + HTTP + XML
– URI/resource driven; message built into a URI (URL)
– HTTP GET or POST
– Response is XML data
• Issues with REST
– Not a standard, but a style of doing web apps; arguably it just gives a fancy name to how lots of people do applications on the web by default; nothing really new here; just argues to do things the way we have been, maybe a little more standard by using XML
– Fragile service definition: URLs change
– No data typing on requests
– Limited ability to transmit complex requests on URL
– W3C behind SOAP, but only one strong voice out there for REST (Prescod)
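
To make "message built into a URI" concrete, here is a small sketch that encodes a request as a URL and decodes it again. The endpoint `http://example.org/spell` and the parameter names are hypothetical. It also shows the "no data typing" point: every value arrives as a string.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_rest_url(base: str, **params) -> str:
    """Encode the whole request in the URL's query string."""
    return f"{base}?{urlencode(params)}"


url = build_rest_url("http://example.org/spell", phrase="payet")
print(url)

# On the receiving side the integer 3 comes back as the string "3".
typed = build_rest_url("http://example.org/spell", limit=3)
print(parse_qs(urlsplit(typed).query))
```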

Example of Web Service using SOAP
[Figure: My Application sends a SOAP Request (XML) over SOAP/HTTP to the Google Web Service, calling doSpellingSuggestion(payet). The SOAP Response (XML) returns "payette".]

XML SOAP Request
<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance" xmlns:xsd="http://www.w3.org/1999/XMLSchema">
<SOAP-ENV:Body>
<m:doSpellingSuggestion xmlns:m="urn:GoogleSearch">
<key>/e325JlNPASJu</key>
<phrase>payet</phrase>
</m:doSpellingSuggestion>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>

XML SOAP Response
<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance" xmlns:xsd="http://www.w3.org/1999/XMLSchema">
<SOAP-ENV:Body>
<ns1:doSpellingSuggestionResponse xmlns:ns1="urn:GoogleSearch" SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
<return xsi:type="xsd:string">payette</return>
</ns1:doSpellingSuggestionResponse>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
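
A client still has to pull the result out of the envelope. The sketch below parses the response shown on this slide with Python's standard `xml.etree.ElementTree`; the helper name `spelling_suggestion` is hypothetical, and no network call is made.

```python
import xml.etree.ElementTree as ET

SOAP_RESPONSE = b"""<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/1999/XMLSchema">
  <SOAP-ENV:Body>
    <ns1:doSpellingSuggestionResponse xmlns:ns1="urn:GoogleSearch"
        SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
      <return xsi:type="xsd:string">payette</return>
    </ns1:doSpellingSuggestionResponse>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""

# Prefixes here are local to this script; only the namespace URIs must match.
NS = {
    "soap": "http://schemas.xmlsoap.org/soap/envelope/",
    "g": "urn:GoogleSearch",
}


def spelling_suggestion(xml_bytes: bytes):
    """Return the text of the <return> element, or None if it is absent."""
    root = ET.fromstring(xml_bytes)
    ret = root.find("soap:Body/g:doSpellingSuggestionResponse/return", NS)
    return None if ret is None else ret.text


print(spelling_suggestion(SOAP_RESPONSE))
```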

New Fedora: Key Features
• Repository system exposed as two related Web services
– described using WSDL
– both SOAP and HTTP bindings
• Digital objects encoded and stored as XML using Metadata Encoding and Transmission Standard (METS)
• Digital object behaviors implemented as linkages to distributed web services (also described using WSDL)
• Digital objects support versioning of both content and services

New Fedora System
[Figure: Web browsers and custom clients call the Fedora Web Service Layer, which exposes API-M (Management Interface) and API-A (Access Interface) on top of the Core Sub-System Implementations. The Data Store Layer beneath holds XML digital objects, Managed Content Datastreams, External Content Datastreams, and an SQL digital object cache.]

Web Service Communication View
[Figure: a Batch Ingest Client and a Management Client reach the Management Service (API-M), and a Web Browser and an Access Client reach the Access Service (API-A). Requests pass through a Message Protocol Layer (SOAP or HTTP) carried by a Transport Protocol Layer (http, smtp, other). Both services sit on the Core Sub-System Implementations, which use Digital Object Storage (XML files, relational DB) and Datastream Storage (Managed Content, plus an External Content Retriever that fetches from External Content Sources over http or ftp). Remote Behavior Mechanism Services are called over HTTP or SOAP.]

The New FEDORA
Encoding Digital Objects in XML

Metadata Encoding and Transmission Standard (METS)
• XML “standard” for encoding descriptive, administrative, and structural metadata of digital library objects
• Developed under auspices of the Digital Library Federation
• METS standard maintained by the Network Development and MARC Standards Office of the Library of Congress
• http://www.loc.gov/standards/mets/

METS Schema
• METS is written in the XML Schema Language
• METS defines four sections for an object
– Descriptive metadata
– Administrative metadata
– File group
– Structure map
• METS goals include:
– Facilitate management of objects within a repository
– Provide a standard format for exchange of objects between repositories
– Provide standard format for transmission of objects to users for rendering (via tools or applications)

Mapping Fedora to METS
Fedora: Persistent Identifier (PID)
METS:
<METS:mets OBJID="uva-lib:1225"/>
Fedora: Datastreams
METS:
<METS:fileGrp ID="DATASTREAMS">
  <METS:fileGrp ID="DS1" STATUS="A">
    <!-- Version 2: High resolution image -->
    <METS:file ID="DS1.1" CREATED="2002-05-20T06:32:00" MIMETYPE="image/jpeg">
      <METS:FLocat LOCTYPE="URL" xlink:href="http://uva.edu/img8a.jpg"/>
    </METS:file>
    <!-- Version 1: High resolution image -->
    <METS:file ID="DS1.0" CREATED="2002-05-10T02:32:00" MIMETYPE="image/jpeg">
      <METS:FLocat LOCTYPE="URL" xlink:href="http://uva.edu/img8a.jpg"/>
    </METS:file>
  </METS:fileGrp>
</METS:fileGrp>

Mapping Fedora to METS
Fedora: System Metadata
METS:
<METS:dmdSec/>
<METS:amdSec/>
Fedora: Disseminator
METS:
<METS:behaviorSec ID="DISS1" STATUS="A" STRUCTID="S1">
  <METS:mechanism LOCTYPE="URN" xlink:href="uva-bmech:12"/>
  <METS:interfaceDef LOCTYPE="URN" xlink:href="uva-bdef:8"/>
</METS:behaviorSec>
<METS:structMap TYPE="fedora:dsBindingMap" ID="S1">
  <METS:div TYPE="uva-bmech:12">
    <METS:div TYPE="IMAGE-HIGH" ORDER="0">
      <METS:fptr FILEID="DS1"/>
    </METS:div>
  </METS:div>
</METS:structMap>
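
The mapping can be followed end to end: a disseminator points at a structure map, whose `fptr` names a datastream group, whose newest file is the current version. The sketch below walks that chain over a well-formed copy of the slide fragments. It is an illustration, not Fedora code: the function name `resolve_bindings` is hypothetical, the namespace URIs are assumptions (the slides do not show them), and the fragment follows the slides rather than any published METS schema version.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"       # assumed namespace URI
XLINK_NS = "http://www.w3.org/1999/xlink"  # assumed namespace URI
NS = {"M": METS_NS}

SAMPLE = f"""
<METS:mets xmlns:METS="{METS_NS}" xmlns:xlink="{XLINK_NS}" OBJID="uva-lib:1225">
  <METS:fileGrp ID="DATASTREAMS">
    <METS:fileGrp ID="DS1" STATUS="A">
      <METS:file ID="DS1.1" CREATED="2002-05-20T06:32:00" MIMETYPE="image/jpeg">
        <METS:FLocat LOCTYPE="URL" xlink:href="http://uva.edu/img8a.jpg"/>
      </METS:file>
      <METS:file ID="DS1.0" CREATED="2002-05-10T02:32:00" MIMETYPE="image/jpeg">
        <METS:FLocat LOCTYPE="URL" xlink:href="http://uva.edu/img8a.jpg"/>
      </METS:file>
    </METS:fileGrp>
  </METS:fileGrp>
  <METS:behaviorSec ID="DISS1" STATUS="A" STRUCTID="S1">
    <METS:mechanism LOCTYPE="URN" xlink:href="uva-bmech:12"/>
    <METS:interfaceDef LOCTYPE="URN" xlink:href="uva-bdef:8"/>
  </METS:behaviorSec>
  <METS:structMap TYPE="fedora:dsBindingMap" ID="S1">
    <METS:div TYPE="uva-bmech:12">
      <METS:div TYPE="IMAGE-HIGH" ORDER="0">
        <METS:fptr FILEID="DS1"/>
      </METS:div>
    </METS:div>
  </METS:structMap>
</METS:mets>
"""


def resolve_bindings(mets_xml: str, diss_id: str) -> dict:
    """Map each binding key of a disseminator to its newest file version ID."""
    root = ET.fromstring(mets_xml)
    diss = root.find(f"M:behaviorSec[@ID='{diss_id}']", NS)
    struct = root.find(f"M:structMap[@ID='{diss.get('STRUCTID')}']", NS)
    bindings = {}
    for div in struct.iterfind("M:div/M:div", NS):
        group_id = div.find("M:fptr", NS).get("FILEID")
        group = root.find(f".//M:fileGrp[@ID='{group_id}']", NS)
        # ISO 8601 timestamps sort correctly as plain strings.
        latest = max(group.findall("M:file", NS), key=lambda f: f.get("CREATED"))
        bindings[div.get("TYPE")] = latest.get("ID")
    return bindings


print(resolve_bindings(SAMPLE, "DISS1"))
```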

Digital Object Versioning
• Versioning within Data Objects
– Datastream versioning: date/time stamped; new version every time datastream is modified
– Disseminator versioning: date/time stamped; new version if disseminator is modified to reference a different Behavior Mechanism (“better mousetrap”)
• Versioning within Behavior Definition and Mechanism Objects
– New versions of WSDL metadata recorded in these objects (with date/time stamps)
– This deserves much more explanation than this slide can offer!
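
Datastream versioning amounts to an append-only list of stamped versions. The sketch below is one way to picture it, assuming hypothetical class and method names (`Datastream`, `modify`, `as_of`) and a `DSn.m` numbering that simply mirrors the IDs in the METS example; the real system may record versions differently.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class DatastreamVersion:
    version_id: str
    created: datetime
    location: str


@dataclass
class Datastream:
    ds_id: str
    versions: List[DatastreamVersion] = field(default_factory=list)

    def modify(self, location: str, when: datetime) -> DatastreamVersion:
        """Never overwrite: each modification appends a new stamped version."""
        version = DatastreamVersion(f"{self.ds_id}.{len(self.versions)}", when, location)
        self.versions.append(version)
        return version

    def as_of(self, when: datetime) -> Optional[DatastreamVersion]:
        """Return the version that was current at the given time, if any."""
        candidates = [v for v in self.versions if v.created <= when]
        return max(candidates, key=lambda v: v.created) if candidates else None


ds = Datastream("DS1")
ds.modify("http://uva.edu/img8a.jpg", datetime(2002, 5, 10, 2, 32))
ds.modify("http://uva.edu/img8a.jpg", datetime(2002, 5, 20, 6, 32))
print(ds.as_of(datetime(2002, 5, 15)).version_id)
```

Because old versions are kept, a dissemination can in principle be reproduced as of any past date, which is the property the date/time stamps make possible.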

METS: Sample Fedora Object
Click here for image digital object

Fedora Dissemination Database
• Alternate form of object storage that will act as a cache of most recent versions of digital objects
• Ensure high-performance access (disseminations)
• Repository system replicates from authoritative XML version of objects to relational database
• Plan to phase out the database in Phase 2-3:
– Access sub-system to work completely off the XML storage, as XML tools improve performance-wise
– Pursue different caching strategies as necessary

The New FEDORA
Repository System Design

Fedora Repository System
[Figure: layered view of the repository. Clients (Batch Ingest Client, Management Client, Web Browser) connect through the Fedora Web Service Exposure Layer, which offers API-M (Fedora-API-M.wsdl) and API-A (Fedora-API-A.wsdl) using SOAP and HTTP message protocols over http, smtp, and other transport protocols. Behind that layer are the Management Subsystem (Object Management, Component Management, Object Validation, PID Generation), the Access Subsystem (Object Reflection, Dissemination), the Security Subsystem (Session Management with User Authentication, Policy Management, Policies, Users/Groups), and the Storage Subsystem (Digital Object Storage in XML files and a relational DB; Datastream Storage with Managed Content and an External Content Retriever that reaches External Content Sources over http and ftp). Disseminations use Local Services (svc1, svc2) or Remote Behavior Mechanism Services reached over HTTP or SOAP.]

FEDORA Web Service API Definitions
• “API-M” – interface for management sub-system
– Operations necessary to create and maintain objects and their components
– Interface directly with authoritative XML version of object
• “API-A” – interface for access sub-system
– Operations necessary for clients to perform disseminations on objects in the repository
– No direct access to object internal structure or components
– Will work against cached representation of object to optimize performance

Fedora Management Sub-System: Implements API-M
• Object Management
• Object Component Management
• Object Validation
• PID Generation
• Interacts with Storage Subsystem

Other Sub-systems
• Storage Sub-system
– Responsible for all matters pertaining to reading and writing objects from persistent storage
– Modular design – can configure different object readers and writers to suit the context
– Modular design – can configure different data store strategies (in phase 1 will have file system and relational database)
• Security Sub-system
– Store access control policies for repository and objects
– Store user and group information
– Enforcement of policies
![Page 53: Sandy Payette Cornell Information Science](https://reader036.vdocuments.site/reader036/viewer/2022062517/56812ea4550346895d943e99/html5/thumbnails/53.jpg)
Security Sub-system
Access Control Policies
General Purpose
– “Only repository managers can add new disseminators to digital objects in the repository.”
Object-Specific (e.g., Lecture object)
– “Guests may view course syllabus and slides 1-10 of Lecture 1, but may not view the lecture video or any other slides.”
– “Students may not view Lecture 2 video unless they submit assignment for Lecture 1.”
See research at: http://www.cs.cornell.edu/payette/prism/security/policy.htm
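The three example policies can be written out as simple predicates to show the difference between a general-purpose rule and object-specific rules. This Python sketch is an assumption throughout: the role names, the `User` type, and the function names are hypothetical and do not reflect how the Fedora security sub-system expresses policies. Cases the slide does not cover (for example, students viewing other lecture videos) are allowed here purely as a placeholder choice.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class User:
    name: str
    role: str  # hypothetical role names: "manager", "student", "guest"
    submitted: Set[str] = field(default_factory=set)  # assignments handed in


def may_add_disseminator(user: User) -> bool:
    """General-purpose policy: only repository managers may add disseminators."""
    return user.role == "manager"


def guest_may_view(lecture: int, component: str, slide: Optional[int] = None) -> bool:
    """Object-specific policy for guests on the Lecture object."""
    if component == "syllabus":
        return True
    if component == "slide":
        # Only slides 1-10 of Lecture 1 are open to guests.
        return lecture == 1 and slide is not None and 1 <= slide <= 10
    return False  # the lecture video and anything else is denied


def student_may_view_video(user: User, lecture: int) -> bool:
    """Lecture 2 video is gated on the Lecture 1 assignment."""
    if lecture == 2:
        return "lecture-1-assignment" in user.submitted
    return True  # assumption: the slide states no rule for other lectures


if __name__ == "__main__":
    student = User("pat", "student")
    print(student_may_view_video(student, 2))
    student.submitted.add("lecture-1-assignment")
    print(student_may_view_video(student, 2))
```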
![Page 54: Sandy Payette Cornell Information Science](https://reader036.vdocuments.site/reader036/viewer/2022062517/56812ea4550346895d943e99/html5/thumbnails/54.jpg)
Fedora Repository System
[Architecture diagram. Labels recoverable from the figure:
Clients: Batch Ingest Client, Management Client, Web Browser
Fedora Web Service Exposure Layer: API-M (Fedora-API-M.wsdl), API-A (Fedora-API-A.wsdl); message protocol SOAP, transport protocol HTTP
Management Subsystem: Object Management, Component Management, Object Validation, PID Generation
Access Subsystem: Object Reflection, Dissemination
Security Subsystem: Session Management Subsystem (User Authentication), Policy Management, Policies, Users/Groups
Storage Subsystem: Digital Object Storage, Datastream Storage, XML Files, Relational DB
Content and services: External Content Retriever, External Content Sources (http, ftp), Managed Content, Local Services, Remote Behavior Mechanism Services (http, smtp, other)]
![Page 55: Sandy Payette Cornell Information Science](https://reader036.vdocuments.site/reader036/viewer/2022062517/56812ea4550346895d943e99/html5/thumbnails/55.jpg)
Fedora Access Sub-System
Implements API-A
Object Reflection
– Identify the types of Behavior Definitions to which an object subscribes (via the object’s Disseminators)
– Reflect on a Behavior Definition to identify the kinds of disseminations that can be run on the object (i.e., as method requests)
Dissemination
– Fulfills requests for particular methods (i.e., of a Behavior Definition) to be run on an object
– Mediates access to supporting services (i.e., Behavior Mechanisms) used to present or transform datastreams of the object
– Returns a view of the object’s content to client
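The dissemination steps above (find the requested method, hand the datastream to the supporting mechanism, return the resulting view) can be sketched in Python. This is a minimal illustration, assuming a mechanism can be modeled as a plain function; `DigitalObject`, `Mechanism`, and `get_dissemination` are hypothetical names, and a real Behavior Mechanism would be a bound service rather than a local callable.

```python
from typing import Callable, Dict

# Assumption: a mechanism is modeled as a function from datastream bytes to a view.
Mechanism = Callable[[bytes], str]


class DigitalObject:
    def __init__(self, pid: str, datastream: bytes) -> None:
        self.pid = pid
        self.datastream = datastream
        # Maps behavior definition id -> {method name -> mechanism function}.
        self.disseminators: Dict[str, Dict[str, Mechanism]] = {}


def get_dissemination(obj: DigitalObject, bid: str, method: str) -> str:
    """Run one method of one behavior definition and return the resulting view."""
    try:
        mechanism = obj.disseminators[bid][method]
    except KeyError:
        raise LookupError(
            f"object {obj.pid} has no method {method!r} on {bid!r}"
        ) from None
    # The client never touches the datastream; only the mechanism does.
    return mechanism(obj.datastream)


if __name__ == "__main__":
    image = DigitalObject("101", b"MrSID")
    image.disseminators["Web-default"] = {
        "get-as-page": lambda data: f"<html>{len(data)} bytes</html>",
    }
    print(get_dissemination(image, "Web-default", "get-as-page"))
```

The caller supplies only identifiers and a method name, which mirrors the point that clients get a view of the content without direct access to the object's internal components.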
![Page 56: Sandy Payette Cornell Information Science](https://reader036.vdocuments.site/reader036/viewer/2022062517/56812ea4550346895d943e99/html5/thumbnails/56.jpg)
API-A: Object Reflection Requests
Identify Types of Behavior Definitions
Each Disseminator is said to “subscribe” to a Behavior Definition
It does this by referencing the PID of a particular Behavior Definition Object.
Each Behavior Definition Object contains metadata that describes a set of related behaviors (or operations)
Via API-A, clients can send a service request to determine what Behavior Definitions an object subscribes to.
![Page 57: Sandy Payette Cornell Information Science](https://reader036.vdocuments.site/reader036/viewer/2022062517/56812ea4550346895d943e99/html5/thumbnails/57.jpg)
API-A: Object Reflection Request
Get Behavior Methods
Each Disseminator has a Behavior Definition Object associated with it.
Each Disseminator has a Behavior Mechanism Object associated with it that describes how to bind to a particular service that complies with the Disseminator’s Behavior Definition.
Via API-A, clients can send a service request to obtain the list of method definitions associated with a particular Disseminator of the digital object.
![Page 58: Sandy Payette Cornell Information Science](https://reader036.vdocuments.site/reader036/viewer/2022062517/56812ea4550346895d943e99/html5/thumbnails/58.jpg)
API-A: Object Reflection Requests
[Diagram: a MrSID Image Object (PID = 101) in the Repository, reached through API-A. Disseminators: Web-default, Web-image, Admin. Other components: System Metadata; Basis (MrSID-encoded image file).]
GetBehaviorDefinitions?PID=101 returns: Web-default, Web-image, Admin
GetBehaviorMethods?PID=101&BID=Web-default returns: get-as-page; get-in-context
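The two reflection requests can be modeled with a small in-memory Python sketch. It assumes a disseminator is represented only by the identifier of the Behavior Definition it subscribes to; `Repository`, `BehaviorDefinition`, and the method names are hypothetical, not the API-A operation signatures. The method lists for Web-image and Admin are not given on the slide, so they are left empty here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BehaviorDefinition:
    bid: str
    methods: List[str]


@dataclass
class DigitalObject:
    pid: str
    # Each disseminator is represented by the behavior definition it subscribes to.
    disseminator_bids: List[str] = field(default_factory=list)


class Repository:
    def __init__(self) -> None:
        self.bdefs: Dict[str, BehaviorDefinition] = {}
        self.objects: Dict[str, DigitalObject] = {}

    def get_behavior_definitions(self, pid: str) -> List[str]:
        """Which behavior definitions does this object subscribe to?"""
        return list(self.objects[pid].disseminator_bids)

    def get_behavior_methods(self, pid: str, bid: str) -> List[str]:
        """Which methods does one of the object's disseminators offer?"""
        if bid not in self.objects[pid].disseminator_bids:
            raise LookupError(f"object {pid} does not subscribe to {bid!r}")
        return list(self.bdefs[bid].methods)


if __name__ == "__main__":
    repo = Repository()
    repo.bdefs["Web-default"] = BehaviorDefinition(
        "Web-default", ["get-as-page", "get-in-context"]
    )
    repo.bdefs["Web-image"] = BehaviorDefinition("Web-image", [])  # not on slide
    repo.bdefs["Admin"] = BehaviorDefinition("Admin", [])  # not on slide
    repo.objects["101"] = DigitalObject("101", ["Web-default", "Web-image", "Admin"])

    print(repo.get_behavior_definitions("101"))
    print(repo.get_behavior_methods("101", "Web-default"))
```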
![Page 59: Sandy Payette Cornell Information Science](https://reader036.vdocuments.site/reader036/viewer/2022062517/56812ea4550346895d943e99/html5/thumbnails/59.jpg)
API-A: Dissemination Request
Clients can obtain content from a digital object with minimal knowledge about the object.
Behavior Definition identifiers and method definitions are the basis for making dissemination requests on digital objects.
Clients do not need to know the particulars of how to attach to the service (Behavior Mechanism) that is operating on their behalf.
A dissemination request requires just three things:
– Digital Object Identifier (PID)
– Behavior Definition Identifier (BID)
– Method name (and optional parameters) for a behavior
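Assembling a request from those three parts can be sketched in Python with the standard library. The query-string layout follows the deck's own `GetDissemination?PID=...&BID=...&method=...` example; the function name `dissemination_request` and the `zoom` parameter in the usage lines are hypothetical, and no host or endpoint path is assumed.

```python
from urllib.parse import urlencode


def dissemination_request(pid: str, bid: str, method: str, **params: str) -> str:
    """Build the query for a dissemination request.

    pid    -- Digital Object Identifier
    bid    -- Behavior Definition Identifier
    method -- method name defined by that Behavior Definition
    params -- optional method parameters, appended after the three required parts
    """
    query = {"PID": pid, "BID": bid, "method": method}
    query.update(params)
    return "GetDissemination?" + urlencode(query)


if __name__ == "__main__":
    # Prints: GetDissemination?PID=101&BID=Web-default&method=get-as-page
    print(dissemination_request("101", "Web-default", "get-as-page"))
    # Optional parameters are percent-encoded and appended.
    print(dissemination_request("101", "Web-image", "get-image", zoom="2"))
```

Nothing in the request names a datastream, a storage location, or a service address, which is what lets the repository change those details without breaking clients.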
![Page 60: Sandy Payette Cornell Information Science](https://reader036.vdocuments.site/reader036/viewer/2022062517/56812ea4550346895d943e99/html5/thumbnails/60.jpg)
API-A: Dissemination Request
[Diagram: a client page, “Bird Digital Library” (White Birds: Image 1, Image 2, Image 3), sends a request through API-A to the Repository for Digital Object 101, a MrSID Image Object with disseminators Web-default, Web-image, and Admin, System Metadata, and a Basis datastream (MrSID-encoded image file). The result shown is an image of a bird.]
GetDissemination?PID=101&BID=Web-default&method=get-as-page
![Page 61: Sandy Payette Cornell Information Science](https://reader036.vdocuments.site/reader036/viewer/2022062517/56812ea4550346895d943e99/html5/thumbnails/61.jpg)
Disseminations
Benefits
Simple access: dissemination requests shield clients from the internal structure of digital objects
Stable interface: dissemination requests are like requests against an abstract interface in that they are not tied to object implementation details that may change over time (e.g., storage locations of datastreams)
Foster Interoperability: different digital objects can vary in both the format of content and how it is structured, yet we can access them in a consistent manner via disseminations.
![Page 62: Sandy Payette Cornell Information Science](https://reader036.vdocuments.site/reader036/viewer/2022062517/56812ea4550346895d943e99/html5/thumbnails/62.jpg)
The New FEDORA
Software Deployment
![Page 63: Sandy Payette Cornell Information Science](https://reader036.vdocuments.site/reader036/viewer/2022062517/56812ea4550346895d943e99/html5/thumbnails/63.jpg)
Fedora Software Deployment Goals
An efficient, scalable, freely distributable FEDORA repository system ASAP
Make all software open source
Complete basic management and access interfaces with the initial release
Add other important digital library functionality in later releases
Create multiple testbed repositories to deploy and evaluate the software
Interoperability testing, including sharing of content and mechanisms among deployment partner repositories.
![Page 64: Sandy Payette Cornell Information Science](https://reader036.vdocuments.site/reader036/viewer/2022062517/56812ea4550346895d943e99/html5/thumbnails/64.jpg)
Deployment Group
Indiana University: Digital Library group
NYU: Humanities Computing group
Tufts: Digital Collections and Archives Department
King's College London: Humanities Computing
Oxford: Oxford Digital Library and The Refugee Studies Center
Library of Congress: Motion Picture and Recorded Sound Division
Northwestern University: library/academic computing
Los Alamos National Laboratory: Research Library
![Page 65: Sandy Payette Cornell Information Science](https://reader036.vdocuments.site/reader036/viewer/2022062517/56812ea4550346895d943e99/html5/thumbnails/65.jpg)
Fedora Project Plan
Phase 1: (pre-release Oct 31, 2002; final Jan 2003)
– Repository system with management and access subsystems exposed as web services
– Storage subsystem with XML object store and replication to relational database cache
– Object builder tools (GUI and batch)
– Basic set of behavior services
Phase 2: Add more production support
– Security and policy enforcement
– Additional management tools
– Optimize performance for accessing XML objects
– Object versioning
– Collection objects
– Advanced disk management
Phase 3: Enhance end-user support
– New kinds of disseminators, with supporting behavior services
– Efficiency and scale optimization
![Page 66: Sandy Payette Cornell Information Science](https://reader036.vdocuments.site/reader036/viewer/2022062517/56812ea4550346895d943e99/html5/thumbnails/66.jpg)
FEDORA Web Site: www.fedora.info
![Page 67: Sandy Payette Cornell Information Science](https://reader036.vdocuments.site/reader036/viewer/2022062517/56812ea4550346895d943e99/html5/thumbnails/67.jpg)
Questions and Discussion