1
Symposium: Open Access to Information
Panel 2: Open Access & Institutional Repositories
24 August 2006, Brasilia
Digital Libraries, Electronic Theses and Dissertations (ETDs), and NDLTD
http://fox.cs.vt.edu/talks/2006/20060824IBICTp2
Edward A. Fox, [email protected] Director, NDLTD
Chair, IEEE-CS Tech. Committee on Digital Libraries
Professor, Department of Computer Science
Director, Digital Library Research Laboratory
Virginia Tech, Blacksburg, VA 24061 USA
2
Outline
• Key Ideas
• Acknowledgements
• Digital Libraries
• DLs & Scholarly Communication
• Institutional Repositories
• NDLTD
• Summary
• DL Futures
3
Key Ideas - Overview
• Theorem 1: Supporters of Open Access should support NDLTD.
• Theorem 2: 5S can guide us to better support of Open Access.
4
Acknowledgements
• Students
• Faculty, Staff
• Collaborators
• Support
• Mentors
5
Acknowledgements: Students
• Pavel Calado, Yuxin Chen, Fernando Das Neves, Shahrooz Feizabadi, Robert France, Marcos Gonçalves, Nithiwat Kampanya, S.H. Kim, Aaron Krowne, Bing Liu, Ming Luo, Paul Mather, Unni Ravindranathan, Ryan Richardson, Rao Shen, Ohm Sornil, Hussein Suleman, Ricardo Torres, Wensi Xi, Baoping Zhang, Qinwei Zhu, …
6
Acknowledgements: Faculty, Staff
• Lillian Cassel, Debra Dudley, Roger Ehrich, Joanne Eustis, Weiguo Fan, James Flanagan, C. Lee Giles, Eberhard Hilf, John Impagliazzo, Filip Jagodzinski, Rohit Kelapure, Neill Kipp, Douglas Knight, Deborah Knox, Aaron Krowne, Alberto Laender, Gail McMillan, Claudia Medeiros, Manuel Perez, Naren Ramakrishnan, Layne Watson, …
7
Other Collaborators (Selected)
• Brazil: FUA, IBICT, UFMG, UNICAMP, USP
• Case Western Reserve University
• Emory, Notre Dame, Oregon State
• Germany: Humboldt U., U. Oldenburg
• Mexico: UDLA (Puebla), Monterrey
• College of NJ, Hofstra, Penn State, Villanova
• University of Arizona
• University of Florida, Univ. of Illinois
• University of Virginia
• VTLS (slides on digital repositories, NDLTD)
Acknowledgements: Support
• Course: UNESCO, CETREDE, IFLA-LAC, AUGM, CLEI, UFC
• Sponsors: ACM, Adobe, AOL, CAPES, CNI, CONACyT, DFG, IBM, Microsoft, NASA, NDLTD, NLM, NSF (IIS-9986089, 0086227, 0080748, 0325579, 0535057; ITR-0325579; DUE-0121679, 0136690, 0121741, 0333601), OCLC, SOLINET, SUN, SURA, UNESCO, US Dept. Ed. (FIPSE), VTLS
9
Acknowledgements - Mentors
• JCR Licklider – undergrad advisor (1969-71)
– Author in 1965 of “Libraries of the Future”
– Before, at ARPA, funded start of Internet
• Michael Kessler – BS thesis advisor
– Project TIP (technical information project)
– Defined bibliographic coupling
• Gerard Salton – graduate advisor (1978-83)
– “Father of Information Retrieval”
10
Digital Libraries
• Definitions
• DL Manifesto – Reference Model
• Book in process (Fox & Gonçalves), 5S
• DL Curriculum Project
11
DL Definitions - 1
• “A digital library is an organized and focused collection of digital objects, including text, images, video, and audio, along with methods of access and retrieval, and for selection, creation, organization, maintenance, and sharing of the collection.”
• Witten & Bainbridge – “How to Build a Digital Library” – Morgan Kaufmann 2003
12
DL Definitions - 2
• “Digital libraries are organizations that provide the resources, including the specialized staff, to select, structure, offer intellectual access to, interpret, distribute, preserve the integrity of, and ensure the persistence over time of collections of digital works so that they are readily and economically available for use by a defined community or set of communities”
• Waters, D.J. CLIR Issues, July/August 1998
• www.clir.org/pubs/issues/issues04.html
13
DL Definitions - 3
• Issues and Spectra
– Collection vs. Institution
– Content vs. System
– Access vs. Preservation
– “Free” vs. Quality
– Managed vs. Comprehensive
– Centralized vs. Distributed
14
DL Definitions - 4
• NOT a “digitized library”
• NOT a “deconstruction” of existing systems and institutions, moving them to an electronic box in a Library
• IS a new way to deal with knowledge
– Authoring, Self-archiving, Collecting,
– Organizing, Preserving,
– Accessing, Propagating, Re-using
15
Digital Library Content
(Diagram of content types)
• Text Documents: Articles, Reports, Books
• Video, Audio: Speech, Music
• Geographic Information: (Aerial) Photos
• Software, Programs: Models, Simulations
• Bio Information: Genome (human, animal, plant)
• Images and Graphics: 2D, 3D, VR, CAT
16
DL Manifesto - 1
• DL Reference Model
• In support of the future European Digital Library
• Developed by team connected with DELOS (Candela, Castelli, Ioannidis, Koutrika, Meghini, Pagano, Ross, Schek, Schuldt)
• Draft 2.2 presented in Frascati, near Rome, June 2006 – 79 pages
• Could be integrated with work of DLF, JISC, etc.
17
DL Manifesto – 2: 3 Tiers
18
DL Manifesto – 3: Main Concepts
19
DL Manifesto – 4: Actor Roles
20
Fox & Gonçalves DL Book Parts
• Ch. 1. Introduction (Motivation, Synopsis)
• Part 1 – The “Ss”
• Part 2 – Higher DL Constructs
• Part 3 – Advanced Topics
• Appendix
21
Book Parts and Chapters - 1
• Ch. 1. Introduction (Motivation, Synopsis)
• Part 1 – The “Ss”
– Ch. 2: Streams
– Ch. 3: Structures
– Ch. 4: Spaces
– Ch. 5: Scenarios
– Ch. 6: Societies
22
Informal 5S & DL Definitions
DLs are complex systems that
• help satisfy info needs of users (societies)
• provide info services (scenarios)
• organize info in usable ways (structures)
• present info in usable ways (spaces)
• communicate info with users (streams)
23
A Minimal DL in the 5S Framework
(Diagram relating the 5S constructs Streams, Structures, Spaces, Scenarios, and Societies to the components of a Minimal DL: Structured Stream, Structural Metadata Specification, Descriptive Metadata Specification, Digital Object, Metadata Catalog, Collection, Repository, hypertext, and the indexing, browsing, and searching services.)
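As a rough illustration of the minimal DL named on this slide, the sketch below models digital objects, a collection, a metadata catalog, and two services (searching and browsing). All class, field, and method names are illustrative assumptions; the sketch does not reproduce the formal 5S definitions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DigitalObject:
    handle: str   # unique identifier (hypothetical naming)
    stream: str   # content as a stream; plain text in this sketch


@dataclass
class MinimalDL:
    collection: dict = field(default_factory=dict)  # handle -> DigitalObject
    catalog: dict = field(default_factory=dict)     # handle -> descriptive metadata
    index: dict = field(default_factory=dict)       # word -> set of handles

    def add(self, obj, metadata):
        """Store the object, catalog its metadata, and index its words."""
        self.collection[obj.handle] = obj
        self.catalog[obj.handle] = metadata
        for word in obj.stream.lower().split():
            self.index.setdefault(word, set()).add(obj.handle)

    def search(self, word):
        """Searching service: handles of objects whose stream contains the word."""
        return sorted(self.index.get(word.lower(), set()))

    def browse(self, key):
        """Browsing service: handles ordered by one metadata field."""
        return sorted(self.catalog, key=lambda h: self.catalog[h].get(key, ""))


dl = MinimalDL()
dl.add(DigitalObject("etd-2", "Open access repositories"), {"title": "B thesis"})
dl.add(DigitalObject("etd-1", "Digital libraries and open access"), {"title": "A thesis"})
print(dl.search("open"))
print(dl.browse("title"))
```

Users (societies) would reach the collection only through such services (scenarios); the index here stands in for the "indexing" service on the diagram.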
24
Book Parts and Chapters - 2
• Part 2 – Higher DL Constructs
– Ch. 7: Collections
– Ch. 8: Catalogs
– Ch. 9: Repositories and Archives
– Ch. 10: Services
– Ch. 11: Systems
– Ch. 12: Case Studies
25
Book Parts and Chapters - 3
• Part 3 – Advanced Topics
– Ch. 13: Quality
– Ch. 14: Integration
– Ch. 15: How to build a digital library
– Ch. 16: Research Challenges, Future Perspectives
• Appendix
– A: Mathematical preliminaries
– B: Formal Definitions: Ss
– C: Formal Definitions: DL terms, Minimal DL
– D: Formal Definitions: Archeological DL
– E: Glossary of terms, mappings
26
DL Curriculum Framework
(Diagram)
• Course structure
– Semester 1: DL collections: development/creation
– Semester 2: DL services and sustainability
• Core DL topics
– Digitization, Storage, Interchange
– Digital objects, Composites, Packages
– Metadata, Cataloging, Author submission
– Naming, Repositories, Archives
– Spaces (conceptual, geographic, 2/3D, VR)
– Architectures (agents, buses, wrappers/mediators), Interoperability
– Services (searching, linking, browsing, etc.)
– Intellectual property rights mgmt., Privacy, Protection (watermarking)
– Archiving and preservation, Integrity
• Related topics
– Documents, E-publishing, Markup
– Info. needs, Relevance, Evaluation, Effectiveness
– Thesauri, Ontologies, Classification, Categorization
– Bibliographic information, Bibliometrics, Citations
– Routing, Filtering, Community filtering
– Search & search strategy, Info seeking behavior, User modeling, Feedback
– Info summarization, Visualization
– Multimedia streams/structures, Capture/representation, Compression/coding
– Content-based analysis, Multimedia indexing
– Multimedia presentation, rendering
27
Project Teams/NSF Grant
• Project Team at VT (IIS-0535057):
– PI: Dr. Edward A. Fox ([email protected])
– GRA: Seungwon Yang ([email protected])
• Project Team at UNC-CH (IIS-0535060):
– Co-PI: Dr. Barbara Wildemuth ([email protected])
– Co-PI: Dr. Jeffrey Pomerantz ([email protected])
– GRA: Sanghee Oh ([email protected])
28
DLs & Scholarly Communication
• Asynch
• Information Life Cycle
• Flattening
• Author skills, toward Semantic Web
• Crossing the Chasm
• OAI
29
Asynchronous, Digital Library Mediated Scholarly Communication
Different time and/or place
30
Information Life Cycle
(Diagram of life-cycle stages)
• Authoring, Modifying
• Organizing, Indexing
• Storing, Retrieving
• Distributing, Networking
• Retention / Mining
• Accessing, Filtering
• Using, Creating
31
Digital Libraries Shorten the Chain from
Editor
Publisher
A&I
Consolidator
Library
Reviewer
32
DLs Shorten the Chain to
Author
Reader
Digital Library
Editor
Reviewer
Teacher
Learner
Librarian
33
Important skills for authors
• Authoring (Word Processing -> e-pub)
• Rendering, presenting
• Tagging, Markup (XML, SGML)
• “Semi-structured information”
• Dual-publishing, eBooks
• Styles (XSL, XSLT)
• Structured queries
34
35
36
37
38
OAI – Repository Perspective
Required: Protocol
(Diagram labels: DO, MDO)
39
OAI – Black Box Perspective
(Diagram labels: OA 1 through OA 7)
40
The World According to OAI
• Service Providers: Discovery, Current Awareness, Preservation
• Metadata harvesting
• Data Providers
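To make the harvesting relationship between service providers and data providers concrete, here is a minimal sketch of the service-provider side using the OAI-PMH ListRecords verb. The base URL, the sample response, and the function names are assumptions for illustration; a real harvester would fetch each URL over HTTP and loop until no resumption token is returned.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, token=None):
    """Build a ListRecords request URL for a data provider."""
    if token:
        # A resumption token is an exclusive argument: send it alone with the verb.
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return ([(identifier, title), ...], resumption_token_or_None)."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        title = rec.findtext(".//dc:title", namespaces=NS)
        records.append((identifier, title))
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, (token or None)


# Hypothetical response from a data provider (not real harvested data).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:etd-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample thesis</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_list_records(SAMPLE)
print(list_records_url("http://example.org/oai"))
print(records, token)
```

Only metadata moves in this exchange; the full content stays with the data provider.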
41
Institutional Repositories
• Definitions, Goals
• Eprints
• DSpace
• Fedora, VITAL
• Comparisons
• ODL + 5S Suite (not shown)
42
Institutional Repositories - 1
• “Institutional repositories are digital collections that capture and preserve the intellectual output of a single university or a multiple institution community of colleges and universities.”
• Crow, R. “Institutional repository checklist and resource guide”, SPARC, Washington, D.C., USA
• www.arl.org/sparc/IR/IR_Guide_v1.pdf
43
Institutional Repositories - 2
• “A university-based institutional repository is a set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members. It is most essentially an organizational commitment to the stewardship of these digital materials, including long-term preservation where appropriate, as well as organization and access or distribution.”
• Lynch, C.A. In ARL Bimonthly Report 226, pp. 1-7, Feb. 2003, www.arl.org/newsltr/226/ir.html
44
What is a Digital Object Repository?
• Also called: digital rep., digital asset rep., institutional repository
• Stores and maintains digital objects (assets)
• Provides external interface for Digital Objects
– Creation, Modification, Access
• Enforces access policies
• Provides for content type disseminations
Adapted from Slide by V. Chachra, VTLS
45
Goals of Institutional Repositories (by Stevan Harnad, U. Southampton)
• Self-archiving of institutional research
– Theses and dissertations (VTLS NDLTD Project)
– Article preprints and postprints
– Internal documents and maps
• Management of digital collections
• Preservation of materials – decentralized approach
• Housing of teaching materials
• Electronic publishing of journals, books, posters, maps, audio, video and other multimedia objects
Adapted from Slide by V. Chachra, VTLS
46
47
48
49
50
51
52
53
54
What is Fedora™?
• Slides courtesy Vinod Chachra of VTLS
Flexible Extensible Digital Object Repository Architecture
55
History of Fedora™
• 1997-Present
– DARPA and NSF-funded research project at Cornell (conceptual framework developed by Sandra Payette and Carl Lagoze)
– Reference implementation developed at Cornell
• 1999-2001
– University of Virginia digital library prototype (Thornton Staples and Ross Wayland)
• 2002-Present
– Andrew W. Mellon Foundation granted Virginia and Cornell $1 million to develop a production-quality Fedora system
– Fedora 1.0 released in May 2003 as Open Source under the Mozilla Public License
56
Fedora™ Terms
• Metadata
• Digital Objects (data)
• Complex Objects (objects consisting of many objects in a complex/hierarchical relationship)
• Content (data and metadata together)
• Datastreams (are content for dissemination)
• Disseminators (are services) – A dissemination is defined as a stream of data that manifests a view of the digital object's content.
57
Digital Object w. multiple datastreams
(Diagram: a Digital Object containing the datastreams DC, EAD, and Admin Metadata)
58
Example Disseminators
(Diagram: a digital object with Persistent ID (PID), Disseminators, System Metadata, and Datastreams)
• Default disseminator: Get Profile, List Items, Get Item, List Methods, Get DC Record
• Simple Image disseminator: Get Thumbnail, Get Medium, Get High, Get VeryHigh
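The relationship among a PID, datastreams, and disseminators can be sketched as follows. This is a toy model under stated assumptions: the class and method names are hypothetical and are not Fedora's actual API, and the "thumbnail" is simply the first bytes of the image datastream.

```python
class DigitalObject:
    """Toy digital object: a PID, datastreams, and disseminator methods."""

    def __init__(self, pid):
        self.pid = pid            # persistent ID
        self.datastreams = {}     # datastream ID -> content
        self.disseminators = {}   # method name -> function taking the object

    def add_datastream(self, ds_id, content):
        self.datastreams[ds_id] = content

    def add_disseminator(self, method, func):
        self.disseminators[method] = func

    def list_methods(self):
        """Names of the views this object can produce."""
        return sorted(self.disseminators)

    def disseminate(self, method):
        """Run a disseminator; the result is one view of the object's content."""
        if method not in self.disseminators:
            raise KeyError(f"no disseminator method {method!r} on {self.pid}")
        return self.disseminators[method](self)


obj = DigitalObject("demo:1")
obj.add_datastream("DC", {"title": "Sample image"})
obj.add_datastream("IMAGE", b"\x00" * 1000)
obj.add_disseminator("getDCRecord", lambda o: o.datastreams["DC"])
# Stand-in for real image scaling: return only the first 100 bytes.
obj.add_disseminator("getThumbnail", lambda o: o.datastreams["IMAGE"][:100])

print(obj.list_methods())
print(len(obj.disseminate("getThumbnail")))
```

The point of the design is that clients ask for a view by method name and never touch the stored datastreams directly.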
59
Fedora™ Repository
(Architecture diagram)
• Clients: Client Application, Batch Program, Server Application, Web Browser (connecting over HTTP and SOAP)
• Web Service Exposure Layer: Manage, Access, Search, OAI Provider
• Management Subsystem: Component Mgmt, Object Mgmt, Object Validation, PID Generation
• Access Subsystem: Object Dissemination, Object Reflection
• Security Subsystem: Session Management, User Authentication, Policy Enforcement, Policy Mgmt (Policies, Users/Groups)
• Storage Subsystem: Digital Objects, Datastreams, Content (XML Files, Relational DB)
• External Content Retriever: reaches External Content Sources over HTTP and FTP
• Local Service and Remote Service (HTTP, SOAP)
Adapted from Slide by V. Chachra, VTLS
60
Fedora Advantage
• Extensible digital object model
• Repository exposed by Web services APIs
– Management (Creation, Deletion, Maintenance, Validation)
– Access (Search, Disseminations)
• Scalable, persistent storage for content and metadata
• Content can be local and/or remote
• Content versioning
• Open source solution
61
Comparison of DSpace and Fedora
• DSpace is a standalone product in a box, whereas Fedora can be standalone or integrated with an ILS.
• In Fedora the metadata and the content are treated the same way, as datastreams; in DSpace the metadata and content get separate treatments.
• Fedora can define complex objects more easily.
• DSpace is not as extensible as Fedora, as it deals both with the repositories and workflows. Fedora focuses only on the data model.
• Fedora uses the Mozilla licensing model and DSpace uses GNU license. Fedora's license makes it easier for software companies to provide extensions to the model.
62
VITAL / Fedora Relationship
63
Prospero: Summary of features of the three software packages compared
• What you get
– DSpace: A package with front-end web interface directly linked to a database
– E-prints: A package with front-end web interface directly linked to a database
– Fedora: A repository database, with internal database
• Server requirements
– DSpace: Unix environment, Java, Apache Ant, Apache Tomcat, PostgreSQL or Oracle
– E-prints: Unix environment, Perl, Apache+mod-perl, MySQL
– Fedora: Unix or Windows, Java (optional: MySQL or Oracle)
• Subject classification
– DSpace: Yes
– E-prints: Yes
– Fedora: Yes
• Community groups
– DSpace: Yes
– E-prints: No
– Fedora: Possible but … (see below)
• Where from?
– DSpace: MIT and Hewlett-Packard
– E-prints: Southampton University, outcome of a JISC project
– Fedora: Cornell University and the University of Virginia Library
64
65
66
67
68
NDLTD
• DL case study
• Goals
• How, Workflow
• Union Catalog
• Services atop the Union Catalog
• Sustainability and Impact
• UK related report (Aug. 2006)
A Digital Library Case Study
• Domain: graduate education, research
• Genre: ETDs = electronic theses & dissertations
• Submission: http://etd.vt.edu
• Collection: http://www.theses.org
Project: Networked Digital Library of Theses & Dissertations (NDLTD) http://www.ndltd.org
70
NDLTD Goals
• For Students:
– Gain knowledge and skills for the Information Age, especially about Digital Libraries
– Richer communication (digital information, multimedia, …)
• For Universities:
– Easy way to enter the digital library field and benefit thereby
• For the World:
– Global digital library – large, useful, many services
NDLTD: How can a university get involved?
• Select planning/implementation team
– Graduate School
– Library
– Computing / Information Technology
– Institutional Research / Educ. Tech.
• Join online, give us contact names
– www.ndltd.org/join
• Adapt Virginia Tech or other proven approach
– Build interest and consensus
– Start trial / allow optional submission
(Workflow diagram)
• Student gets committee signatures and submits ETD
• Signed ETD goes to the Grad School
• Library catalogs ETD; access is opened to the new research
• WWW, NDLTD
74
Union catalog: OCLC
• OCLC will expand OAI data provider on TDs.
• Is getting data from WorldCat (so, from many sites!).
• Will harvest from all others who contact them.
• Need DC and either ETD-MS or MARC.
• Has a set for ETDs.
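Since the union catalog needs Dublin Core (DC) for each record, the data-provider side might serialize an ETD's descriptive metadata roughly as sketched below. The field values, the example URL, and the function name are made up for illustration; ETD-MS and MARC, the other formats mentioned, are not covered here.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"

# Use the conventional prefixes when serializing.
ET.register_namespace("dc", DC)
ET.register_namespace("oai_dc", OAI_DC)


def etd_to_oai_dc(fields):
    """Serialize {element name: [values]} as an oai_dc record (a string)."""
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for name, values in fields.items():
        for value in values:  # DC elements are repeatable
            ET.SubElement(root, f"{{{DC}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical ETD metadata, not a real record.
record = etd_to_oai_dc({
    "title": ["A sample dissertation"],
    "creator": ["Doe, Jane"],
    "type": ["Electronic Thesis or Dissertation"],
    "identifier": ["http://example.org/etd-1"],
})
print(record)
```

A harvester that receives such a record can follow the identifier to the full text, which is where consistency across sites matters.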
75
76
77
ETD Union Search Mirror Site in China (CALIS) (http://ndltd.calis.edu.cn – popular site!)
78
79
VTLS Union Catalog Content Languages
The VTLS NDLTD Union Catalog has data in 6 different languages. These are: English, German, Greek, Korean, Portuguese, Spanish
Examples follow
80
Full-text Services
• Running since Sept 2005: Scirus
• In beta test: Google Scholar
• Challenges:
– Data quality problems
– Inconsistency in way to get from metadata to the full-text file(s)
– Broadening the coverage since OAI use has not spread as widely as we would like
81
What are we doing?
• Aiding universities to enhance graduate education, publishing and IPR efforts
• Helping improve the availability and content of theses and dissertations
• Educating ALL future scholars so they can publish electronically and effectively use digital libraries (i.e., are Information Literate and can be more expressive) -> support Open Access
83
UK Report of Aug. 2006
• EVALUATION OF OPTIONS FOR A UK ELECTRONIC THESIS SERVICE
• Study report edited by Alma Swan
• Key Perspectives Ltd & UCL Library Services
• EThOS project (Electronic Theses Online Service) - commissioned to develop a model for a workable, sustainable and acceptable national service for the provision of open access to electronic doctoral theses.
84
EThOS: Stakeholders
• Academic registrars
• University administrators (graduate schools)
• Librarians
• Repository managers (3; 2)
• Authors (or potential authors) of theses and dissertations
85
Assessment of the organisational models
• Viability
– Distributed model: Dependent upon individual institutions’ capabilities and resources, which are highly variable
– Centralised model: Good, providing service provider selects correct business model and satisfies HEI concerns on rights, liabilities, etc.
– Mixed architecture model: Good, providing service provider selects correct business model and satisfies HEI concerns on rights, liabilities, etc.
• Disadvantages
– Distributed model: Dependent upon individual institutions’ capabilities and resources, which are highly variable. This would lead to a service of patchy quality for at least a decade. Potentially chaotic with respect to standards and consistency levels
– Centralised model: HEIs lose control to an extent and may lose some benefits in terms of PR and other institutional-purpose benefits that accrue with local service provision
– Mixed architecture model: Offers potential for inconsistencies unless well-managed by hub provider
• Advantages
– Distributed model: Self-organising, cheap, simple
– Centralised model: HEIs need only to provide access to e-theses; central service provider does the rest. Standards applied across the board. Guaranteed consistent access. Scope for added-value services. One interface; a true national collection as well as a national gateway. Easy to hook up to other national or international services.
– Mixed architecture model: Gives the greatest flexibility to HEIs to select the most appropriate options; HEIs can retain control of selected elements. Standards applied across the board. Guaranteed consistent access. Scope for added-value services. One interface (multiple sites of supply). National gateway. Easy to hook up to other national or international services.
• HEI community views
– Distributed model: Strong feeling against this option
– Centralised model: Second most popular option
– Mixed architecture model: Highest level of support for this option
• Comments
– Distributed model: No support in the HEI community
– Centralised model: Strong support within HEI community
– Mixed architecture model: Very strong support within HEI community
86
EThOS Survey: familiar with IPR issues related to e-theses
• 8% know very little
• 30% not very familiar
• 51% familiar
• 11% very familiar
87
EThOS Survey: my institution’s handling of PhD e-theses
• 83% not yet
• 11% from some students
• 5% from most students
• 1% from all students
88
EThOS Survey: my institution’s policy position on PhD e-theses
• 55% no policies yet
• 34% currently planning policies
• 11% has a policy
89
EThOS: Benefits
• Hugely increased visibility of UK doctoral research output
• Resulting in increased usage and impact of UK doctoral research output
• The opportunities for resulting new research efforts and collaborations
90
Summary: Key Ideas
• Theorem 1: Supporters of Open Access should support NDLTD.
• Theorem 2: 5S can guide us to better support of Open Access.
91
Theorem 1: Supporters of Open Access should support NDLTD - 1
• DLs will lead to enormous benefit at all levels, from personal to global.
• An IR is a type of DL, in the middle of the levels (requiring support from below, and providing support for above levels).
• Having a DL at every university (i.e., IR) greatly encourages Open Access.
92
Theorem 1: Supporters of Open Access should support NDLTD - 2
• The easiest way to launch an IR at a university is with ETDs.
• NDLTD is the lead world organization promoting ETD activities.
• NDLTD’s goals are all in support of Open Access and IRs.
93
Theorem 2: 5S can guide us to better support of Open Access - 1
• 5S helps us think formally about Open Access, hence clearly, hence to find focus.
• 5S helps us design and build DLs, hence IRs.
• Societies
– Individuals: members of institution, discipline
– Social influence can promote DL (re)use.
– Economic and political and social issues lead us to a distributed architecture.
94
Theorem 2: 5S can guide us to better support of Open Access - 2
• Distributed infrastructure + services lead us to harvesting (vs. federation, gathering).
• 5S helps make harvesting a success:
– Streams of content flow from individuals.
– Structures: ETD-ms, (browsing) classification
– Spaces: indexes, interfaces
– Scenarios: submission, workflow, harvesting
– Societies (see above)
• More collaboration (social networks)
• Prestige is more widely spread.
• Access is more open.
95
DL Futures
• History
• People, Content, Tools
• Sustainable Infrastructure
• Future Work
• Links
• For More Information
96
97
98
99
People
• Digital librarians
• DL system developers
• DL system administrators
• DL managers
• DL collection development staff
• DL evaluators
• DL users