Digital Library
J-Y Le Meur
Digital Library: Definitions, Examples, Standards, Summary
Digital Library Software: Specialized SW - History, Invenio features, Invenio modular architecture
Technology: Overview, Python, Development environment, Building efficient Indexes
Conclusion
Introduction to Digital Library Technology
The INVENIO software
J-Y. Le Meur
Department of Information Technology, CERN
24-10-2011 / JINR-CERN School on GRID and Information Management Systems
Outline
1 Digital Library: Definitions, Examples, Standards, Summary
2 Digital Library Software: Specialized SW - History, Invenio features, Invenio modular architecture
3 Technology: Overview, Python, Development environment, Building efficient Indexes
4 Conclusion
What is a Digital Library?
A library in which collections are stored in digital formats (as opposed to print, microform, or other media) and accessible by computers. (...) A digital library is a type of information retrieval system.
A virtual organisation that comprehensively collects, manages and preserves for the long term rich digital content, and offers to its targeted user communities specialised functionality on that content.
(1) institutional document repositories
(2) world-wide subject-based information systems
Some key notions
Repository type: institutional versus disciplinary
Hybrid libraries: electronic resources versus traditional print material
Content type: born digital versus converted content
Archive concept: traditional Archive versus digital Archive
Library type: digital versus virtual
Open access: Green versus Gold
Example 1: CERN Document Server
managing CERN and selected non-CERN high-energy physics and related documents since 1993
more than 1,000,000 records
articles, books, theses, photos, videos, and more
powered by Invenio, free digital library software
http://cdsweb.cern.ch/
CDS: Collection tree
CDS: Search for Books
CDS: Search for photos
CDS Feature: Commenting
CDS Feature: Create Personal Alert
CDS Feature: Add to Basket
CDS Feature: Display Personal Basket
CDS Feature: Organise and Share Baskets
CDS: Journals and Bulletins
Example 2: INSPIRE
world-wide high-energy physics information system
run by CERN, DESY, FNAL, SLAC
metadata curation since 1960s, Invenio technology since 2007
citation analysis, author/affiliation analysis
close partnership with arXiv and ADS
http://inspirehep.org/
INSPIRE: full-text search
INSPIRE: Cite Summary
INSPIRE: Citation History
INSPIRE: Author pages
Example 3: The JDS Digital Library (jdsweb.jinr.ru)
Other famous examples?
Are these digital libraries?
Google Web | Books | Scholar ?
Eprint ArXiv
Library of Congress and American Memory
Internet Archive
World's total yearly production of print, film, optical, and magnetic content would require roughly 1.5 billion gigabytes of storage
Library Standards: exchange, identifiers and preservation
Exchange protocols: Z39.50 and OAI-PMH, between Data and Service providers
Interoperability: SWORD = Simple Web-service Offering Repository Deposit
Identifiers: ISBN and DOI
Preservation: METS, PDF/A, OAIS
  Content description: Metadata Encoding and Transmission Standard
  Data formats
  Supporting system: Open Archival Information System ref. model
Content representation: MARC, DC
  XML-MARC
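As a rough illustration of the exchange between a data provider and a service provider, the sketch below builds an OAI-PMH harvesting request. The verb and argument names (ListRecords, metadataPrefix, from, set) come from the OAI-PMH specification; the base URL, the helper name list_records_url and the "marcxml" prefix are assumptions made for the example, and no request is actually sent.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    # 'verb' and 'metadataPrefix' are required arguments of ListRecords.
    params = [("verb", "ListRecords"), ("metadataPrefix", metadata_prefix)]
    # 'from' and 'set' are optional arguments for selective harvesting.
    if from_date:
        params.append(("from", from_date))
    if set_spec:
        params.append(("set", set_spec))
    return base_url + "?" + urlencode(params)


# The base URL below is a placeholder, not a real repository.
url = list_records_url("http://repository.example.org/oai2d",
                       metadata_prefix="marcxml",
                       from_date="2011-10-01")
print(url)
```

A service provider would fetch such URLs periodically and follow the resumption tokens returned by the data provider; that part is left out here.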
Library Standards: Content representation
Metadata: data about data
Metadata types: descriptive, structural and administrative
Metadata schema: set of defined elements (e.g. MARC, DC)
MARC: MAchine Readable Cataloguing, international standard for representing and communicating bibliographic records, developed in the 60s, catalogue card oriented, high degree of complexity to cover all purposes
XML-MARC: XML schema based on MARC21 developed by Library of Congress
MARC and XML-MARC examples
Tags, identifiers and subcodes
001__ 1337270
037__ $$aCERN-PH-EP-2011-030
100__ $$aClerbaux, Barbara $$eed. $$iINSPIRE-00314890 $$uBrussels U.
245__ $$aSearch for New Physics in Dijet
260__ $$c2011
520__ $$aA search for new interactions and resonances [..]
XML-MARC: tag 100
<datafield tag="100" ind1=" " ind2=" ">
  <subfield code="a">Clerbaux, Barbara</subfield>
  <subfield code="e">ed.</subfield>
  <subfield code="i">INSPIRE-00314890</subfield>
  <subfield code="u">Brussels U.</subfield>
</datafield>
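To make the link between the two notations concrete, here is a minimal sketch that parses the XML-MARC datafield above with the Python standard library and prints it in the compact "tag, indicators, $$subfields" form. The function name datafield_to_text is hypothetical, and rendering blank indicators as underscores is an assumption read off the example lines; this is not Invenio's own conversion code.

```python
import xml.etree.ElementTree as ET

MARCXML = """<datafield tag="100" ind1=" " ind2=" ">
  <subfield code="a">Clerbaux, Barbara</subfield>
  <subfield code="e">ed.</subfield>
  <subfield code="i">INSPIRE-00314890</subfield>
  <subfield code="u">Brussels U.</subfield>
</datafield>"""


def datafield_to_text(xml_fragment):
    """Render one XML-MARC datafield as a text-MARC line (hypothetical helper)."""
    field = ET.fromstring(xml_fragment)
    # Assumption: a blank indicator is displayed as an underscore.
    ind1 = field.get("ind1", " ").strip() or "_"
    ind2 = field.get("ind2", " ").strip() or "_"
    # Each subfield becomes "$$" + code + value, in document order.
    subfields = " ".join(
        f"$${sf.get('code')}{sf.text or ''}" for sf in field.findall("subfield")
    )
    return f"{field.get('tag')}{ind1}{ind2} {subfields}"


print(datafield_to_text(MARCXML))
```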
Digital Library: Summary
Definitions of a digital library
The variety of types and concepts behind "Digital Library"
Examples of institutional and subject-based repositories
Some functionalities of Digital Libraries
Some important standards: MARC, SWORD, OAI-PMH
Next: the need for specialized software
Why specialized Software?
Specialist software for building, maintaining, managing or running digital libraries.
Institutional repository software focuses primarily on ingest, preservation and access of locally produced documents, particularly locally produced academic outputs.
Content is organized and ready for exchange (support of interoperability protocols)
Metadata and Data are preserved for the long term (support of preservation standards)
Submission, Editing, Curation processes are supported
Dissemination is organized and controlled
SW examples: Eprints, DSpace, Fedora, Greenstone... Invenio
Why at CERN? An interesting challenge
A physicist's office
Invenio History
1954: CERN laboratory is created
1989: Tim Berners-Lee invents the Web
1991: SPIRES (SLAC) is the first database on the Web
ArXiv, the archive of Physics papers, moves to the Web
1993: CERN Preprint Server starts as an institutional and disciplinary repository
1996: CERN Library Server includes Books and Periodicals, as a hybrid library
2000: CERN Document Server includes Multimedia material and restricted notes
2002: CDSWare SW released open source
2006: CDSWare becomes Invenio; start of I18N collaborations
2010: Invenio 1.0 released and adopted world-wide
Key features (invenio-software.org)
navigable collection tree (regular, virtual, hosted)
powerful search engine
  Google-like speed for up to 5M records
  combined metadata, reference and fulltext search
flexible metadata (MARC, OA)
handling any kind of document (multimedia)
customizable input, formatting and linking
personalization and collaborative features: alerts, baskets, groups, reviews, comments
internationalisation (28 languages)
Books management and circulation
open source, GNU General Public License
  co-developed by CERN (2002–), EPFL (2004–), DESY/FNAL/SLAC (2008–), CfA (2009–)
  installed at > 40 institutions world-wide
Extra Features: Plugins
Compatibility with LibX: Invenio toolbar
  LibX: http://libx.org/editions/download.php?edition=4F46CD81
  Can be integrated with Internet Explorer and Firefox browsers
  Integration with the main digital content websites including Amazon, Google Scholar, Wikipedia
  Highlighted text from a web page can be used to directly query an Invenio installation
Zotero: Invenio can export its content to the Zotero Firefox plugin for compiling CVs
Cooliris: Invenio supports browsing multimedia content as a 3-dimensional wall (due to the integration with the Cooliris plugin)
Modules Overview
Modules Overview: Scheduler
Monitoring and scheduling processes
Ingestion Modules: Overview
Ingestion Modules - Submission: interfaces, workflows and functions
Processing Modules: Overview
Processing Modules - Example: indexing
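The slide itself is a figure, so as a rough idea of what an indexing step produces, here is a minimal sketch of an inverted word index: each word maps to the set of record identifiers containing it, and a multi-word query intersects those sets. The function names, the record identifiers and the sample texts are all hypothetical; this is a textbook structure, not Invenio's indexer.

```python
from collections import defaultdict


def build_index(records):
    """Map each lower-cased word to the set of record ids that contain it."""
    index = defaultdict(set)
    for recid, text in records.items():
        for word in text.lower().split():
            index[word].add(recid)
    return index


def search(index, query):
    """Return the ids of records containing every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits


# Hypothetical records: id -> text to be indexed.
records = {
    1: "Search for New Physics in Dijet",
    2: "Digital library software",
}
index = build_index(records)
print(search(index, "new physics"))
```

Keeping one set of record ids per word makes an AND query a plain set intersection, which is why this layout is a common starting point for search engines.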
Processing Modules - Example: ranking
Most Cited: count citations
All-Times Best: PageRank (Google)
'Hot' Trends: time-aware pagerank
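The first two ranking methods listed above can be sketched on a toy citation graph. The code below is the textbook power-iteration form of PageRank with a damping factor of 0.85, together with a plain citation count; the paper labels, the function names and the parameter defaults are assumptions for illustration, and the time-aware variant is not shown. It is not taken from Invenio's ranking module.

```python
def count_citations(citations):
    """'Most Cited': number of incoming citations per paper."""
    counts = {}
    for refs in citations.values():
        for cited in refs:
            counts[cited] = counts.get(cited, 0) + 1
    return counts


def pagerank(citations, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration over a citation graph."""
    papers = sorted(set(citations) | {p for refs in citations.values() for p in refs})
    n = len(papers)
    rank = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        # Rank held by papers that cite nothing is spread evenly over all papers.
        dangling = sum(rank[p] for p in papers if not citations.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in papers}
        for paper, refs in citations.items():
            for cited in refs:
                new_rank[cited] += damping * rank[paper] / len(refs)
        rank = new_rank
    return rank


# Hypothetical graph: paper -> list of papers it cites.
citations = {"A": ["C"], "B": ["C"], "C": [], "D": ["C"]}
ranks = pagerank(citations)
print(count_citations(citations))
print(max(ranks, key=ranks.get))
```

Unlike a raw count, PageRank weights a citation by the rank of the citing paper, so a citation from a highly ranked paper counts for more.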
Processing Modules - Example: ranking
inspirehep.net (500 random points)
Dissemination Modules: Overview
Dissemination Modules: Search Examples
Curation Modules: Overview
Curation Modules: Examples
Access Module: Authentication management
Access Module: Authorization management
Modules Summary: Invenio
About 33 modules
codebase
  290,000 lines of Python code
  12,000 lines of JavaScript code
  6,000 lines of XSL code
  5,000 lines of autotools code
  500 test cases
75 authors since inception
  25 authors and contributors in 2010
  many short-term students
  importance of informal coding standards
10 years of development, started at CERN, first release in 2002, now co-developed world-wide (EU, US)
Technology Overview: used technologies
OpenSource GPL project
Unix/Linux Server side
Python (and C and Lisp), MySQL and Apache + mod_wsgi
Other smaller dependencies
Based on open standards (MARCXML, MARC21, OAI-PMH, OpenURL...)
Medium to big data repositories
Flexible at every layer
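For readers unfamiliar with the Apache + mod_wsgi pairing, the sketch below shows the general shape of a Python WSGI application: a callable that receives the request environment and returns the response body. The application body and the /record/1337270 path are made up for illustration and are not Invenio code; the small call_app helper only exists so the callable can be exercised without a web server.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Minimal WSGI callable of the kind a WSGI server would invoke."""
    path = environ.get("PATH_INFO", "/")
    body = ("You asked for %s\n" % path).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(path):
    """Call the application in-process with a fake request environment."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(application(environ, start_response))
    return captured["status"], body


print(call_app("/record/1337270"))
```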
Technology Overview: concepts
Technology Overview: languages
Why Python? Languages
easy to read and understand (good for many temporary developers)
suitable for rapid prototyping (good for organic-growth software development model)
write code to throw it away
Why Python? Art of ikebana programming
Why Python? Speeding up Python
Development model: Git distributed environment
good for distributed teams
offline development possible
"pull on demand" collaboration model (as opposed to "shared push" collaboration model)
  inherent, natural code review process
commit early, commit often (to private repositories)
rebase and clean (before pushing for public consumption)
interplay with SVN
Development model: Git collaboration model
Development model: Test Suite: unit test
Development model: Test Suite: functional test
Development model: Test Suite: web testing
Building Indexes: loading Web vs App Server
Building Indexes: load split
Building Indexes: designing a search engine
performance-driven design assumptions:
  high number of selects, low number of updates
  fast searching, slow indexation
  cache everything cacheable
search functionality:
  search for words, phrases, regular expressions
  search in any field, authors, titles, etc.
index design:
  forward indexes: rec1 –> [word1, word8, ...]
                   rec2 –> [word1, word2, ...]
  reverse indexes: word1 –> [rec1, rec2, ...]
                   word2 –> [rec2, rec7, ...]
Zipf’s law on word frequency:
  few words occur very often (e.g. the)
  most words are infrequent (even e.g. boson)
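The forward and reverse index layout above can be sketched in a few lines of Python. This is only an illustration of the idea, not Invenio's actual indexer: the function names `build_indexes` and `search_all`, the whitespace tokenizer, and the three sample records are all assumptions made for the example.

```python
from collections import defaultdict


def build_indexes(records):
    """Build both index kinds from a dict mapping record id -> text."""
    forward = {}                 # rec id -> sorted list of its words
    reverse = defaultdict(set)   # word -> set of rec ids containing it
    for rec_id, text in records.items():
        words = sorted(set(text.lower().split()))  # naive tokenizer (assumption)
        forward[rec_id] = words
        for word in words:
            reverse[word].add(rec_id)
    return forward, dict(reverse)


def search_all(reverse, query):
    """Return the ids of records that contain every word of the query."""
    hit_sets = [reverse.get(word, set()) for word in query.lower().split()]
    if not hit_sets:
        return set()
    return set.intersection(*hit_sets)


# Hypothetical sample data, for illustration only.
records = {
    1: "the Higgs boson search",
    2: "the digital library",
    7: "the boson mass",
}
forward, reverse = build_indexes(records)
```

With this sample data, `search_all(reverse, "boson mass")` returns `{7}`. The word "the" maps to every record while "mass" maps to one, a small-scale picture of the Zipf distribution mentioned above: a few hit sets are very large, most are small.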
Building Indexes: search engine under cover
Building Indexes: measuring the performance
three important speed factors to consider:
  speed of finding sets (DB Server)
  speed of demarshaling sets (DB <–> Web App Server)
  speed of intersecting sets (Web App Server)
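A rough way to see the three factors separately is to time each step of a two-word search. The sketch below is an assumption-laden stand-in: a plain dict named `fake_db` plays the DB server, the standard `marshal` module plays the serialization format, and the hit sets are synthetic. It shows how the measurement can be split, not how Invenio stores or fetches its sets.

```python
import marshal
import time

# Stand-in for the DB server (assumption): word -> marshaled list of record ids.
fake_db = {
    "boson": marshal.dumps(list(range(0, 100000, 2))),
    "higgs": marshal.dumps(list(range(0, 100000, 3))),
}


def timed_search(word_a, word_b):
    """Search for records holding both words, timing each of the three steps."""
    t0 = time.perf_counter()
    blob_a, blob_b = fake_db[word_a], fake_db[word_b]  # 1. finding sets
    t1 = time.perf_counter()
    set_a = set(marshal.loads(blob_a))                 # 2. demarshaling sets
    set_b = set(marshal.loads(blob_b))
    t2 = time.perf_counter()
    hits = set_a & set_b                               # 3. intersecting sets
    t3 = time.perf_counter()
    timings = {"find": t1 - t0, "demarshal": t2 - t1, "intersect": t3 - t2}
    return hits, timings


hits, timings = timed_search("boson", "higgs")
```

The absolute numbers in `timings` depend on the machine and say nothing about a real deployment. The point is that the three costs can be measured independently, so a change of data structure can be judged on each of them.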
Building Indexes: optimizing data structures
data structures tested:
  ‘sorted’ (lists, Patricia trees)
  ‘unsorted’ (hashed sets, binary vectors)
fast prototyping (Python, Lisp in 2002)
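To make the candidates concrete, here is a minimal Python sketch of the same intersection done three ways: on sorted lists, on hashed sets, and on binary vectors. Patricia trees are left out for brevity. The function names and the one-byte-per-record vector layout are assumptions for illustration and are not the code that was benchmarked.

```python
def intersect_sorted(a, b):
    """Merge-style intersection of two sorted lists of record ids."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def intersect_hashed(a, b):
    """Intersection using Python's hash-based sets."""
    return set(a) & set(b)


def to_vector(ids, n_records):
    """Binary vector with one flag per record (one byte each in this sketch)."""
    vec = bytearray(n_records)
    for rec_id in ids:
        vec[rec_id] = 1
    return vec


def intersect_vectors(va, vb):
    """Element-wise AND of two equal-length binary vectors."""
    return bytearray(x & y for x, y in zip(va, vb))


def vector_to_ids(vec):
    """Recover the record ids whose flag is set."""
    return [rec_id for rec_id, flag in enumerate(vec) if flag]
```

The vector form costs memory proportional to the number of records rather than to the number of hits. In exchange, the intersection is a uniform pass with no comparisons or hashing.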
Building Indexes: binary vectors
binary vectors found to be the best compromise!
  using Numeric Python module
  typical search time gain: 4.0 sec –> 0.2 sec
  typical indexing time loss: 7 hours –> 4 days
  mostly sparse data modelled via a mostly dense data structure? free your mind, think critically
further optimization:
  Numeric module not addressing real bits, only bytes
  so home-made intbitset C extension in 2007
    addressing real bits (factor of 8 already)
    saving space, saving (indexing) time
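The factor of 8 comes from storing one bit per record instead of one byte. The sketch below illustrates this with a plain Python integer used as a bit vector. It is not the intbitset extension and does not use its API: `to_bit_vector`, `bit_vector_to_ids`, and the collection size `N_RECORDS` are hypothetical names chosen for the example.

```python
N_RECORDS = 1000  # hypothetical collection size


def to_bit_vector(ids):
    """Pack record ids into an integer: bit k is set if record k is a hit."""
    bits = 0
    for rec_id in ids:
        bits |= 1 << rec_id
    return bits


def bit_vector_to_ids(bits):
    """Unpack the set bits back into a sorted list of record ids."""
    ids = []
    rec_id = 0
    while bits:
        if bits & 1:
            ids.append(rec_id)
        bits >>= 1
        rec_id += 1
    return ids


word1 = to_bit_vector([1, 2, 7, 512])
word2 = to_bit_vector([2, 7, 9, 512])
hits = bit_vector_to_ids(word1 & word2)  # a single AND intersects the two sets

byte_size = N_RECORDS             # one byte per record, as with a byte array
bit_size = (N_RECORDS + 7) // 8   # one bit per record, rounded up to whole bytes
```

Here `hits` is `[2, 7, 512]`, and the bit-packed form needs 125 bytes where the byte-per-record form needs 1000.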
Conclusion
selected lessons from building a digital library system with about 300,000 LOCs from 75 authors over 10 years:
  value of rapid prototyping
  value of organic-growth software development model
  value of coding aesthetics and minimalism
evolution and challenges of digital libraries:
  increase of interoperability
  Open Access and publishing model evolution
  the Data Continuum, connecting DLs and science datasets