Managing the Digitization of Large Press Archives
DESCRIPTION
From the 2014 DLF Forum in Atlanta, GA.

Session Leaders: Bassem Elsayed, Bibliotheca Alexandrina; Ahmed Samir, Bibliotheca Alexandrina

Managing the digitization of press material is a challenge not only in quantity but also in text and material quality, in designing the workflow system that organizes the operations, and in handling the metadata. This challenge has been the focus of the Bibliotheca Alexandrina’s digitization work during the past year, in the course of its partnership with the Center for Economic, Judicial, and Social Study and Documentation (CEDEJ). With more than 800,000 pages of press articles to be digitally preserved and made publicly accessible, the project needed a workflow that can manage such a massive collection and handle its attributes proficiently. Deploying it required simultaneous work on four main aspects: analyzing the collection's data, developing a digitization workflow for the collection at hand, implementing and installing the software tools needed for metadata entry, and finally publishing the digital archive online for researchers and the public.

The presentation will demonstrate the workflow system being implemented to manage this press collection, which has yielded more than 400,000 pages to date. It will shed some light on the BA’s Digital Assets Factory (DAF), the nucleus upon which the digitization process of the CEDEJ collection has been built. The presentation will also discuss the tools implemented for ingesting data into the digitization process, from indexing through the creation of the batches that are ingested into the system. The outflow will be discussed in terms of organizing and grouping multipart press clips, as well as reviewing, validating, and correcting the output. Finally, it will cover the challenges encountered in associating the accessible online archive with a powerful search engine that supports multidimensional search while maintaining a user-friendly navigation experience.

TRANSCRIPT
The New Library of Alexandria Overview
Bibliotheca Alexandrina (BA)
- Center of excellence in the production and dissemination of knowledge
- Place of dialogue, learning and understanding between cultures and peoples
- The World’s Window on Egypt
- Egypt’s Window on the World
- Instrument for Rising to the Challenges of the Digital Age
- Center for Dialogue Between Peoples and Civilizations
Not just a Library of Books but rather a vast cultural and scientific complex
A library that can accommodate millions of books
http://archive.bibalex.org
http://descegy.bibalex.org
http://lartarab.bibalex.org
More than 230,000 Arabic books are freely available online for Arabic readers worldwide
http://suezcanal.bibalex.org
http://naguib.bibalex.org/
http://nasser.bibalex.org
http://sadat.bibalex.org
- Project Overview
- Collection Overview
- Data Representation
- System Workflow
  - DAF (Digital Assets Factory)
  - Cataloguing
  - Website
    - Solr search engine
    - Article Viewer
- The Centre for Economic, Judicial, and Social Study and Documentation (CEDEJ) collaborated with Bibliotheca Alexandrina (BA) to digitize its massive archive of press articles
- The project consists of multiple modules to:
  - Index the press archive collection
  - Control the data entry workflow
  - Digitize and process data
  - Catalogue and review articles
  - Publish the archive on the web
- Package of press archive
  - 800,000+ press clips varying between
    - Press
    - Reports
  - 500+ publishers
  - 60,000+ writers and reporters
  - 200 different subjects
    - Economics, politics, social life, etc.
  - Archive languages:
    - Arabic, English and French
  - Date range from 1966 to 2009
- Finished so far
  - 115,000 press clips varying between
    - Press
    - Reports
  - 200 publishers
  - 14,000 writers and reporters
  - 100 different subjects
    - Economics, politics, social life, etc.
  - Archive languages:
    - Arabic, English and French
  - Date range from 1966 to 2009
- A list of the packaged press archive is submitted to Bibliotheca Alexandrina to be scanned and catalogued
- The source of data is a collection of boxes
- Each box is organized in the following hierarchy:
  - Folder
  - File
  - Sub-File
  - Document
- A Document represents a single page of press
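
The box hierarchy above could be modelled roughly as nested records. This is a minimal sketch, assuming each level simply contains the next one (Box, Folder, File, Sub-File, Document); the class names, field names, and sample identifiers are hypothetical and are not taken from the BA system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Document:
    """A single scanned page of press material."""
    page_id: str


@dataclass
class SubFile:
    name: str
    documents: List[Document] = field(default_factory=list)


@dataclass
class File:
    name: str
    sub_files: List[SubFile] = field(default_factory=list)


@dataclass
class Folder:
    name: str
    files: List[File] = field(default_factory=list)


@dataclass
class Box:
    label: str
    folders: List[Folder] = field(default_factory=list)

    def page_count(self) -> int:
        """Count every Document (page) stored anywhere in this box."""
        return sum(
            len(sub_file.documents)
            for folder in self.folders
            for file in folder.files
            for sub_file in file.sub_files
        )


def build_sample_box() -> Box:
    """Build a tiny box with three pages; all identifiers are made up."""
    pages = [Document(f"page-{n}") for n in range(1, 4)]
    sub_file = SubFile("sub-file-1", pages)
    return Box("box-1", [Folder("folder-1", [File("file-1", [sub_file])])])
```

Walking the tree from the box down, as `page_count` does, is one way an indexing step might tally the pages expected in a batch before scanning starts.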
Article Creation
Article Metadata
Lookups Management
Reports
- Based on Apache Lucene project v4.1
- SolrNet API is used to connect to the Solr server
- Features
  - Simple/Advanced search
  - Results Highlighting
  - Fields AutoComplete
  - Text search (Article Viewer)
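
To illustrate the search and highlighting features listed above, here is a rough sketch of how a request to a Solr `select` handler might be assembled. The slides say the project connects through SolrNet; this example instead builds the request URL directly in Python. The parameters `q`, `fq`, `rows`, `wt`, `hl` and `hl.fl` are standard Solr query parameters, while the base URL, the core name `articles`, and the field names `content` and `language` are assumptions made for illustration.

```python
from urllib.parse import urlencode


def build_search_url(base_url: str, text: str, language: str = "", rows: int = 10) -> str:
    """Build a Solr select URL for a phrase search with highlighting enabled.

    The field names `content` and `language` are hypothetical.
    """
    # Escape backslashes and double quotes so the phrase stays one quoted term.
    phrase = text.replace("\\", "\\\\").replace('"', '\\"')
    params = [
        ("q", f'content:"{phrase}"'),  # phrase query on the article text
        ("rows", str(rows)),           # number of results per page
        ("wt", "json"),                # ask for a JSON response
        ("hl", "true"),                # turn on result highlighting
        ("hl.fl", "content"),          # highlight matches in the text field
    ]
    if language:
        # Filter query narrowing results to one archive language.
        params.append(("fq", f"language:{language}"))
    return f"{base_url}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url("http://localhost:8983/solr/articles", "Suez Canal", language="fr"))
```

An advanced search form could map each of its fields to an additional `fq` filter in the same way, leaving the main `q` parameter for the free-text part of the query.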
- Article viewer is used for previewing articles
  - It is one of multiple viewers developed at BA
- Architecture
  - Server Side: RESTful services
  - Client Side: JavaScript using JSONP
- Features
  - Image preview
  - Metadata preview
  - Text selection
  - Searching/highlighting
  - Zooming options: fit width/height
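
The JSONP arrangement mentioned under Architecture means the server wraps its JSON payload in a call to a function named by the client. The following is only a sketch of that general pattern, not the BA implementation; the function name `jsonp_body`, the callback name, and the payload keys are hypothetical.

```python
import json
import re

# Accept only plain JavaScript identifiers (optionally dotted) as callback names,
# so that a request cannot inject arbitrary script into the response.
_CALLBACK_RE = re.compile(r"^[A-Za-z_$][A-Za-z0-9_$]*(\.[A-Za-z_$][A-Za-z0-9_$]*)*$")


def jsonp_body(callback: str, payload: dict) -> str:
    """Wrap a JSON payload in a call to the client-supplied callback function."""
    if not _CALLBACK_RE.match(callback):
        raise ValueError("invalid JSONP callback name")
    return f"{callback}({json.dumps(payload)});"


if __name__ == "__main__":
    print(jsonp_body("onMetadata", {"pageCount": 3}))
```

The client loads such a response through a script tag, which is what allows a JavaScript viewer to call services hosted on a different domain.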
- Viewer Web Services
  - Metadata Web Service:
    - Retrieve article catalogue metadata
    - Return technical information (width, height, page count, etc.)
  - Content Web Service:
    - Retrieve the image of each page in the article, scaled responsively to a custom width and height
    - Return the selected text based on the area highlighted by the user
  - Search Web Service:
    - Perform the search in the content of the articles using the Solr engine APIs
    - Highlight the matching phrases in the article image
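
The scaling and text-selection behaviour described for the Content Web Service comes down to simple coordinate arithmetic. The sketch below assumes a "fit width" zoom that preserves the page's aspect ratio, and a selection rectangle given as `(x, y, width, height)` in displayed-image pixels; both function names and the rectangle convention are assumptions, not details from the slides.

```python
from typing import Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height)


def fit_width(page_w: int, page_h: int, target_w: int) -> Tuple[int, int, float]:
    """Scale a page to a target width, keeping its aspect ratio.

    Returns the displayed width, displayed height, and the scale factor.
    """
    if page_w <= 0 or page_h <= 0 or target_w <= 0:
        raise ValueError("dimensions must be positive")
    scale = target_w / page_w
    return target_w, round(page_h * scale), scale


def to_page_coords(rect: Rect, scale: float) -> Rect:
    """Map a rectangle selected on the scaled image back to full-page pixels."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    x, y, w, h = rect
    return round(x / scale), round(y / scale), round(w / scale), round(h / scale)


if __name__ == "__main__":
    shown_w, shown_h, scale = fit_width(2000, 3000, 500)
    print(shown_w, shown_h, scale)
    print(to_page_coords((50, 100, 200, 25), scale))
```

Mapping the selection back to full-page coordinates lets a service look up which recognized words fall inside the rectangle, regardless of the zoom level the reader is using. The same scale factor, applied in the other direction, could place search-hit highlights on the displayed image.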