Open Past: Digital Projects from Government Libraries
Finance Canada
Statistics Canada
Library of Parliament
June 1, 2012
CLA Conference 2012
“Share your thoughts with fellow delegates and CLA members while attending the conference. The twitter hashtag is #CLAOTT2012 or you can blog or "Facebook" from CLA 2012 in Ottawa. Go to the CLA website, http://www.cla.ca/conference/2012 for the CLA from Away links.”
Overview
Introductions
Finance Canada
Statistics Canada
Library of Parliament
Questions
Finance Canada
Digitizing the Federal Budget
Eileen Bays-Coutts
Iona Henderson
June 1, 2012
Library Digitization Goals
To increase Web access to and discoverability of federal budget publications
To address service delivery issues
Pilot: To assess digitization, repository, and metadata requirements
Pilot Phase
In March 2010, digitized the 1952 to 1994 Speech, Plan, and Budget in Brief publications.
Used an in-house photocopier and casual staff.
Publications were scanned to PDF and the files optimized using Adobe Acrobat Pro OCR and tagging processes.
Pilot Continued
Sample of OCR coding errors in the text underlying the PDFs:
I am honoured, Madam Speaker, to have the opportunity to present to Parliament the first b6'dget of this new decade. It is a bUdget which sets newdirections for the economy~directions which willensure both energy securityand economic securit'y for Canadians in the years ahead.It would b~ no service to this House, nor to Qanadians, to deny that there is adeeply troubling air of uncertainty and anxiety around the world and, I am sure, in the hearts and minds of Canadians; we have inherited many difficulties fromthe decade of the 70s. But It would be just as wrong to deny that the decade ofthe 80s provides extraordinary oppo.rtunities for Canada and Canadians.
Pilot Continued
Results:
• Low cost
• Crawlable and searchable files
• 3% to 5% OCR error rate
Conclusion – error rate unacceptable.
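One common way to put a number on an OCR error rate is the character error rate: the edit distance between a hand-corrected reference text and the OCR output, divided by the length of the reference. The sketch below is illustrative only. The slides do not say how the 3% to 5% figure was measured (per character, per word, or otherwise), so the metric, the function names, and the corrected reference string are all assumptions. The 0.5% default threshold mirrors the 99.5% error-free goal stated for the project phase.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # drop a reference character
                current[j - 1] + 1,      # extra character in the OCR output
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the length of the reference text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def meets_target(reference: str, hypothesis: str,
                 max_error_rate: float = 0.005) -> bool:
    """True when the OCR output is within the allowed error rate (assumed 0.5%)."""
    return character_error_rate(reference, hypothesis) <= max_error_rate


# Assumed hand-corrected reference versus a fragment of the OCR sample above.
reference = "the first budget of this new decade"
hypothesis = "the first b6'dget of this new decade"

rate = character_error_rate(reference, hypothesis)
print(f"character error rate: {rate:.1%}")
print("meets 99.5% target:", meets_target(reference, hypothesis))
```

A short fragment like this overstates the rate for a whole document; in practice the measure would be taken over full pages checked against proofread text.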
Project Phase
Goal: to produce CLF2-compliant, 99.5% error-free OCR text
Work competitively outsourced in 2010/11 to Terra Reproductions
Same scope as Pilot phase.
Project Continued
Full specs were provided to the company including generic metadata; metadata to be enhanced later.
Results:
• Error rate of 0.5% or lower
• But discovered some gaps
Getting to the Web
• Add: 1968 to 1994
• Enhance user experience
2007 to 2012 budget.gc.ca
1995 to 2006 fin.gc.ca
Inspiration
Getting to the Web Continued
Additional metadata added to files:
• Prime Minister
• Finance Minister
• Parliament number
• Political party
These fields, plus the year, became our filtering criteria
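To illustrate how metadata fields like these can drive filtering, here is a minimal sketch. It is not the implementation behind the archive page: the `BudgetDocument` record, the `filter_documents` helper, and every sample value (placeholder names, titles, and numbers) are hypothetical and chosen only to show the idea of matching documents on any combination of the five criteria.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class BudgetDocument:
    """Hypothetical record holding the metadata fields named on the slide."""
    title: str
    year: int
    prime_minister: str
    finance_minister: str
    parliament_number: int
    political_party: str


# Every field except the title can be used as a filter.
FILTERABLE = {f.name for f in fields(BudgetDocument)} - {"title"}


def filter_documents(documents, **criteria):
    """Return the documents whose metadata equals every supplied criterion."""
    unknown = set(criteria) - FILTERABLE
    if unknown:
        raise ValueError(f"unknown filter(s): {sorted(unknown)}")
    return [
        doc for doc in documents
        if all(getattr(doc, name) == value for name, value in criteria.items())
    ]


# Placeholder data only; these are not real budget records.
SAMPLE_DOCUMENTS = [
    BudgetDocument("Budget Speech", 1970, "PM A", "Minister A", 28, "Party X"),
    BudgetDocument("Budget in Brief", 1970, "PM A", "Minister A", 28, "Party X"),
    BudgetDocument("Budget Speech", 1985, "PM B", "Minister B", 33, "Party Y"),
]

for doc in filter_documents(SAMPLE_DOCUMENTS, political_party="Party X", year=1970):
    print(doc.year, doc.title)
```

Treating each criterion as an exact match that must hold at the same time keeps the filters easy to combine; an empty set of criteria returns every document.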
Getting to the Web Continued
jQuery used for sorting functionality
Some browsers had display issues, so custom style sheets were developed
Clean up of 1995 – 1999 PDFs on FIN
Final Product!
www.budget.gc.ca/pdfarch/index-eng.html
Going Forward
Fill gaps in collections
Enhance metadata
Improve layout and functionality
Add additional PDF documents from years 1994 – 2011
Improve accessibility of PDFs
Thank you
http://www.budget.gc.ca/pdfarch/index-eng.html