HathiTrust: A Shared Digital Repository
Unpacking HathiTrust’s New Cost Model
Jeremy York, Project Librarian, HathiTrust
SUNY, July 15, 2011
About
Partnership
• Arizona State University
• Boston University
• Baylor University
• California Digital Library
• Columbia University
• Cornell University
• Dartmouth College
• Duke University
• Emory University
• Harvard University Library
• Indiana University
• Johns Hopkins University
• Lafayette College
• Library of Congress
• Massachusetts Institute of Technology
• Michigan State University
• New York University
• New York Public Library
• North Carolina Central University
• North Carolina State University
• Northwestern University
• The Ohio State University
• The Pennsylvania State University
• Princeton University
• Purdue University
• Stanford University
• Texas A&M University
• Universidad Complutense de Madrid
• University of California
– Berkeley, Davis, Irvine, Los Angeles, Merced, Riverside, San Diego, San Francisco, Santa Barbara, Santa Cruz
• The University of Chicago
• University of Florida
• University of Illinois
• University of Illinois at Chicago
• The University of Iowa
• University of Maryland
• University of Michigan
• University of Minnesota
• The University of North Carolina at Chapel Hill
• University of Notre Dame
• University of Pennsylvania
• University of Pittsburgh
• University of Utah
• University of Virginia
• University of Washington
• University of Wisconsin-Madison
• Utah State University
• Yale University Library
Digital Repository
• Launched 2008
• Initial focus on digitized book and journal content
• “Light” archive
– As accessible as possible within the bounds of law
Statistics
• 8,980,200 volumes
• 4,679,248 book titles
• 214,155 serial titles
• 2,450,522 “public domain”
The Name
• The meaning behind the name
– Hathi (hah-tee): Hindi for elephant
– Big, strong
– Never forgets, wise
– Secure
– Trustworthy
Mission
• To contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge
Goals
• Comprehensive collection
• Preservation…with Access
• Shared strategies
– Collection management, development
– Preservation
– Copyright
– Efficient user services
• Openness
Governance
• HathiTrust
– Executive Committee: budget/finances, decision-making
– Strategic Advisory Board: guidance on policy, planning
Executive Committee
• Paul Courant, University Librarian and Dean of Libraries, UM
• Laine Farley, Executive Director, CDL
• John King, Vice Provost for Academic Information, UM
• Paula Kaufman, University Librarian and Dean of Libraries, UI
• Brian Schottlaender, University Librarian, UCSD
• Ed Van Gemert, Deputy Director of Libraries, UW-Madison (ex officio)
• Brenda Johnson, Dean of Libraries, IU
• Brad Wheeler, Chief Information Officer, IU
• John Wilkin, Executive Director of HathiTrust and Associate University Librarian, LIT, UM
Strategic Advisory Board
• Ed Van Gemert (Chair), Deputy Director of Libraries, University of Wisconsin-Madison
• John Butler, AUL for Information Technology, University of Minnesota
• Patricia Cruse, Director, Preservation, CDL
• Todd Grappone, AUL for Digital Initiatives & IT, UCLA
• Julia Kochi, Director, Digital Library and Collections, UC San Francisco
• Sarah Pritchard, University Librarian, Northwestern University
• Paul Soderdahl, Director, LIT, University of Iowa
• John Wilkin, Executive Director, HathiTrust (ex officio)
• Robert Wolven, Columbia University
Constitutional Convention
• October 2011
• Delegates from each institution and consortium
– Carry a certain number of votes, determined according to a formula approved by the Executive Committee
• 3-year review
• Proposals
– Print management
– Ballot proposals
Partnership
• Who can become a partner?
– Institutions worldwide
– Libraries with print holdings
What are the benefits? (1)
• Cost-effective long-term preservation and access services for digitized content
– Commitments on digital content facilitate decisions about digitization efforts and print collection management
• For those with content: immediate long-term preservation, bibliographic and full-text search, and collection building
• With content or not: full viewing and downloading capabilities for public domain materials and for materials for which we have received permissions
What are the benefits? (2)
• Specialized access to public domain and in-copyright materials for users with print disabilities
• Other lawful uses of in-copyright materials, such as Section 108 uses (print replacement copies, digital access to applicable works) and access to orphan works
• HathiTrust encourages participation in initiatives and resources geared toward
– Shared collection development and management (e.g., copyright review work, print holdings database, de-duplication, collaboration with other organizations and initiatives)
– Governance and collaborative initiatives
– Defining future directions of the shared library
What’s involved?
• Contract
– Sustaining
– Content-Contributing
• Yearly fees
• Commitment
– 5-year periods
• Shibboleth
• Print holdings
Costs
• Base funding from partner institutions
• Basic infrastructure costs
• Commitments in 5-year periods
How much does it cost? (1)
How much does it cost? (2)
• $0.149/volume/year for Google-digitized volumes
• $0.489/volume/year for IA (Internet Archive)-digitized volumes
• $0.154/volume/year for all content
• $3.40 per GB
Financial contributions of partners
HathiTrust Functional Framework
How does it work? (1)
• Sustaining membership is the base
– Pricing model for all partners beginning 2013
– Based on overlap of HathiTrust volumes with institutions’ print holdings
– Share in infrastructure costs for public domain volumes: (PD*C*X)/N
– Share in infrastructure costs for in-copyright volumes, based on holdings
• For a given in-copyright volume: IC = (C*X)/H
• Where PD = number of public domain volumes, C = cost per volume per year, X = multiplier, N = number of partners, and H = number of partners holding that volume
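A minimal sketch of the two formulas, reading the symbols the way the Example slide uses them: PD is the number of public domain volumes, C the cost per volume per year, X the multiplier, N the number of partners, and H the number of partners holding a given in-copyright volume. Summing IC over every in-copyright volume a partner holds is our reading of "based on holdings", not something the slide spells out; the function names are hypothetical.

```python
def pd_share(pd_volumes: int, cost_per_volume: float, multiplier: float, partners: int) -> float:
    """One partner's yearly share of public domain costs: (PD * C * X) / N."""
    return pd_volumes * cost_per_volume * multiplier / partners


def ic_volume_share(cost_per_volume: float, multiplier: float, holders: int) -> float:
    """One holder's yearly share for a single in-copyright volume: IC = (C * X) / H."""
    if holders < 1:
        raise ValueError("an in-copyright volume must have at least one holder")
    return cost_per_volume * multiplier / holders


def ic_share(holder_counts: list[int], cost_per_volume: float, multiplier: float) -> float:
    """Sum IC over the in-copyright volumes a partner holds (assumed aggregation).

    holder_counts has one entry per held volume: the number of partners (H),
    including this one, that hold that volume.
    """
    return sum(ic_volume_share(cost_per_volume, multiplier, h) for h in holder_counts)


# A volume held by 12 partners costs each holder 0.154 * 1.5 / 12 per year.
print(round(ic_volume_share(0.154, 1.5, 12), 5))
```

Because IC divides by H, a widely held volume is cheap for each holder, while a volume held by only one partner carries its full C*X cost.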
How does it work? (2)
• Main factors in costs are
– Amount of content
– Number of partners
– A flexible multiplier designed to pay for programmatic activities
• These factors tend to result in lower costs and more benefits over time
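To make the effect of those factors concrete, this sketch holds the Example slide's cost per volume ($0.154) and multiplier (1.5) fixed and varies content and partner counts in the public domain formula (PD*C*X)/N. The scenario numbers other than the first are illustrative only, not projections.

```python
COST_PER_VOLUME = 0.154  # $/volume/year, from the cost slide
MULTIPLIER = 1.5         # multiplier used in the slide examples


def pd_share(pd_volumes: int, partners: int) -> float:
    """Per-partner yearly public domain share, (PD * C * X) / N."""
    return pd_volumes * COST_PER_VOLUME * MULTIPLIER / partners


scenarios = [
    (1_000_000, 60),   # Example slide
    (1_000_000, 87),   # more partners, same content
    (2_000_000, 120),  # content and partners both double
]
shares = {s: round(pd_share(*s), 2) for s in scenarios}
for (volumes, partners), share in shares.items():
    print(f"{volumes:>9,} PD volumes, {partners:>3} partners: ${share:,.2f}")
```

Adding partners lowers each partner's share; growing content raises it; if both double, the share stays flat.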
Example
• Factors
– 1,000,000 PD volumes
– 3,000,000 IC volumes
– $0.154 per volume
– Multiplier of 1.5
– 60 partners
– Assume on average 12 institutions hold IC volumes
• Costs
– PD = (1,000,000 * .154 * 1.5) / 60 = $3,850
– IC = (3,000,000 * .154 * 1.5) / 12 = $57,750
– Total = $61,600
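The Example arithmetic can be reproduced directly. Note that the IC line divides all 3,000,000 in-copyright volumes by 12, which amounts to pricing a partner that holds every in-copyright volume, each shared with 12 holders on average; under IC = (C*X)/H, a partner holding fewer volumes would pay proportionally less.

```python
# Inputs from the Example slide
PD_VOLUMES = 1_000_000
IC_VOLUMES = 3_000_000
COST_PER_VOLUME = 0.154  # $/volume/year
MULTIPLIER = 1.5
PARTNERS = 60
AVG_HOLDERS = 12         # average number of institutions holding an IC volume

pd_cost = PD_VOLUMES * COST_PER_VOLUME * MULTIPLIER / PARTNERS
ic_cost = IC_VOLUMES * COST_PER_VOLUME * MULTIPLIER / AVG_HOLDERS
total = pd_cost + ic_cost

print(f"PD = ${pd_cost:,.0f}, IC = ${ic_cost:,.0f}, Total = ${total:,.0f}")
# PD = $3,850, IC = $57,750, Total = $61,600
```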
How does it work? (3)
• In order to support these calculations
– Need print holdings database (2013)
– Update mechanisms
– Manual remediation
• Analysis will also support
– Expansion of legal uses of materials to users who have print disabilities and to orphan works
– Collaborative collection development and management operations
– De-duplication efforts
Print Holdings Database
• Volumes institutions own or have owned
– Only print volumes (not microform, etc.)
– OCLC number [required]
– Bib record ID [required]
– Condition (e.g., brittle) [optional]
– Holding status (e.g., current holding, withdrawn, missing, etc.) [optional]
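A minimal sketch of one record in the print holdings database, following the field list above. The field names, the HoldingStatus values (only the three examples the slide names; its "etc." implies more), the sample identifiers, and the validation are assumptions for illustration, not HathiTrust's actual submission format.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class HoldingStatus(Enum):
    """Holding status values named on the slide; the real list is longer ("etc.")."""
    CURRENT = "current holding"
    WITHDRAWN = "withdrawn"
    MISSING = "missing"


@dataclass(frozen=True)
class PrintHolding:
    """A print volume (not microform) that an institution owns or has owned."""
    oclc_number: str                        # required
    bib_record_id: str                      # required: the institution's own record ID
    condition: Optional[str] = None         # optional, e.g. "brittle"
    status: Optional[HoldingStatus] = None  # optional

    def __post_init__(self) -> None:
        # Enforce the two required identifiers.
        if not self.oclc_number.strip():
            raise ValueError("OCLC number is required")
        if not self.bib_record_id.strip():
            raise ValueError("bib record ID is required")


# Hypothetical identifiers, for illustration only
record = PrintHolding("12345678", "b1000001", condition="brittle",
                      status=HoldingStatus.WITHDRAWN)
print(record)
```

Keeping withdrawn and missing volumes as records (rather than deleting them) matches the slide's scope of volumes institutions "own or have owned".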
Percent Overlap
Average = 37.4%
Questions
• Why not get the information from OCLC?
• Is it necessary to declare all volumes held, or could an institution choose not to declare some?
• Are the print holdings data currently provided by institutions taken as an indication of the volumes institutions are declaring they have access to?
What are we doing currently?
• Basing yearly fees on estimates
– Infrastructure costs of anticipated content
– Estimated partnership growth
– Institution total volume counts
SUNY Costs
• SUNY University Centers
– Albany, Binghamton, Buffalo, Stony Brook, Upstate and Downstate Medical Libraries
– 11,049,952 volumes
• All SUNY (based on 16,000,000 titles)
– 27 institutions total
– 20,800,000 volumes
SUNY costs (2)
• Estimate using
– 9,500,000 volumes at end of 2011
– 60 partners (for University Centers and Medical libraries)
– 87 partners (for all SUNY libraries)
– Multiplier of 1.5
SUNY costs (3)
• University Centers
– Public Domain
• (Total PD cost * 1.5 / 60 partners) * 6 institutions = $70,903.22
– In Copyright
• % of holdings (partner holdings / total holdings) * Total IC cost * 1.5 = $67,635.06
– Total = $138,538.28
• Prorated from August 1 = $58,072.21
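The slide does not say how the August 1 proration was computed, but its figure matches a simple day count: 153 of the 365 days in 2011 remain from August 1 through December 31. This sketch assumes that method; `prorate` is a hypothetical helper.

```python
from datetime import date

# University Centers figures from the slide
PD_SHARE = 70_903.22
IC_SHARE = 67_635.06


def prorate(annual: float, start: date, year_end: date) -> float:
    """Scale a yearly fee by the days remaining from start through year_end (inclusive)."""
    days_in_year = (date(year_end.year, 12, 31) - date(year_end.year, 1, 1)).days + 1
    days_remaining = (year_end - start).days + 1
    return annual * days_remaining / days_in_year


annual = PD_SHARE + IC_SHARE
prorated = prorate(annual, date(2011, 8, 1), date(2011, 12, 31))
print(f"Annual ${annual:,.2f}, prorated ${prorated:,.2f}")
```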
SUNY costs (4)
• All SUNY
– Public Domain
• (Total PD cost * 1.5 / 87 partners) * 27 institutions = $220,044.49
– In Copyright
• % of holdings (partner holdings / total holdings) * Total IC cost * 1.5 = $127,198.61
– Total = $347,243.09
• Prorated from August 1 = $145,556.69
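The slides never state the total PD cost, but it can be recovered by inverting the public domain formula for each scenario. Both give about $472,688, equal to within rounding, which suggests the two estimates share one total PD cost. The sketch also applies the same assumed day-count proration (153 of 365 days) to the All SUNY total; the helper names are hypothetical.

```python
from datetime import date

# (PD share, multiplier, partners in the model, SUNY institutions) from the two slides
UNIVERSITY_CENTERS = (70_903.22, 1.5, 60, 6)
ALL_SUNY = (220_044.49, 1.5, 87, 27)


def implied_total_pd_cost(pd_share: float, multiplier: float, partners: int, members: int) -> float:
    """Invert pd_share = total * multiplier / partners * members to recover total."""
    return pd_share * partners / (multiplier * members)


def prorate_from_aug_1_2011(annual: float) -> float:
    """Assumed day-count proration: Aug 1 through Dec 31, 2011 (153 of 365 days)."""
    days = (date(2011, 12, 31) - date(2011, 8, 1)).days + 1
    return annual * days / 365


centers_total = implied_total_pd_cost(*UNIVERSITY_CENTERS)
all_suny_total = implied_total_pd_cost(*ALL_SUNY)
print(f"Implied total PD cost: ${centers_total:,.2f} vs ${all_suny_total:,.2f}")
print(f"All SUNY prorated: ${prorate_from_aug_1_2011(347_243.09):,.2f}")
```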
Sustaining v. Content-Contributing
• Sustaining membership does not exclude contribution of content
• If a partner contributes content, its costs are covered up to the amount it would pay as a Sustaining partner
– Barring additional costs that might be needed to accommodate content (e.g., specialized load routines, generation of OCR)
• Above that, the partner pays the per-GB cost ($3.40)
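A sketch of how the Content-Contributing rule might be applied, under our reading that storage at $3.40 per GB is absorbed up to the Sustaining fee and only the excess is billed. The slide does not say whether the per-GB price is annual; this sketch assumes it is billed over the same period as the Sustaining fee. Special costs such as load routines or OCR generation are left out, and the function name is hypothetical.

```python
PER_GB = 3.40  # $ per GB, from the slide; billing period assumed to match the Sustaining fee


def content_contributing_fee(sustaining_fee: float, content_gb: float, per_gb: float = PER_GB) -> float:
    """Total fee for a content-contributing partner (assumed interpretation).

    Storage costs are covered up to the Sustaining fee; anything above it is
    charged at the per-GB rate. Special handling costs (load routines, OCR)
    are not modeled.
    """
    if sustaining_fee < 0 or content_gb < 0:
        raise ValueError("fee and content size must be non-negative")
    storage_cost = content_gb * per_gb
    excess = max(0.0, storage_cost - sustaining_fee)
    return sustaining_fee + excess


# Using the Example slide's $61,600 as a hypothetical Sustaining fee:
print(content_contributing_fee(61_600, 10_000))  # storage of $34,000 is fully covered
print(content_contributing_fee(61_600, 20_000))  # storage of $68,000 exceeds it by $6,400
```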
Summary
• Partners share in the costs of sustaining a common resource
• Share in uses of relevant materials
• Voice in future directions
• Costs to institutions go down
• Quality of services increases
– The aggregated collection delivers what distributed search or federation cannot
• Free riders?
Changing Library Landscape
• Rapidly changing landscape
• Libraries are making these decisions, but they are increasingly collective decisions
• We can no longer afford to do work separately that could be done collaboratively
HathiTrust overall benefits to libraries
• Digital curation
– Drive costs down
– Reduce “bibliographic indeterminacy”
– Make meaningful decisions about formats and quality
– Increase discoverability, use
– Consolidate development talent
– Improve strength of archiving
• Print curation
– Means to associate our print holdings
– Coordinated record-keeping
• Subsidiary benefits
– Quantify problems
– Collective attention to solving shared problems
How to find out more
• Web site “About” section: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Monthly newsletter: http://www.hathitrust.org/updates
• RSS: http://www.hathitrust.org/updates_rss
• Contact us: [email protected]
Thank you very much!