Development of the CyberCemetery (2011)

Development & Practice in the CyberCemetery
Starr Hoffman, Head, Government Documents Dept.
University of North Texas Libraries
25 September 2011

Upload: dr-starr-hoffman

Post on 23-Jun-2015

DESCRIPTION

Latest presentation on the development of the CyberCemetery, an archive of "dead" websites for now-defunct government agencies and commissions. The CyberCemetery archive is maintained by the University of North Texas (UNT) Libraries, an Affiliated Archive of the National Archives and Records Administration (NARA).

TRANSCRIPT

Page 1: Development of the CyberCemetery (2011)

Development & Practice in the CyberCemetery

Starr Hoffman
Head, Government Documents Dept.

University of North Texas Libraries
25 September 2011

Page 2: Development of the CyberCemetery (2011)

• Intro: What is the CyberCemetery?
• Purpose: Why create a CyberCemetery?
• Development
• Archiving Process
• Technical Details
• User Demographics: Who uses the CyberCemetery?
• Conclusion

Page 3: Development of the CyberCemetery (2011)

http://digital.library.unt.edu/explore/collections/GDCC/

Page 4: Development of the CyberCemetery (2011)

• online archive of websites from U.S. government agencies or commissions that are no longer operating

http://digital.library.unt.edu/explore/collections/GDCC/

Page 5: Development of the CyberCemetery (2011)

• online archive of websites from U.S. government agencies or commissions that are no longer operating

• “snapshot” of each website as it existed before “pulling the plug”

• maintained by the University of North Texas Libraries

• freely accessible world-wide

• affiliated NARA archive (National Archives and Records Administration)

http://digital.library.unt.edu/explore/collections/GDCC/

Page 6: Development of the CyberCemetery (2011)
Page 7: Development of the CyberCemetery (2011)
Page 8: Development of the CyberCemetery (2011)
Page 9: Development of the CyberCemetery (2011)
Page 10: Development of the CyberCemetery (2011)

1997 - present 2008 - present

Page 11: Development of the CyberCemetery (2011)
Page 12: Development of the CyberCemetery (2011)

• Protect At-Risk Information:
  • 1990s: U.S. government information = online
  • born-digital
  • edited or removed without warning

• Federal Depository Library Program (FDLP)
  • administered by U.S. Government Printing Office (GPO)
  • mission: to provide free, permanent public access to government information
  • online information complicates this mission
  • University of North Texas is a federal depository library

Page 13: Development of the CyberCemetery (2011)
Page 14: Development of the CyberCemetery (2011)

1995 e-docs at risk

Government Printing Office (GPO) publishes report stating need to preserve electronic government publications

Page 15: Development of the CyberCemetery (2011)

1997 GPO + UNT

University of North Texas (UNT) talks to GPO about forming a partnership

Page 16: Development of the CyberCemetery (2011)

1997 ACIR archived

UNT archives website of the Advisory Commission on Intergovernmental Relations (ACIR)

Page 17: Development of the CyberCemetery (2011)

1999 GPO + UNT = expanded

permanent public access, expanded to multiple websites, & any agency or commission no longer operating

Page 18: Development of the CyberCemetery (2011)

1999 CyberCemetery

archive is named “CyberCemetery” because websites are from “dead” agencies & commissions

Page 19: Development of the CyberCemetery (2011)

2006 GPO + UNT + NARA

partnership now includes the U.S. National Archives and Records Administration (NARA)

Page 20: Development of the CyberCemetery (2011)

2011

73+ websites archived

Page 21: Development of the CyberCemetery (2011)

1. Identify at-risk government agencies and commissions

• contacted directly by agency/commission
• contacted by GPO
• read/listen to news
• read government-related websites & blogs
• targeted search-engine queries
  • (“final report” + .gov)
• referrals from other librarians, patrons
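The targeted search-engine queries in step 1 can be sketched as below. The slide gives only the pattern (“final report” + .gov); the `site:` operator syntax, the second phrase, and the function names are assumptions made for illustration, not part of the original workflow.

```python
from urllib.parse import quote_plus

# Phrases hinting that an agency or commission is winding down.
# Only "final report" comes from the slide; the second phrase is hypothetical.
CLOSURE_PHRASES = ["final report", "commission will terminate"]


def build_queries(phrases=CLOSURE_PHRASES, domain=".gov"):
    """Return search strings pairing each quoted phrase with a domain filter."""
    return [f'"{phrase}" site:{domain}' for phrase in phrases]


def as_url_parameter(query):
    """URL-encode a query so it can be placed in a search engine's query string."""
    return quote_plus(query)


if __name__ == "__main__":
    for query in build_queries():
        print(query, "->", as_url_parameter(query))
```

Keeping the phrase list separate from the query builder makes it easy to add new closure signals as they are noticed in news or blogs.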

Page 22: Development of the CyberCemetery (2011)

2. Evaluate the website
• must be an official government website
• the agency or commission must:
  • be closing
  • have issued a final report
  • show other indication that the website is at-risk
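The evaluation criteria in step 2 can be expressed as a simple check. This is a minimal sketch: the field names are hypothetical, and it assumes that any one at-risk signal is sufficient, which the slide does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Facts gathered about a website under review (hypothetical field names)."""
    url: str
    is_official_government_site: bool
    agency_is_closing: bool = False
    final_report_issued: bool = False
    other_risk_indicator: bool = False


def is_eligible(site: Candidate) -> bool:
    """Apply the step 2 criteria.

    Assumption: a single at-risk signal is enough; the slide does not say
    whether the three conditions are alternatives or are all required.
    """
    at_risk = (
        site.agency_is_closing
        or site.final_report_issued
        or site.other_risk_indicator
    )
    return site.is_official_government_site and at_risk
```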

Page 23: Development of the CyberCemetery (2011)

2. Evaluate the website (continued)

Questions for website administrator:

• What operating system was used to host this website?
• What webserver software was used for the hosting of this website?
• Are server side includes (SSI) used in this website?
• Was this website static HTML or a dynamic site?
  • If dynamic, what scripting languages were used for this website (PHP, Perl, Python)?
• Was a database used for this website?
  • If so, what database was used for this website?
  • What methods were used to connect to the database?
• Is there streaming media associated with this website?
• Are there proprietary content types used in this website?
• Are there any comments you would like to add?

Page 24: Development of the CyberCemetery (2011)

3. Harvest the website
• software: Heritrix (from Internet Archive)
  • http://crawler.archive.org/
• downloads content
• bundles all content into WARC file
• WARC = website in a single file
• no manipulation of code or content

4. Access archived website
• software: Wayback (from Internet Archive)
  • http://archive-access.sourceforge.net/projects/wayback/
• retrieves content from WARC
• adds banner notifying archived status
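To make "WARC = website in a single file" concrete, the sketch below appends simplified WARC/1.0 records to one stream and reads them back by URL. It illustrates the container idea only and is not how Heritrix or Wayback are implemented: files produced by Heritrix store full HTTP responses and carry more header fields. The function names and example URLs are hypothetical.

```python
import io
import uuid
from datetime import datetime, timezone


def write_record(out, target_uri, payload, record_type="resource"):
    """Append one simplified WARC/1.0 record to a binary stream."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.0",
        f"WARC-Type: {record_type}",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {now}",
        f"WARC-Target-URI: {target_uri}",
        f"Content-Length: {len(payload)}",
    ]
    out.write("\r\n".join(headers).encode("utf-8") + b"\r\n\r\n")
    out.write(payload + b"\r\n\r\n")  # every record ends with two CRLFs


def read_records(stream):
    """Yield (target_uri, payload) for each record in the stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if line.strip() != b"WARC/1.0":
            continue  # skip the blank lines that separate records
        fields = {}
        while True:
            raw = stream.readline()
            if raw in (b"\r\n", b""):
                break  # blank line closes the header section
            name, _, value = raw.decode("utf-8").partition(":")
            fields[name.strip()] = value.strip()
        payload = stream.read(int(fields["Content-Length"]))
        yield fields["WARC-Target-URI"], payload


if __name__ == "__main__":
    archive = io.BytesIO()
    write_record(archive, "http://example.gov/", b"<html>home</html>")
    write_record(archive, "http://example.gov/report.pdf", b"%PDF-placeholder")
    archive.seek(0)
    for uri, body in read_records(archive):
        print(uri, len(body), "bytes")
```

Because each payload is read by its declared `Content-Length`, the stored bytes are returned unmodified, which matches the "no manipulation of code or content" point above.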

Page 25: Development of the CyberCemetery (2011)

5. Harvesting alternative: Donated content
• directly receive files from agency or commission

• Why not donated content?
  • Content could be altered
  • Harvesting = exact copy of online published content

• Why donated content?
  • If content cannot be accessed by harvesting
  • Flash video, large amounts of media
  • rarely necessary now

Page 26: Development of the CyberCemetery (2011)

6. Link Checking
• Manual:
  • manually navigate original & archived sites
• Automated:
  • Xenu's Link Sleuth (link checker)
  • http://home.snafu.de/tilman/xenulink.html
  • compare reports of original & archived sites

7. Load to UNT Server
• Upload archived website
• Add navigation
• Notify GPO (or agency/commission) that archived version is live
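The automated comparison in step 6 (comparing link reports of the original and archived sites) could look roughly like this. It is a sketch under stated assumptions: each report is treated as a plain list of URLs, the archived copy is assumed to keep the same paths under a different host, and the function names and example hosts are hypothetical rather than taken from the link checker's actual export format.

```python
from urllib.parse import urlsplit


def path_of(url):
    """Reduce a URL to its path and query so different hosts can be compared."""
    parts = urlsplit(url)
    path = parts.path or "/"
    return path + ("?" + parts.query if parts.query else "")


def missing_from_archive(original_urls, archived_urls):
    """Return paths found on the original site but absent from the archived copy."""
    original = {path_of(u) for u in original_urls}
    archived = {path_of(u) for u in archived_urls}
    return sorted(original - archived)


if __name__ == "__main__":
    original = [
        "http://agency.example.gov/",
        "http://agency.example.gov/about.html",
        "http://agency.example.gov/reports/final.pdf",
    ]
    archived = [
        "http://archive.example.edu/",
        "http://archive.example.edu/about.html",
    ]
    print(missing_from_archive(original, archived))
```

Any path listed in the result is a candidate for re-harvesting or for a manual check before the archived version goes live.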

Page 27: Development of the CyberCemetery (2011)
Page 28: Development of the CyberCemetery (2011)

• Backup
  • full backups to magnetic tape
  • performed each weekend
  • shipped to offsite storage company
    • Iron Mountain
    • http://www.ironmountain.com

Page 29: Development of the CyberCemetery (2011)

• web files (HTML, XML)
• text documents (.txt, .pdf, .doc)
• spreadsheets & statistics (.xls)
• presentations (.ppt)
• media files:
  • images & photographs (.jpg, .gif, .png, .tiff)
  • audio (.mp3)
  • video (.wm, .mov, .rp)
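The file types above can be turned into a quick inventory of a harvested or donated set of files. This is an illustrative sketch only: the extensions are copied as they appear on the slide, `.html` and `.xml` are assumed for "web files", and the category and function names are hypothetical.

```python
from collections import Counter
from pathlib import PurePosixPath

# Extensions as listed on the slide; ".html" and ".xml" are assumed for web files.
CATEGORIES = {
    "web files": {".html", ".xml"},
    "text documents": {".txt", ".pdf", ".doc"},
    "spreadsheets & statistics": {".xls"},
    "presentations": {".ppt"},
    "images & photographs": {".jpg", ".gif", ".png", ".tiff"},
    "audio": {".mp3"},
    "video": {".wm", ".mov", ".rp"},
}


def categorize(filename):
    """Map a file name to one of the slide's categories, or "other"."""
    extension = PurePosixPath(filename).suffix.lower()
    for category, extensions in CATEGORIES.items():
        if extension in extensions:
            return category
    return "other"


def inventory(filenames):
    """Count how many files fall into each category."""
    return Counter(categorize(name) for name in filenames)
```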

Page 30: Development of the CyberCemetery (2011)

• researchers
• historians
• students
• government employees
• general public

• avg. 1,000,000+ hits per month

• peak visits in one day:
  • 9,996 on 11.03.2011

• most popular site: 9/11 Commission

Page 31: Development of the CyberCemetery (2011)

• provides permanent public access
• archive of “dead” government information
• freely, globally available
• 73 websites and growing

• partnership between:
  • University of North Texas Libraries
  • U.S. Government Printing Office
  • National Archives and Records Administration

Page 32: Development of the CyberCemetery (2011)

FOR FURTHER INFORMATION:

http://www.library.unt.edu/govinfo/
http://digital.library.unt.edu/explore/collections/GDCC/

Starr Hoffman
Head, Government Documents Dept.
University of North Texas
[email protected]

[email protected]
http://geekyartistlibrarian.com