web archiving overview - netpreserve.orgnetpreserve.org/ga2019/wp-content/uploads/2019/07/... ·...

WEB ARCHIVING OVERVIEW

National and University Library - Slovenia

Janko Klasinc | janko.klasinc@nuk.uni-lj.si | +386 01 2001 211

2002 – 2004

Slovenian electronic web publications collecting and archiving methodology

2003 – 2004

Development and analysis of a collection of Slovenian digitized and electronic publications of national importance

EARLY PROJECTS

2006

Legal deposit law (Zakon o obveznem izvodu publikacij, Ur. list RS, št. 69/06 and 86/09)

2007

Regulation on types and selection of electronic publications for legal deposit (Pravilnik o vrstah in izboru elektronskih publikacij za obvezni izvod, Ur. list RS, št. 90/07)

LEGAL BASIS

2008 –

Selective harvesting (1,400+ websites):

• government websites

• research & higher learning institutions

• on-line periodicals

• arts and culture institutions

• etc.

Themed crawls: parliamentary elections, local elections, important events (politics, sports, etc.)

CRAWLING

2014 –

National domain .si crawl (biannually)

Heritrix 1.14.4 and 3.4

CRAWLING

2011 –

Wayback Machine

ACCESS

National domain, selective & thematic crawls:

• 560,066 domains

• 513,793,472 URLs

• 45.5 TB

Staff:

0.25 FTE?

DATA COLLECTED

• moving WCT, Heritrix & Wayback to new servers (separating crawling from access);

• focused crawl of 50 government domains before the content is moved to a single domain;

• providing access to the national domain crawls;

• rethinking legal basis for free access.

CURRENT ACTIVITIES

THANK YOU!

janko.klasinc@nuk.uni-lj.si

+386 01 2001 211
