PIONIER Network Digital Libraries Federation: Experiences of a large scale metadata aggregator

Cezary Mazurek, Marcin Werla
Poznań Supercomputing and Networking Center (Poznań, Poland)
2009-09-30, ECDL 2009, Corfu, Greece

Page 1: PIONIER Network  Digital Libraries Federation Experiences of a large scale metadata aggregator

Cezary Mazurek, Marcin Werla
Poznań Supercomputing and Networking Center (Poznań, Poland)

2009-09-30 ECDL 2009, Corfu, Greece

Page 2: PIONIER Network  Digital Libraries Federation Experiences of a large scale metadata aggregator

Polish Optical Internet PIONIER

Page 3: PIONIER Network  Digital Libraries Federation Experiences of a large scale metadata aggregator

Main organizational models
▪ Regional digital libraries
  ▪ Created and maintained by several institutions from a particular region
  ▪ Gather mostly resources related to the region, its history and culture, but also academic educational materials and national cultural heritage
▪ Institutional digital libraries
  ▪ Created and maintained by single institutions (such as universities)
  ▪ Gather mostly resources related to present activities (such as institutional repositories) and the history of the institution
▪ In many cases the technical base and support for digital libraries are provided by local computing or networking centres (such as PSNC)

Page 4: PIONIER Network  Digital Libraries Federation Experiences of a large scale metadata aggregator

Regional digital libraries

Institutional digital libraries

Overall number of digital objects: 285 thousand

Number of active digital libraries: 19 regional, 21 institutional

Number of cooperating institutions: several hundred libraries, museums and archives

+ several other digital libraries in the phase of planning, configuration or initial content uploading

Page 5: PIONIER Network  Digital Libraries Federation Experiences of a large scale metadata aggregator

Main aims
▪ To facilitate the use of resources from Polish digital libraries
▪ To increase the visibility of these resources on the Internet
▪ To create new, advanced network services, both for end users and for digital library creators, on the basis of these resources

Page 6: PIONIER Network  Digital Libraries Federation Experiences of a large scale metadata aggregator

Basic assumptions
▪ No need or requirement to move resources to the DLF
▪ No fees for using the DLF or for being a part of it
▪ Open standards are the basis for cooperation
  ▪ Individual digital libraries can use different technological platforms

Page 7: PIONIER Network  Digital Libraries Federation Experiences of a large scale metadata aggregator

Basic functions
▪ Search in the available publications
  ▪ Simple
  ▪ Advanced
▪ Digitization plans
  ▪ Searchable
  ▪ Report
  ▪ API for the prevention of duplicated digitization
▪ Location of digital objects on the basis of their OAI identifiers
▪ Database of Polish digital libraries
▪ Statistics and reports

Information in the DLF is updated daily (nightly)
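Locating an object by its OAI identifier can be illustrated with a minimal sketch. It assumes identifiers follow the common `oai:<namespace>:<local-id>` format; the `LIBRARIES` registry, the `dl.example.org` namespace and the function names are hypothetical and are not taken from the DLF software.

```python
from typing import NamedTuple


class OaiIdentifier(NamedTuple):
    namespace: str  # usually the repository's domain name
    local_id: str   # identifier of the object inside that repository


def parse_oai_identifier(identifier: str) -> OaiIdentifier:
    """Split 'oai:<namespace>:<local-id>' into its parts."""
    # maxsplit=2 keeps any further colons inside the local identifier.
    parts = identifier.split(":", 2)
    if len(parts) != 3 or parts[0] != "oai" or not parts[1] or not parts[2]:
        raise ValueError(f"not an OAI identifier: {identifier!r}")
    return OaiIdentifier(namespace=parts[1], local_id=parts[2])


# Hypothetical registry: namespace -> base URL of the owning digital library.
LIBRARIES = {"dl.example.org": "http://dl.example.org/"}


def locate(identifier: str) -> str:
    """Return the base URL of the library that holds the object."""
    parsed = parse_oai_identifier(identifier)
    return LIBRARIES[parsed.namespace]
```

In this sketch the aggregator only needs the namespace part to decide which digital library to redirect to; the local identifier is passed on unchanged.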

Page 8: PIONIER Network  Digital Libraries Federation Experiences of a large scale metadata aggregator

See it: http://fbc.pionier.net.pl/

Page 9: PIONIER Network  Digital Libraries Federation Experiences of a large scale metadata aggregator

Digital Libraries Federation search plugin

Page 10: PIONIER Network  Digital Libraries Federation Experiences of a large scale metadata aggregator

[Diagram labels: Institutions, Digital libraries, Metadata aggregator]

Page 11: PIONIER Network  Digital Libraries Federation Experiences of a large scale metadata aggregator

We gather the information about content providers and their information systems

Database of Polish Digital Libraries in the DLF


Page 12: PIONIER Network  Digital Libraries Federation Experiences of a large scale metadata aggregator

We gather the metadata of objects that should be visible in Europeana
▪ Done with OAI-PMH
  ▪ In most cases we require an OAI-PMH interface
  ▪ In really special cases we can do it in a different way (e.g. Polish Internet Library)
▪ Currently we harvest only simple Dublin Core
  ▪ Work on a new national metadata schema started in September 2009
  ▪ Approximate time of development: 3 months
  ▪ Approximate time of deployment: ???
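As a rough illustration of harvesting simple Dublin Core over OAI-PMH, the sketch below builds a `ListRecords` request URL and parses one response page. The verb, the `oai_dc` metadata prefix, the `resumptionToken` argument and the XML namespaces come from the OAI-PMH and Dublin Core specifications; the function names and the record layout are assumptions made for this example, not the DLF implementation.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}
DC_PREFIX = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request URL for simple Dublin Core (oai_dc)."""
    if resumption_token:
        # resumptionToken is an exclusive argument: no metadataPrefix with it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, resumption_token) from one ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", OAI_NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=OAI_NS)
        fields = {}
        for element in record.iter():
            if element.tag.startswith(DC_PREFIX):
                name = element.tag[len(DC_PREFIX):]
                fields.setdefault(name, []).append((element.text or "").strip())
        records.append({"identifier": identifier, "metadata": fields})
    token = root.findtext(".//oai:resumptionToken", namespaces=OAI_NS)
    # An empty or missing token means this was the last page.
    return records, (token or None)
```

A nightly harvest would fetch `list_records_url(base)` for each library, parse the page, and repeat with the returned token until it is `None`. Fetching, error handling and incremental harvesting are left out of the sketch.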

Page 13: PIONIER Network  Digital Libraries Federation Experiences of a large scale metadata aggregator

We will try to clean up the metadata, normalize it and enrich it
▪ On the DLF level, dictionaries are built automatically from the aggregated metadata
  ▪ Separately for each metadata element
  ▪ Separately for each metadata language
▪ Differences between the metadata from various digital libraries have a negative impact on the search possibilities of end users
  ▪ That is why metadata normalization is so important
▪ A basic analysis shows which elements are crucial and which should be easy to clean up
  ▪ The analysis was done in April 2009 on the metadata of 214 254 aggregated objects
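The per-element, per-language dictionaries can be sketched as counters keyed by (element, language). The input layout, one list of (element, language, value) triples per object, and both function names are assumptions made for this example.

```python
from collections import Counter, defaultdict


def build_dictionaries(records):
    """Count value usage separately for each (element, language) pair."""
    dictionaries = defaultdict(Counter)
    for record in records:
        for element, language, value in record:
            dictionaries[(element, language)][value.strip()] += 1
    return dictionaries


def element_stats(dictionaries, element, language):
    """Return (unique values, total uses, average uses per value)."""
    counts = dictionaries.get((element, language), Counter())
    unique = len(counts)
    uses = sum(counts.values())
    average = uses / unique if unique else 0.0
    return unique, uses, average
```

A high average number of uses per value suggests a small, nearly controlled vocabulary that should be easy to clean up, while an average close to 1 means almost every value is unique.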

Page 14: PIONIER Network  Digital Libraries Federation Experiences of a large scale metadata aggregator

DC element  | Number of unique values | How many times values were used in metadata | Average number of uses per value
format      |      39 | 209 789 | 5 379,2
language    |     195 | 210 529 | 1 079,6
type        |     822 | 211 816 |   257,7
rights      |   1 192 | 246 093 |   206,5
coverage    |      66 |   2 390 |    36,2
publisher   |  18 002 | 310 764 |    17,3
contributor |  12 979 |  83 464 |     6,4
subject     |  78 440 | 438 871 |     5,6
relation    |   9 292 |  48 319 |     5,2
date        |  47 581 | 209 589 |     4,4
identifier  |   6 426 |  27 666 |     4,3
description |  43 657 | 180 391 |     4,1
source      |  16 996 |  52 506 |     3,1
creator     |  21 908 |  67 503 |     3,1
title       | 210 745 | 227 039 |     1,1

Page 15: PIONIER Network  Digital Libraries Federation Experiences of a large scale metadata aggregator

▪ Format
  ▪ In 99% of descriptions: MIME type (e.g. text/html, image/x.djvu)
▪ Language
  ▪ In most cases: ISO 639-2 (pol, ger, lat, fre etc.)
  ▪ Sometimes one value "pol, ger" instead of "pol", "ger"
▪ Rights
  ▪ Name of the institution which holds the original object
▪ Type
  ▪ …
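The "pol, ger" case can be handled with a small clean-up step. The sketch below is one possible approach, assuming that commas, semicolons, slashes and whitespace are the separators used in combined values; it does not validate the result against the ISO 639-2 code list.

```python
import re


def split_language_values(values):
    """Split combined values such as 'pol, ger' into single codes."""
    codes = []
    for value in values:
        for code in re.split(r"[,;/\s]+", value.strip()):
            code = code.lower()
            # Skip empty fragments and keep the first occurrence only.
            if code and code not in codes:
                codes.append(code)
    return codes
```

With this step a record carrying the single value "pol, ger" ends up with the same two language values as a record that lists "pol" and "ger" separately.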

Page 16: PIONIER Network  Digital Libraries Federation Experiences of a large scale metadata aggregator

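Values for "Type" (top 20)

Value (English gloss)                  | Number of objects with the value | % of aggregated objects | % of aggr. obj. (after clean-up)
czasopismo (periodical)                | 44 709 | 20,9% | 33,8%
gazeta (newspaper)                     | 32 921 | 15,4% | 31,3%
gazety (newspapers)                    | 23 119 | 10,8% |
Czasopismo (periodical)                | 20 965 |  9,8% |
książka (book)                         | 12 503 |  5,8% |
Gazeta (newspaper)                     | 11 098 |  5,2% |
pocztówka (postcard)                   |  5 768 |  2,7% |
czasopisma (periodicals)               |  4 962 |  2,3% |
text                                   |  4 452 |  2,1% |
grafika (graphics)                     |  3 863 |  1,8% |
fotografia (photograph)                |  3 596 |  1,7% |
artykuł z czasopisma (journal article) |  3 164 |  1,5% |  2,6%
artykuł (article)                      |  2 455 |  1,1% |
Czasopisma (periodicals)               |  1 710 |  0,8% |
dzienniki urzędowe (official journals) |  1 516 |  0,7% |
stary druk (old print)                 |  1 222 |  0,6% |  1,1%
starodruk (old print)                  |  1 221 |  0,6% |
rysunek (drawing)                      |  1 094 |  0,5% |
rękopis (manuscript)                   |  1 062 |  0,5% |
mapa (map)                             |  1 028 |  0,5% |
Sum                                    |        | 85,1% | 68,9%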
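The effect of cleaning up "Type" can be reproduced with a short sketch. It folds case and maps plural forms to a preferred singular form, then recomputes the share of objects. The counts and the total of 214 254 objects come from the slides; the `PREFERRED` mapping and the function names are hypothetical, and the sketch assumes each object carries only one of the merged variants.

```python
from collections import Counter

TOTAL_OBJECTS = 214_254  # objects analysed in April 2009 (from the slides)

# Counts for some raw "Type" values, copied from the top-20 table.
RAW_COUNTS = {
    "czasopismo": 44_709, "gazeta": 32_921, "gazety": 23_119,
    "Czasopismo": 20_965, "Gazeta": 11_098, "czasopisma": 4_962,
    "Czasopisma": 1_710,
}

# Hypothetical mapping of plural forms to a preferred singular form.
PREFERRED = {"gazety": "gazeta", "czasopisma": "czasopismo"}


def normalize_type(value):
    """Lower-case the value and map known variants to one form."""
    folded = value.strip().lower()
    return PREFERRED.get(folded, folded)


def merged_shares(raw_counts, total):
    """Return {normalized value: percentage of all objects}."""
    merged = Counter()
    for value, count in raw_counts.items():
        merged[normalize_type(value)] += count
    return {value: round(100 * count / total, 1) for value, count in merged.items()}
```

Calling `merged_shares(RAW_COUNTS, TOTAL_OBJECTS)` gives 33.8 for "czasopismo" and 31.3 for "gazeta", which matches the after-clean-up column for those two rows.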

Page 17: PIONIER Network  Digital Libraries Federation Experiences of a large scale metadata aggregator

(This slide repeats the table of DC element statistics shown on page 14.)

Page 18: PIONIER Network  Digital Libraries Federation Experiences of a large scale metadata aggregator

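(Polish version of objects' description)

Value (English gloss)                                                                           | No. of associations | % of all associations
gazety regionalne (regional newspapers)                                                         | 12214 | 2,56%
czasopisma (periodicals)                                                                        |  7716 | 1,62%
prasa polska (Polish press)                                                                     |  5424 | 1,14%
czasopisma niemieckie (German periodicals)                                                      |  5009 | 1,05%
gazety sublokalne (sub-local newspapers)                                                        |  4968 | 1,04%
Grodków                                                                                         |  4962 | 1,04%
Grottkau                                                                                        |  4961 | 1,04%
Wielkopolska (Greater Poland)                                                                   |  4422 | 0,93%
19 w. (19th century)                                                                            |  4249 | 0,89%
Prusy (Prussia)                                                                                 |  4164 | 0,87%
Czasopisma regionalne i lokalne polskie -19 w. (Polish regional and local periodicals, 19th c.) |  4140 | 0,87%
wiadomości polityczne (political news)                                                          |  4094 | 0,86%
Gazety polskie - 1918-1939 r. (Polish newspapers, 1918-1939)                                    |  4077 | 0,85%
kultura (culture)                                                                               |  4071 | 0,85%
czasopisma sublokalne (sub-local periodicals)                                                   |  3813 | 0,80%
Górny Śląsk (Upper Silesia)                                                                     |  3731 | 0,78%
architektura (architecture)                                                                     |  3566 | 0,75%
Wrocław                                                                                         |  3515 | 0,74%
Śląsk (Silesia)                                                                                 |  3448 | 0,72%
budownictwo (construction)                                                                      |  3388 | 0,71%

Confused with coverage: temporal, spatial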
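Values that really belong in coverage could be flagged for review with a simple heuristic. The sketch below is only an illustration: the regular expression for Polish century and year notations and the small `PLACES` set are assumptions made for this example, and a real gazetteer would come from an authority file.

```python
import re

# Century such as "19 w." (Polish "w." stands for "wiek", century),
# or a year / year range such as "1918-1939 r.".
TEMPORAL = re.compile(r"^\d{1,2}\s*w\.$|^\d{4}(\s*-\s*\d{4})?(\s*r\.)?$")

# Hypothetical gazetteer built from place names in the table on this slide.
PLACES = {"Grodków", "Grottkau", "Wielkopolska", "Prusy",
          "Górny Śląsk", "Wrocław", "Śląsk"}


def suggest_coverage(value):
    """Return 'temporal', 'spatial', or None if the value looks fine."""
    cleaned = value.strip()
    if TEMPORAL.match(cleaned):
        return "temporal"
    if cleaned in PLACES:
        return "spatial"
    return None
```

Such a check can only suggest candidates. Deciding whether to move a value is better left to the content provider.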

Page 19: PIONIER Network  Digital Libraries Federation Experiences of a large scale metadata aggregator

(Polish version of objects' description)

Value                                         | No. of associations | % of all associations
Poznań                                        | 54943 | 12,62%
Telecomp Service na zlecenie PBI              | 22310 |  5,12%
Kraków                                        | 13662 |  3,14%
Warszawa                                      | 11245 |  2,58%
Toruń                                         | 11221 |  2,58%
Katowice                                      |  8187 |  1,88%
Drukarnia Polska                              |  7998 |  1,84%
Drukarnia Dziennika Poznańskiego T.A.         |  6828 |  1,57%
Warszawa : Telecomp Service na zlecenie PBI   |  6824 |  1,57%
Drukarnia Dziennika Poznańskiego S.A.         |  5785 |  1,33%
Nakładem F[ranciszka] T[adeusza] Rakowicza    |  5406 |  1,24%
Kielce                                        |  5292 |  1,22%
Krakowskie Wydawnictwo Prasowe RSW "Prasa"    |  5137 |  1,18%
Breslau                                       |  5130 |  1,18%
E. Neugebauer                                 |  4959 |  1,14%
Wangefield                                    |  4959 |  1,14%
Grottkau                                      |  4959 |  1,14%
Bydgoszcz                                     |  4752 |  1,09%
Drukarnia Dziennika Poznańskiego              |  3923 |  0,90%
Drukarnia J. I. Kraszewskiego                 |  3869 |  0,89%

Geographical location…
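The table lists both "Telecomp Service na zlecenie PBI" and "Warszawa : Telecomp Service na zlecenie PBI", so one possible clean-up step is to separate a leading place name from the rest of the value. The sketch assumes that " : " (space, colon, space) is used as the separator between place and name; the function name is hypothetical.

```python
def split_place_and_name(value):
    """Split 'Place : Name' on the first ' : '; otherwise return (None, value)."""
    place, separator, name = value.partition(" : ")
    if separator and place.strip() and name.strip():
        return place.strip(), name.strip()
    return None, value.strip()
```

Values that consist of a place name only, such as "Poznań", are returned unchanged by this step and would still need a gazetteer lookup to be recognized as geographical locations.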

Page 20: PIONIER Network  Digital Libraries Federation Experiences of a large scale metadata aggregator

▪ We have over 40 digital libraries in Poland, filled with content and metadata coming from hundreds of institutions from different domains
▪ We harvest the metadata and provide a single point of access to it
  ▪ The PIONIER Network Digital Libraries Federation (http://fbc.pionier.net.pl/)
  ▪ The software used for this service will be released as open source by the end of this year
▪ Cooperation with Europeana (but not only this) requires cleaning up and normalizing the metadata
  ▪ This is currently our biggest challenge
  ▪ But we do not want to solve it only by technical means at the level of our aggregator
  ▪ Close cooperation with content providers and some organizational changes prepared by them should result in a more efficient and sustainable metadata improvement process than a purely technical solution

Page 21: PIONIER Network  Digital Libraries Federation Experiences of a large scale metadata aggregator

Cezary Mazurek, Marcin Werla
Poznań Supercomputing and Networking Center (Poznań, Poland)

Thank you for your attention. Any questions?