
Post on 01-Apr-2015


local content in a Europeana cloud

Alternative methods of ingestion for small institutions

(Stein) Runar Bergheim, Asplan Viak Internet as

LoCloud is funded by the European Commission's ICT Policy Support Programme

Overview of Presentation

• Characteristics of Europeana content providers

• Present ingestion methods for Europeana

• Alternative ingestion methods “out there”

• Experiments that may be conducted as part of LoCloud

• 7 slides

• 284 words

• 1 858 characters

• 2 illustrations

• (A seemingly endless stream of words)

Characteristics of Europeana content providers

Those who are «in»
• Professional cultural heritage institutions
• Capacity for investment in infrastructure & projects
• Technical skills beyond what may be expected
• Entities that fit into a hierarchy of aggregators
• Patient

Those who are «out»
• Very small collections
– Collections by individuals
– (tens to hundreds of objects)
• Independent institutions with strained funding
• «Non-conforming» online content structure
– 1 web page 1 object

Present Europeana ingestion process

• Puts great demands on content providers
– Partly mitigated by the excellent MINT-MORE tools
• Limited capacity at harvesting end
– Partly mitigated by aggregator hierarchy
• Low frequency of updates – each iteration takes a long time
– Partly mitigated by modified content/aggregation architecture of Europeana Cloud

Weaknesses of present Europeana ingestion process

Alternative ingestion methods «out there»

• Difficult to create complete ESE/EDM from crawling
– But... the typical Europeana record is not really all that «complete»
– Schema.org, Microformats and other embedded semantics may help
• Deep-content URLs hidden for crawlers
– Simple «site-map» protocol may be applied
• Increases capacity for small content providers
• Decreases time-consumption of the content ingestion life-cycle
• Will serve more than one publishing channel
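As an illustration of the «site-map» point above, here is a minimal sketch of how a harvester might read deep-content URLs from a sitemap file. It assumes the provider publishes a standard sitemaps.org `urlset` document; the function name `sitemap_urls` and the sample URLs are made up for this example and are not part of any LoCloud or Europeana tool.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Made-up sample sitemap for a small collection (hypothetical URLs).
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://example.org/objects/1</loc>
    <lastmod>2013-05-01</lastmod>
  </url>
  <url>
    <loc>http://example.org/objects/2</loc>
  </url>
</urlset>"""


def sitemap_urls(xml_text):
    """Return a list of (url, lastmod) pairs; lastmod is None if absent."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = url.findtext("sm:lastmod", default=None, namespaces=SITEMAP_NS)
        if lastmod is not None:
            lastmod = lastmod.strip()
        if loc:  # skip entries without a usable location
            entries.append((loc, lastmod))
    return entries


if __name__ == "__main__":
    for loc, lastmod in sitemap_urls(SAMPLE_SITEMAP):
        print(loc, lastmod)
```

The optional `lastmod` value is kept because it could let a harvester revisit only the pages that changed, which bears on the low update frequency noted earlier.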

Considerations for alternative ingestion methods

• Content assessment
– Assess quantity of «new» content that can be reached using alternative ingestion methods
• Technology experiments
– HTML embedded semantics based on open standards
– Creating a test-spider for auto-extraction of metadata from web pages
– Transformation of data to ESE/EDM
• Design of processes
– Embedding of spider into aggregator organizations' business processes
– Ingestion + Quality assurance
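The technology experiments listed above could start from something like the following sketch: it reads schema.org microdata (`itemprop` attributes) from one web page and maps a few properties to Dublin Core fields of the kind carried in ESE records. The mapping table, the class name `ItempropExtractor` and the sample page are assumptions made for illustration, not an official crosswalk or an existing LoCloud component, and the parser does not handle nested items.

```python
from html.parser import HTMLParser

# Hypothetical mapping from schema.org properties to Dublin Core fields;
# an illustration only, not an official ESE/EDM crosswalk.
ITEMPROP_TO_DC = {
    "name": "dc:title",
    "creator": "dc:creator",
    "description": "dc:description",
    "dateCreated": "dc:date",
}

# Made-up sample page describing a single object.
SAMPLE_PAGE = """
<div itemscope itemtype="https://schema.org/Photograph">
  <h1 itemprop="name">Harbour at   dusk</h1>
  <span itemprop="creator">A. Photographer</span>
  <meta itemprop="dateCreated" content="1905">
  <p itemprop="description">View of the <em>old</em> harbour.</p>
</div>
"""


class ItempropExtractor(HTMLParser):
    """Collects mapped itemprop values from a flat (non-nested) item."""

    def __init__(self):
        super().__init__()
        self.record = {}
        self._prop = None   # property currently being read, if any
        self._tag = None    # tag that opened the current property
        self._buf = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("itemprop")
        # Ignore unmapped properties and anything inside an open property.
        if prop not in ITEMPROP_TO_DC or self._prop is not None:
            return
        if attrs.get("content") is not None:
            # Value given in an attribute, e.g. <meta itemprop=... content=...>
            self._store(prop, attrs["content"])
        else:
            self._prop, self._tag, self._buf = prop, tag, []

    def handle_data(self, data):
        if self._prop is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if self._prop is not None and tag == self._tag:
            self._store(self._prop, "".join(self._buf))
            self._prop = self._tag = None

    def _store(self, prop, value):
        value = " ".join(value.split())  # collapse runs of whitespace
        if value:
            self.record.setdefault(ITEMPROP_TO_DC[prop], []).append(value)


def extract_record(html_text):
    """Return a dict of Dublin Core field name -> list of values."""
    parser = ItempropExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.record


if __name__ == "__main__":
    print(extract_record(SAMPLE_PAGE))
```

A real spider would still need the quality-assurance step named in the last bullet, since pages without embedded semantics yield an empty record here.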

Experiments that may be conducted as part of LoCloud

Thank you for the attention

rb@avinet.no


The views and opinions expressed in this presentation are the sole responsibility of the authors and do not necessarily reflect the views of the European Commission.

Funding
