CLARIAH meeting 2013-09-11 (Odijk)
Presentation by Jan Odijk on the CLARIAH 2013 proposal
Common Lab Research Infrastructure for the Arts and Humanities
CLARIAH 2013 Proposal
Jan Odijk
Trippenhuis, Amsterdam 2013-09-11
Overview
• What
• Why
• Focus
• Governance
• Main Activities + Budget
• Applicants
• Centres
• Sustainability
• Schedule
• Support
What I
• NL part of the CLARIN + DARIAH infrastructures
• A research infrastructure in which a humanities researcher
  – Can find all data relevant for the research
  – Can find all tools relevant for the research
  – Can apply the tools to the data without any technical background or ad-hoc adaptations
    • Intelligent search in and through the data
    • Information extraction, analysis, aggregation, visualisation, validation, conversion, enrichment, annotation, …
  – Can store data resulting from the research
  – Can store tools resulting from the research
What II
• Virtual and distributed
• Based on one or more centres per country
Why I
• Enormous increase in available data
• Data are ‘rich’
  – complex, fuzzy, ambiguous, heterogeneous, and have a time dimension
• Data are digital
  – (Advanced) digital tools can be used to support humanities research
  – they must be used to cope with the quantity
Why II
• Big opportunity to bring humanities research to a new level
  – Empirical basis increased by orders of magnitude
  – Information hidden in these data can be disclosed and analysed
  – Will enable new research questions
  – Existing research questions can be addressed in new ways
  – Quality, effectiveness and efficiency increase
  – Potential for ground-breaking research
Focus
• 3 humanities disciplines
  – Language studies
  – Media studies
  – Socio-economic history
Focus
• Why these 3?
  – Language studies: core of CLARIN
  – Socio-economic studies: core of DARIAH
  – All are forerunners in the use of digital data and tools
  – Their dominant data types cover the whole spectrum:
    • Language studies: text
    • Media studies: audio-visual data
    • Socio-economic history: structured data
Focus
• Media studies:
  – text as carrier of cultural content / information
    v. text as object of inquiry in language studies
  – also an important aspect of CLARIN
  – crucial for other humanities disciplines
• Cross-fertilization
  – NLP techniques enable extraction of information from texts for storage in structured databases (e.g. socio-economic data)
  – Speech recognition + image recognition enable advanced indexing of audio-visual material
• Solid foundation for future extension to other disciplines
Focus
• For each discipline a core team
• Language studies
  – Researcher: Sjef Barbiers (Meertens / UU)
  – ICT researcher: Antal van den Bosch (RUN)
  – Data centre: The Language Archive (MPI+)
• Media studies
  – Researcher: José van Dijck / Julia Noordegraaf (UvA)
  – ICT researcher: Maarten de Rijke / Cees Snoek (UvA)
  – Data centre: NISV
• Socio-economic history
  – Researcher: Jan Luiten van Zanden (UU)
  – ICT researcher: Frank van Harmelen (VU)
  – Data centre: IISH

Governance
• Main features
  – Based on CLARIN-NL governance
  – A consortium will be formed, with a consortium agreement
  – Executive Board: ‘lean and mean’ team
    • General, technical aspects, user aspects, dissemination, outreach, education, training
  – Overview Board (Raad van Toezicht)
  – National Advisory Panel
  – International Advisory Panel
Main Activities
• Technical Implementation including (continuation of) Centre set-up
• Interoperability: concrete implementations to realize and test interoperability
  – Formal and semantic interoperability
  – Metadata, data, and software
  – Linking publications to resources (enhanced publications)
  – Compatible with CLARIN and DARIAH
• Intelligent Search: concrete implementations of searching for, in and through data
• Enrichment/annotation, information extraction, analysis, aggregation and visualisation software
Main Activities
• Data Curation
• Software Curation + demonstrators
• Research Pilots: test in a small research project whether CLARIAH functionality indeed supports the research
• Education & Training
• Dissemination & Outreach
• Management
• EU-oriented activities
  – e.g. cooperation projects with other countries
• Budget: approx. €18 million (still to be finalized)
Applicants
• Small number
  – Required by template
  – Recommended by experts
• In 2011 the lead applicant (‘penvoerder’) was UU; now it is a KNAW institute
• Applicants:
  – KNAW institute: Lex Heerma van Voss
  – Intended director: Jan Odijk
  – 3 humanities researchers
    • Sjef Barbiers
    • José van Dijck
    • Jan Luiten van Zanden
• Others involved sign a “Letter of Intent” to participate in CLARIAH and become consortium members
Centres
• The Language Archive (TLA, MPI+)
• Netherlands Institute for Sound and Vision (NISV)
• International Institute for Social History (IISH)
• Data Archiving and Networked Services (DANS)
• Huygens Institute
• Institute for Dutch Lexicology (INL)
• Meertens Institute
• National Library (KB)
• University Libraries
• …
Sustainability
• What after the CLARIAH project?
  – Centres provide data / services independently of CLARIAH (before, during and after)
  – Concrete commitment by KNAW of €0.5 million per year for 5 years after CLARIAH to maintain the infrastructure
  – We have to organize ourselves so that we can run the services as efficiently as possible
  – For software sustainability: close collaboration with the NL eScience Centre, cf. the recent start of the ‘Alliance for Software Sustainability’ (DANS and NL eScience initiative)
Schedule
• 1 Oct 2013: Submission deadline
• Oct-Dec 2013: Consultation of referees; rebuttal by submitters; recommendations by the NWO divisional boards (NWO-gebiedsbesturen)
• Jan 2014: First committee meeting
• Mar/Apr 2014: Site visits; second committee meeting
• Early May 2014: Committee recommendation to the NWO general board (AB)
• Late May 2014: NWO AB decision
• Late May/early June 2014: AB decision sent to the Minister
• Mid 2014: Minister informs the House of Representatives (Tweede Kamer); NWO informs submitters
• 1 Jan 2015 (if awarded): Start of CLARIAH
Support I
• Support by public and private organisations
  – Many of the data and technologies used in CLARIAH are directly relevant for public organisations and companies, e.g.
    • Intelligent information extraction from a heterogeneous set of ‘rich’ data
    • Either as a customer or as a developer of such technology
Support II
– Close involvement of IBM from the start
– Support by many public institutes and companies
  • Both for:
    – CLARIN-NL and
    – the 2011 CLARIAH proposal
Support CLARIAH again!
Follow the good example of
CLARIAH: Industrial Interest