
  • It Takes a Village: Co-developing VedaWeb, a Digital Research Platform for Old Indo-Aryan Texts

    Börge Kiss (IDH), Daniel Kölligan (HVS), Francisco Mondaca (CCeH), Claes Neuefeind (IDH), Uta Reinöhl (ASW), Patrick Sahle (CCeH)

    05.03.2019

  • Research Goals

    Traditional research with large corpora

    - concordances / word indexes, lexica: make usage patterns and frequencies visible

    - determination of meanings, functions and syntactic patterns based on researchers' individual assessments and their "reading experience"

    - problems: rather intuitive, subjective; the more texts, the more intractable
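    The two classic tools named above, a word index and a concordance, can be sketched in a few lines. This is a minimal illustration, not VedaWeb code: the corpus is hypothetical placeholder data keyed by "book.hymn.verse" IDs, and the function names are invented for the example. It shows how frequencies and keyword-in-context (KWIC) lines make usage patterns visible without relying on "reading experience" alone.

```python
from collections import Counter, defaultdict

# Toy corpus: verse ID -> tokens. Hypothetical placeholder data, not actual RV text.
CORPUS = {
    "1.1.1": ["agni", "priest", "god", "agni"],
    "1.2.1": ["vayu", "come", "soma"],
    "1.10.1": ["indra", "soma", "drink"],
}

def verse_key(verse_id):
    """Sort '1.10.1' after '1.2.1' by comparing numeric parts, not strings."""
    return tuple(int(part) for part in verse_id.split("."))

def word_index(corpus):
    """Word index: each token mapped to the verses it occurs in, in verse order."""
    index = defaultdict(set)
    for verse_id, tokens in corpus.items():
        for tok in tokens:
            index[tok].add(verse_id)
    return {tok: sorted(ids, key=verse_key) for tok, ids in index.items()}

def frequencies(corpus):
    """Token frequencies over the whole corpus."""
    return Counter(tok for tokens in corpus.values() for tok in tokens)

def kwic(corpus, keyword, width=2):
    """Concordance lines (keyword in context) with up to `width` tokens on each side."""
    lines = []
    for verse_id in sorted(corpus, key=verse_key):
        tokens = corpus[verse_id]
        for i, tok in enumerate(tokens):
            if tok == keyword:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                lines.append((verse_id, left, tok, right))
    return lines
```

    With a real corpus of ca. 160,000 words the same functions apply unchanged; only the sort key needs to follow whichever reference system the edition uses.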

  • Research Goals

    - online platform allowing combined searches of (1) lexical, (2) morphological, (3) metrical and (4) syntactic information, e.g.

    - (1): lexical fields: differences between words for x, e.g. 'man/woman' [Kazzazi 2001]; 'light' [Roesler 1997] etc.

    - (2): use/distribution/functional difference of allomorphs, e.g. áśv-a- 'horse': áśvās / áśvāsas 'horses'

    - (3): position of forms in verse; word-shapes

    - (4): information structure (topic/focus)
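    A combined search over the four layers amounts to filtering annotated tokens on several criteria at once. The sketch below assumes a hypothetical token record with one field per layer (lemma, morphological features, a pāda-final flag for the metrical layer, a topic/focus tag); the forms áśvās / áśvāsas come from the slide, but their positions and tags are invented toy values. It is not VedaWeb's data model.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Token:
    form: str
    lemma: str                  # (1) lexical layer
    morph: frozenset            # (2) morphological features, e.g. {"nom", "pl"}
    pada_final: bool            # (3) metrical layer: word stands at the end of a pāda
    info: Optional[str] = None  # (4) information structure: "topic", "focus" or None

# Toy annotated tokens; forms follow the slide, positions and tags are invented.
TOKENS = [
    Token("áśvās", "áśva", frozenset({"nom", "pl"}), True),
    Token("áśvāsas", "áśva", frozenset({"nom", "pl"}), False, "topic"),
    Token("áśvam", "áśva", frozenset({"acc", "sg"}), False, "focus"),
]

def search(tokens, lemma=None, features=(), pada_final=None, info=None):
    """Combined search: keep tokens matching every criterion that is given."""
    wanted = frozenset(features)
    return [
        t for t in tokens
        if (lemma is None or t.lemma == lemma)
        and wanted <= t.morph
        and (pada_final is None or t.pada_final == pada_final)
        and (info is None or t.info == info)
    ]

def allomorph_positions(tokens, lemma, features):
    """Count (form, pāda-final?) pairs to compare where competing allomorphs occur."""
    hits = search(tokens, lemma=lemma, features=features)
    return Counter((t.form, t.pada_final) for t in hits)
```

    Leaving a criterion unset means "any value", so the same function serves a purely lexical query and a query that crosses all four layers.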

  • Background


    Rigveda (RV):

    - oldest text of Indo-Aryan, part of the Indo-European language family, ca. 1300/1000 BC

    - ca. 160,000 words (in 1028 hymns grouped into 10 books = "mandalas"); cf. Homer's Iliad + Odyssey = ca. 190,000 words

    - hymns to gods (Indra, Soma, Varuna, Mitra, …) recited mostly during the Soma sacrifice (juice of an intoxicating …)

    Further texts to be integrated: Atharvaveda (c. 170,000 words), Yajurveda; Vedic prose: Aitareya Brahmana (c. 100,000 words), Maitrayani Samhita (c. 120,000 words)

  • Background


    - morphology

        - annotation provided by Prof. G. Dunkel, Prof. P. Widmer et al., University of Zurich

    - metre

        - Prof. K. Ryan, Harvard University

    - syntax

        - Prof. H. Hettrich (University of Würzburg), Dr. O. Hellwig (University of Düsseldorf)

        - Dr. U. Reinöhl (University of Cologne/Mainz) using GRAID (Grammatical Relations and Animacy in Discourse)


  • Team


    Apl. Prof. Dr. Patrick Sahle, P.I.

    Francisco Mondaca, M.A.

    Jonathan Blumtritt, M.A.

    Martina Gödel, M.A.

    IDH - Spinfo

    Dr. Claes Neuefeind, P.I.

    Börge Kiss, M.A.


    PD Dr. Daniel Kölligan, P.I.

    Dr. Uta Reinöhl, P.I.

    Jakob Halfmann

    Natalie Korobzow

    Felix Rau, M.A.

  • Co-operation partners

    - Prof. Dr. Paul Widmer, Universität Zürich

    - Dr. Salvatore Scarlata, Universität Zürich

    - Prof. Dr. Kevin Ryan, Harvard University

    - Dr. Dieter Gunkel, University of Richmond

    - Prof. Dr. Laurent Romary, Inria/HU Berlin, TEI

    - Prof. Dr. Nikolaus P. Himmelmann, Universität zu Köln

  • VedaWeb: A digital platform for working with Old Indic texts

    - make the RV, translations and morphological glossings available for viewing and export

    - connect all word-forms of the annotated RV with the corresponding lexical entries in Grassmann, Böhtlingk/Roth and Monier-Williams, and vice versa

    - allow combinatorial searches of lemmas, word-forms, morphological and metrical information via a cascading search index
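    Linking word-forms to dictionary entries "and vice versa" implies a forward lookup (form to lemma to entries) and a reverse index (entry to attested forms). The sketch below is a minimal illustration under stated assumptions: the token IDs and entry IDs are invented placeholders, only the dictionary names come from the slide, and the lemma serves as the hinge between text and dictionaries. It does not reflect VedaWeb's actual APIs.

```python
from collections import defaultdict

# Hypothetical token IDs -> lemma (real data would come from the glossed RV).
TOKEN_LEMMA = {"t1": "agní", "t2": "sóma", "t3": "agní"}

# Lemma -> {dictionary: entry ID}. Dictionary names are from the slide;
# the entry IDs are invented placeholders.
LEMMA_ENTRIES = {
    "agní": {"Grassmann": "gra-0001", "Böhtlingk/Roth": "br-0001", "Monier-Williams": "mw-0001"},
    "sóma": {"Grassmann": "gra-0002", "Monier-Williams": "mw-0002"},
}

def entries_for_token(token_id):
    """Forward link: word-form -> lemma -> entries in every linked dictionary."""
    lemma = TOKEN_LEMMA.get(token_id)
    return LEMMA_ENTRIES.get(lemma, {})

def build_reverse_index(token_lemma, lemma_entries):
    """Reverse link: (dictionary, entry ID) -> token IDs attesting that entry."""
    reverse = defaultdict(list)
    for token_id, lemma in token_lemma.items():
        for dictionary, entry_id in lemma_entries.get(lemma, {}).items():
            reverse[(dictionary, entry_id)].append(token_id)
    return dict(reverse)

REVERSE = build_reverse_index(TOKEN_LEMMA, LEMMA_ENTRIES)
```

    Keeping the lemma as the single join point means a new dictionary only has to be mapped to lemmas, not to every word-form.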

  • Current State

    - revisions of and additions to the Zurich glossings

    - development of the data model and APIs for dictionaries (Francisco Mondaca)

    - development of the web application (Börge Kiss)

    - integration of further resources

  • Morphological Glossings (Zurich)

  • Translations: German, English, French, Latin, Russian…

  • Workflow

  • TEI - Modelling

    - an appropriate data model is of central importance for consistency, transfer, persistence and presentation

    - TEI (Text Encoding Initiative) offers the best prospects for textual data to persist over time, thanks to its active community of scholars and its detailed documentation. It is the de facto standard in Digital Humanities projects.

    - modelling of texts (RV, translations) and dictionaries (Grassmann; Vedic Index of Names and Subjects)
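    To make the TEI modelling step concrete, the sketch below encodes one verse with standard TEI P5 elements (<lg> for the verse, <l> for each pāda, <w> for each word, with the lemma and msd attributes TEI provides for linguistic annotation) and reads it back. The choice of elements, the msd value format and the verse ID are assumptions for illustration; the project's actual TEI schema is not shown in these slides.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace

def tei(tag):
    """Qualified tag name in the TEI namespace, in ElementTree's {uri}tag form."""
    return f"{{{TEI_NS}}}{tag}"

def encode_verse(verse_id, padas):
    """Encode one verse as <lg> with one <l> per pāda and one <w> per word.

    padas: list of (pāda label, [(form, lemma, msd), ...]).
    """
    lg = ET.Element(tei("lg"), {"type": "verse", "n": verse_id})
    for label, words in padas:
        line = ET.SubElement(lg, tei("l"), {"n": label})
        for form, lemma, msd in words:
            w = ET.SubElement(line, tei("w"), {"lemma": lemma, "msd": msd})
            w.text = form
    return lg

def read_words(lg):
    """Read back (form, lemma, msd) triples from an encoded verse."""
    return [(w.text, w.get("lemma"), w.get("msd")) for w in lg.iter(tei("w"))]

# Toy verse: hypothetical ID, one pāda, one word; the msd format is an assumption.
verse = encode_verse("toy.1", [("a", [("áśvāsas", "áśva", "case=nom|num=pl")])])
xml_text = ET.tostring(verse, encoding="unicode")
```

    Because every annotation lives in attributes of a standard element, the same file can feed both the web view and an export, which is the persistence argument made above.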

  • Software Architecture

  • VedaWeb App


  • Cooperation within the project

    - not the traditional "chasm" between IT and humanities people, but rather different ranges of competence and overlapping responsibilities:

        - a "family constellation"

  • Cooperation within the project

    - overlap of competence areas makes the project feasible

    - regular communication

    - close feedback loops

    - GitLab, issue tracking system

    - regular team meetings (once a month)

  • Simple and challenging issues

    - different expectations of what is easy and what is difficult to implement, e.g.

        - multiple, combinable full-text searches

        - search functions over diversely structured sets of data

        - complex structure of the base text:

            - books, hymns, verses, half-verses

            - different counting systems (by books, by hymns)

            - different text versions (editions; lemmas and annotations; "padapatha")
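    The "different counting systems" issue can be illustrated by converting between a book-based reference (book, hymn) and a running hymn number from 1 to 1028. Reading "by hymns" as such a continuous numbering is an assumption made for this sketch; the per-book hymn counts are the standard ones and sum to the 1028 hymns given on the Background slide. Function names are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate

# Hymns per book (mandala) in the standard numbering; book 8 includes the
# 11 Vālakhilya hymns. The total is 1028, as on the Background slide.
HYMNS_PER_BOOK = [191, 43, 62, 58, 87, 75, 104, 103, 114, 191]

# OFFSETS[i] = number of hymns in books 1..i, i.e. [0, 191, 234, ..., 1028]
OFFSETS = [0, *accumulate(HYMNS_PER_BOOK)]

def to_running(book, hymn):
    """Book-based reference (book, hymn) -> running hymn number 1..1028."""
    if not (1 <= book <= len(HYMNS_PER_BOOK) and 1 <= hymn <= HYMNS_PER_BOOK[book - 1]):
        raise ValueError(f"no hymn {book}.{hymn}")
    return OFFSETS[book - 1] + hymn

def from_running(n):
    """Running hymn number -> (book, hymn)."""
    if not 1 <= n <= OFFSETS[-1]:
        raise ValueError(f"running hymn number out of range: {n}")
    # Count the offsets that are <= n - 1; that count is the book number.
    book = bisect_right(OFFSETS, n - 1)
    return book, n - OFFSETS[book - 1]
```

    For example, hymn 2.1 is running hymn 192, and running hymn 1028 is 10.191. Storing one canonical reference and deriving the others avoids keeping parallel numberings in sync by hand.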

  • Learning from each other

    - for linguists:

        - insights into the opportunities provided by digital research platforms

        - getting to know the affordances of data for building an online platform and for ensuring data longevity (TEI)

    - for technical researchers:

        - complexity of ancient texts (internal structure, variation, different layers of form and meaning)

        - interests of linguists and other humanities scholars in the data

    - both:

        - make one's terminology explicit and clear

        - make the data consistent

  • Improved collaboration

    - general understanding

        - for DH researchers: of the objects studied in various humanities disciplines and of the relevant research questions and methods

        - for humanities scholars: of the different fields and methods in DH (e.g. building a web platform vs. data modelling in TEI)

  • Future plans: next version

    - metrical data (D. Gunkel/K. Ryan)

    - audio & video:

        - some recordings of A. Daniélou available

        - complete recording of the RV in Copenhagen (not really available)

    - texts: Atharvaveda, Maitrayani Samhita

    - annotation layers / user accounts: GRAID etc.

    - semantic search … (Semantic Web)

  • C-SALT : Cologne South Asian Languages and Texts

    - overview of projects and digital resources related to South Asian languages, texts and culture at the University of Cologne (TEI Sanskrit dictionaries, Pali dictionary …)

    - C-SALT coordinates the activities of these projects and facilitates the sustainable development of the diverse resources.

    - further plans:

        - Iranian (Avestan corpus + annotation; digital version of Bartholomae's dictionary; Middle Persian texts)

        - Nuristani (A. Degener [Mainz]: Kalasha-Ala, Prasun)

  • धन्यवाद

    Thank you!