
Computational Humanities – Bridging the Gap between Computer Science and Digital Humanities: Report on the Dagstuhl seminar

Christopher G. Brown, Brian D. Joseph
[email protected], [email protected]

We would like to thank David Manderscheid, Executive Dean of Arts and Sciences, and Caroline Whitacre, Vice President for Research, for the support that allowed us to attend the Seminar. We also benefitted from the assistance of Abhijit Varde and Mike Butsko of the Center for Language, Literature, and Culture with the design and planning of our Seminar presentations.

Steve Jobs

“I always thought of myself as a humanities person as a kid, but I liked electronics. Then I read something that one of my heroes, Edwin Land of Polaroid, said about the importance of people who could stand at the intersection of humanities and sciences, and I decided that’s what I wanted to do.”

• themes of the seminar
• presentations of interest and relevance for the Herodotos Project
• Our Dagstuhl presentations:
  – Digital work in linguistics
  – OSU digital humanities

• Research in the field of Digital Humanities, also known as Humanities Computing, has seen a steady increase over the past years. Situated at the intersection of computing science and the humanities, present efforts focus on making resources such as texts, images, musical pieces and other semiotic artifacts digitally available, searchable and analyzable.

• To this end, computational tools enabling textual search, visual analytics, data mining, statistics and natural language processing are harnessed to support the humanities researcher. The processing of large data sets with appropriate software opens up novel and fruitful approaches to questions in the traditional humanities.

• Furthermore, the computational paradigm transforms the humanities: it opens the way to new research questions and more adequate methodologies for answering them, and it makes it possible to analyze a much larger amount of data in a quantitative and automated fashion.

Closing the gap

• Despite the considerable increase in Digital Humanities research, there is a perceived gap between the traditional humanities and computer science. Reasons for this are rooted in the current state of both fields:

• Whereas computer science excels at automating repetitive tasks with respect to low-level content processing, it can be difficult for computer scientists to fully appreciate the concerns and research goals of their humanities colleagues.

• For a humanist, in turn, it is often hard to imagine what computer technology can and cannot provide, how to interpret automatically generated results, and how to judge the advantages of automatic processing, even if imperfect, over manual analyses.

• The workshop will solidify Computational Humanities as a field of its own and identify the most promising directions for creating a common understanding about methodologies and goals. Importantly, the computer scientist cannot be reduced to a software engineer for the humanist, nor should the humanist be compelled to construct post-hoc explanations for results from automatic data analysis. Rather, both sides must agree on a common vision and define and exemplify accepted methodologies and measures for assessing the validity of research hypotheses. We conceive computational humanities as a discipline that provides this algorithmic foundation, as a bridge between computer science and the humanities.

• The new discipline is explicitly concerned with research questions from the humanities that can more successfully be solved through the use of computing, as well as with pertinent research questions from computing science focusing on multimedia content, uncertainties of digitization, language use across long time spans and visual presentation of content and form.

1. The Present State: What works, and what does not?

Review of the success of 10 years of the digital humanities: Can we identify commonalities of successful projects? What kinds of results have been obtained? What kinds of results were particularly beneficial for partners in different areas of research? Can success in one field be transferred to other fields by following the same methodology?

Review of the challenges of 10 years of the digital humanities: What are recurring barriers to efficient cross-disciplinary collaboration? What are the most common unexpected causes of delays in projects? What are common misunderstandings? What is the current role of computer scientists and researchers in the humanities in common projects, and how do these groups envision and define their roles in this interplay?

2. Computational Challenges in Computational Humanities

• How can the success of a computer system for humanities data processing be evaluated in a way that quantifies its improvement?

• What are the challenges posed by the demands from the humanities? In particular, how can computer scientists convey the notion of uncertainties and processing errors to researchers in the humanities?

• What research questions arise for computational scientists when processing data from the humanities?

3. Humanities Challenges in Computational Humanities

How can we falsify hypotheses with data processing support? What research questions can be appropriately addressed with computational means? What is and is not acceptable in terms of methodology when one relies on automatic data processing steps?

4. Algorithmic Foundations of Computational Humanities

Can we agree on generic statements about the expressivity of the range of algorithms that are operative in the digital humanities and related fields of research? Can we distinguish complexity levels of algorithms in computational humanities according to their conditions of application, their expressiveness, or even their explanatory power? Which conditions influence the humanities' notion of the interpretability of the output generated by these algorithms?

• Humanists need to become more tech-savvy and learn how to talk with computer scientists

• Computer scientists do not want to be used as mere software programmers; a collaborative relationship implies dialogue

• the value and limits of quantification
• Humanists don't know much about living in a world where quantitative data matters

Digital humanities

• Idiographic vs. nomothetic, event vs. law disciplines

• Does this distinction make sense in the age of big data?

• An interpretation of a novel is about what makes that novel unique

• Humanities are dialogical
• Scientists often see humanities research in terms of quantifiability

Szymon Rusinkiewicz

Princeton University, Computer Graphics Group

Reconstruction of 15th c. BC frescoes from Thera (Greece): using 3D laser scanners to map the edges of fresco fragments in order to search for matching edges (“joins”)

Maximilian Schich

• Art historian, cultural historian, and network analyst; University of Texas at Dallas

• Recent publication in Science plotting the birth and death places of “notable individuals” (intellectuals, artists, leaders) over 2,000 years, as a way of seeing patterns of interaction between culturally relevant locations

Susan Schreibman, Maynooth (Ireland)

Digital project collecting letters from 1916 (the year of the Easter Rebellion in Ireland) for digitization, involving the public in the collection and curation of materials

Meinard Müller

• Computer graphics group, Max Planck Institute for Informatics

• Electrical engineer and computer scientist

• Works on music as data, taking account of all the different representations of music (acoustic images, sheet music, electronic bits, etc.)

Computational analysis of narratives

• Annette Frank: alignment across comparable narratives; corpus-based learning of semantic concepts

• David Bamman: Bayesian mixed effects model of literary character

Manfred Thaller vs. Greg Crane

• STEM deprioritizing the humanities
• Humanities—making the best of bad information
• Humanist education implies: critical engagement with the past; curiosity about other cultures and perspectives
• Stop focusing on virtuoso performance and focus on accessibility
• Changing scales of research—macro analysis, and new forms of micro analysis
• Changing communities of scholarship—citizen and student science
• New ecologies for intellectual exchange
• Taking risks—the J Curve—a drop before improvement
• B. Berendt—but financial incentives favor applied research

Back to the 18th century

• No radical distinction between humanities and digital sciences seems possible—the questions bring us together

• radically new and deeply traditional possibilities for education: if you are not producing new knowledge you are not in a university (Humboldt)

• Instructor and student should both be in the service of Wissenschaft

• German universities don’t separate the humanities and sciences to the same degree as in the US

• Support original research by undergraduates
• Treebanks for Greek and Latin
• Reemergence of editing as a primary activity
• Commented edition and translation as an undergraduate thesis
• Well-annotated corpora
• Most Classics faculty think undergraduates cannot do research

http://www.homermultitext.org/

“the most important classicist in the world right now”

Thaller

• This is a valuable side effect of Digital Humanities but should not drive them

• Need for a vision for the humanities (the old nationalist motives no longer apply)

• Scientific theories are not proved false; their proponents die out (Thomas Kuhn)

• University trains people to solve questions that have not yet arisen

Transatlantic cooperation

• Germany is more advanced in this area

• German government funding for humanities research is ca. 40% of the total budget; the NEH budget is a rounding error next to that of NSF/NIH ($150 million vs. $7 billion)

• Transatlantic collaboration becoming necessary

• Crane at Leipzig
• OSU can only benefit from such relationships

Dagstuhl Working Group on Literature and Lexicon

• Fotis Jannidis
• David Mimno: text mining, topic modeling, novels from Matt Jockers
• Loretta Auvil: HathiTrust
• Alexander Mehler: Frankfurt, colleague of Jost Gippert, founder of the TITUS project
• David Bamman: Carnegie Mellon
• Kurt Gärtner
• David Smith: Tufts
• Greg Crane: Perseus, Humboldt Professorship, Leipzig

A corpus-based history of German novels

Fotis Jannidis (Würzburg, Germany)

Research goals

• history of narrative techniques, for example:
  – usage of speech rendering
  – focalisation / perspective
  – characterisation
  – plot structure
• style and genre
• topics and topoi:
  – distribution of topics
  – distribution of topoi
  – correlation with genre and other factors

The Corpus (1500-1930)

• Core Corpus: 450 novels with fine-grained metadata and high text quality (selection bias: canonization)

• Extension Corpus: 1300 novels, basic metadata and highly variable text quality (selection bias: somehow known/accessible to modern readers)

• In preparation: 500 novels randomly selected
• tbd: HathiTrust / Google Books

Methods (the usual suspects)

• computational stylistics
• topic modeling (a minimal sketch follows this list)
• machine learning on manually annotated texts
• tools: R, Python, Mallet; in future DKPro/UIMA
• narratology
• literary interpretation
• theory and practice of literary history and literary genre
• genre analysis
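
By way of illustration, here is a minimal topic-modeling sketch in Python, one of the tools the project names. It uses gensim rather than Mallet, and the toy documents and parameter values are hypothetical stand-ins, so read it as a sketch of the general workflow (tokenize, build a dictionary, train, inspect topics) rather than the project's actual pipeline.

# Minimal topic-modeling sketch (illustrative; gensim stands in for Mallet).
# The toy "novels" and all parameter values below are hypothetical.
from gensim import corpora, models

documents = [
    "the young hero leaves his village and wanders through the forest",
    "letters between the lovers reveal the slow ruin of the family estate",
    "the detective questions the servants about the night of the murder",
    "a storm at sea scatters the fleet and strands the narrator on an island",
]

# Simple whitespace tokenization; a real pipeline would lemmatize, remove
# stopwords, and normalize historical spelling (e.g. Fraktur OCR output).
tokenized = [doc.lower().split() for doc in documents]

dictionary = corpora.Dictionary(tokenized)                 # word <-> id mapping
bow_corpus = [dictionary.doc2bow(doc) for doc in tokenized]

# Train a small LDA model; num_topics is far too low for a real corpus.
lda = models.LdaModel(bow_corpus, num_topics=2, id2word=dictionary,
                      passes=20, random_state=0)

# Print the top words of each inferred topic.
for topic_id, words in lda.show_topics(num_topics=2, num_words=6):
    print(topic_id, words)

In practice Mallet's LDA is usually driven from the command line or via a wrapper, but the overall shape of the workflow is the same as in this sketch.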

[Timeline figure: corpus periods up to 1740, 1740-1812, 1807-1889, 1863-1928]

Challenges

• again and again: German (language, NLP tools, Fraktur, etc.)

• extraction of concepts relevant for literary studies, for example:
  – plot
  – character
  – genre
  – main theme
  – motif

Davids

• Smith
• Bamman
• Mimno