Possibilities of Digital Analysis of Charter corpora

Georg Vogeler

IMC Leeds, 9.7.2009

Charter Corpora on the Web

Württembergisches Urkundenbuch (http://maja.bsz-bw.de/wubonline/)
CDLM (http://cdlm.unipv.it)
DEEDS (http://www.utoronto.ca/deeds/)
Monasterium.net (http://www.monasterium.net)
Ut per litteras apostolicas … (http://www.brepolis.net)
Diplomatico Firenze (http://www.archiviodistato.firenze.it/diplomatico)

What’s their advantage?

Images
Reconstructed archives
• Virtuelles Archiv Salzburg
• Archive of the Stift Ardagger
Fast search
=> take the charter heritage as it is, not as defined by organisational reasons

Online Corpus abolishes borders …

between repositories
between forms of representation

and

Research on set phrases

Vernacular dating clauses
• Latin model (Ulm, 29 March 1275):
dirre dinge iſt gezivch herre Marquart von Bleichen herre hartman von ſahſenhvſen vn herre tecke von annenhoven. Datum · IIIIo · kl · aprilis · anno dni · Mo · CCo · LXXVo.
• German model, almost free from the Latin one:
Diz geſchach zehahberch an deme Ciſtage in der phingeſtwochen / do von gotteſ gebvrte waren zwelfhundert Sibenzig vn fvnf Jar

Dating Clauses

13th century:
• Germany (de Boor 1975)
- South-western model:
• dis geschach do man zalte von gotes gebúrte zwelf hundert und niun und niunzig jar.
- South-eastern model:
• ditz ist geschehen, do es waren von christes geburt tousent zwaihundert und darnach in dem niun unde niunzegisten jare.

In monasterium.net

(: End of the tenor of every German-language charter in the selected date range :)
for $u in //tenor[not(.='')]/ancestor::text[.//lang_MOM='Deutsch']
(: substring() counts from 1, so this keeps the last 201 characters :)
let $dat := substring($u//tenor, (string-length($u//tenor) - 200))
where number($u//date_sort) lt 14000001
  and number($u//date_sort) gt 13000000
order by $u//date_sort
return
  <dat>
    <wo>{$u/@b_name} {$u//issued/placeName/text()}</wo>
    <was> {$dat}</was>
  </dat>
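For readers without an XQuery engine, the same selection can be sketched in Python. This is only an illustration: the element names (`text`, `tenor`, `lang_MOM`, `date_sort`, `issued/placeName`) and the `b_name` attribute are taken from the query above, while the sample document, its placeholder values and the function name `closing_passages` are invented for the example. The real monasterium.net data layout may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature corpus; only the element names come from the query.
SAMPLE = """
<corpus>
  <text b_name="ArchiveA">
    <lang_MOM>Deutsch</lang_MOM>
    <date_sort>13250410</date_sort>
    <issued><placeName>PlaceA</placeName></issued>
    <tenor>placeholder tenor inside the date range</tenor>
  </text>
  <text b_name="ArchiveB">
    <lang_MOM>Latein</lang_MOM>
    <date_sort>13300101</date_sort>
    <issued><placeName>PlaceB</placeName></issued>
    <tenor>placeholder tenor in another language</tenor>
  </text>
  <text b_name="ArchiveC">
    <lang_MOM>Deutsch</lang_MOM>
    <date_sort>12990101</date_sort>
    <issued><placeName>PlaceC</placeName></issued>
    <tenor>placeholder tenor outside the date range</tenor>
  </text>
</corpus>
"""


def closing_passages(xml_source, tail=201):
    """Return (date, archive, place, end of tenor) for German charters
    with 13000000 < date_sort < 14000001, ordered by date."""
    rows = []
    for text in ET.fromstring(xml_source).iter("text"):
        tenor = text.findtext(".//tenor", default="")
        if not tenor or text.findtext(".//lang_MOM") != "Deutsch":
            continue
        # A missing date_sort is treated as 0 and therefore filtered out.
        date = int(text.findtext(".//date_sort", default="0"))
        if 13000000 < date < 14000001:
            place = text.findtext(".//issued/placeName", default="")
            # Last 201 characters, mirroring the 1-based XQuery substring().
            rows.append((date, text.get("b_name"), place, tenor[-tail:]))
    return sorted(rows)


for row in closing_passages(SAMPLE):
    print(row)
```

Only the first sample charter passes all three filters (non-empty tenor, German language, date range).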

In monasterium.net

[Dd][aov] ([uv][ao]n|nach) (([Gg]ot{1,2}[ei]{0,1}[sz])|(([Cc]h|[cCkK])rist[eisz]*?)|([uv]n{1,2}s[ei]{0,1}r[ei]{0,1}[sz] [hH]er{1,2}[ei]{0,1}n)) ([Gge]*?[PBpb][uv][oe°]{0,1}ri{0,1}[td][hte]*?) (w[ao]^{0,1}[rzs][ien]*?)
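How such a pattern sorts dating clauses into the two models can be sketched in Python. The two patterns below are deliberately simplified stand-ins written for this example, not the expression shown on the slide, and they tolerate far less spelling variation. The function name `classify_dating_clause` is likewise an assumption. The first two test clauses are the model clauses quoted from de Boor 1975.

```python
import re

# Simplified, hypothetical patterns (not the slide's expression).
ZALT_MODEL = re.compile(r"\bd[aov] man z[ae]lte?\b", re.IGNORECASE)
WAREN_MODEL = re.compile(
    r"\bd[aov] (?:es )?(?:[uv][ao]n \w+ \w+ )?waren\b", re.IGNORECASE
)


def classify_dating_clause(clause):
    """Return 'zalt', 'waren' or None for a single dating clause."""
    if ZALT_MODEL.search(clause):
        return "zalt"
    if WAREN_MODEL.search(clause):
        return "waren"
    return None


EXAMPLES = [
    # South-western model
    "dis geschach do man zalte von gotes gebúrte zwelf hundert "
    "und niun und niunzig jar.",
    # South-eastern model
    "ditz ist geschehen, do es waren von christes geburt tousent "
    "zwaihundert und darnach in dem niun unde niunzegisten jare.",
    # Beginning of a Latin dating formula, expected to match neither model
    "Datum · IIIIo · kl · aprilis · anno dni",
]

for clause in EXAMPLES:
    print(classify_dating_clause(clause), "<-", clause[:40])
```

Checking the "zalt" pattern first is an arbitrary choice; a clause matching both patterns would need manual inspection.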

Results

13th century:
• 433 texts
• “zalt”-model: 24, all but 5 from the Chartularium Sangallense
• “waren”-model: 137, all but 15 from the south-eastern regions
14th century:
• 8354 texts
• “zalt”-model: 2478, 964 not from St. Gallen
• “waren”-model: 350, only 13 from St. Gallen
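The raw counts are easier to compare as shares of the texts examined in each century. The short calculation below uses only the figures given above; the dictionary layout and the function name `model_shares` are choices made for this sketch.

```python
# Counts as given in the results above.
RESULTS = {
    "13th century": {"texts": 433, "zalt": 24, "waren": 137},
    "14th century": {"texts": 8354, "zalt": 2478, "waren": 350},
}


def model_shares(counts):
    """Share of each model among all texts of a period, in percent."""
    total = counts["texts"]
    return {
        model: round(100 * counts[model] / total, 1)
        for model in ("zalt", "waren")
    }


for period, counts in RESULTS.items():
    print(period, model_shares(counts))
```

On these figures the “zalt” model rises from roughly 5.5% to 29.7% of the texts, while the “waren” model falls from roughly 31.6% to 4.2%.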

Methods of Investigation

Already in use
• Simple word selection/word count (Tock, Brousseau, Parisse)
• Phrase statistics (Gervers/Margolin)
• Graphetic detail analysis (Fiebig)
• Hand identification by pattern analysis (Schomaker/Burgers)
• Named entity recognition (Stoyan/Schmidt)
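The first two methods in the list can be illustrated with a generic word and phrase count. This is a minimal sketch of the general idea only, not a reconstruction of the procedures used by the scholars named above. The function names are assumptions, and the sample texts are the two model clauses quoted from de Boor 1975.

```python
from collections import Counter


def word_ngrams(text, n=3):
    """Return all runs of n consecutive lower-cased words in a text."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def phrase_counts(texts, n=3):
    """Count in how many texts each n-word phrase occurs."""
    counts = Counter()
    for text in texts:
        # set() so that a phrase is counted once per text
        counts.update(set(word_ngrams(text, n)))
    return counts


CLAUSES = [
    "dis geschach do man zalte von gotes gebúrte zwelf hundert "
    "und niun und niunzig jar",
    "ditz ist geschehen do es waren von christes geburt tousent "
    "zwaihundert und darnach in dem niun unde niunzegisten jare",
]

# n=1 is a plain word count, n=3 a count of three-word phrases.
print(phrase_counts(CLAUSES, n=1).most_common(5))
print(phrase_counts(CLAUSES, n=3).most_common(5))
```

With unnormalised spelling the two clauses share single words such as "von" but no three-word phrase, which is why tolerant patterns or spelling normalisation are needed before counting.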

Possible Programming

Testing/adapting existing algorithms
• Author identification tools
• Graphical variation tools
• Named Entity Recognition methods for clauses
to find the connections between charters that aren’t kept in the same archive/aren’t printed in the same edition:
• e.g.: Influence of recipient on the charters
• Spread of formula, regions of legal culture
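One simple way to look for such connections is to score every pair of charters by the wording they share, regardless of where they are kept. The sketch below uses Jaccard similarity over word n-grams, a technique chosen here for illustration and not one named in the talk. The identifiers in `CHARTERS` are hypothetical labels, and the texts are clauses quoted earlier in the talk.

```python
from itertools import combinations


def shingles(text, n=1):
    """Set of all runs of n consecutive lower-cased words."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b, n=1):
    """Shared shingles divided by all shingles of the two texts."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def most_similar_pairs(charters, n=1):
    """Return (score, id_a, id_b) for all pairs, most similar first."""
    pairs = [
        (jaccard(charters[a], charters[b], n), a, b)
        for a, b in combinations(sorted(charters), 2)
    ]
    return sorted(pairs, reverse=True)


# Hypothetical identifiers; the texts are clauses quoted in the talk.
CHARTERS = {
    "south_western": "dis geschach do man zalte von gotes gebúrte "
                     "zwelf hundert und niun und niunzig jar.",
    "south_eastern": "ditz ist geschehen, do es waren von christes geburt "
                     "tousent zwaihundert und darnach in dem niun unde "
                     "niunzegisten jare.",
    "latin": "Datum · IIIIo · kl · aprilis · anno dni",
}

for score, a, b in most_similar_pairs(CHARTERS):
    print(f"{score:.2f}  {a}  {b}")
```

Because medieval spelling varies so much, raw scores stay low; in practice the texts would probably need normalisation before such a comparison says much about recipients or regions.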

Early medieval diplomatics

Add charters to the online corpora
Add information to the online charter corpora
Take text-analytic software into consideration
Ask your local computer scientist how he could help you

Thank you for your attention

g.vogeler@lmu.de
