
Page 1: Possibilities of Digital Analysis of Charter corpora

Possibilities of Digital Analysis of Charter corpora

Georg Vogeler

Page 2: Possibilities of Digital Analysis of Charter corpora

IMC Leeds, 9.7.2009, Georg Vogeler

Charter Corpora on the Web

Württembergisches Urkundenbuch (http://maja.bsz-bw.de/wubonline/)
CDLM (http://cdlm.unipv.it)
DEEDS (http://www.utoronto.ca/deeds/)
Monasterium.net (http://www.monasterium.net)
Ut per litteras apostolicas … (http://www.brepolis.net)
Diplomatico Firenze (http://www.archiviodistato.firenze.it/diplomatico)

Page 3: Possibilities of Digital Analysis of Charter corpora

What’s their advantage?

Images
Reconstructed archives
• Virtuelles Archiv Salzburg
• Archive of the Stift Ardagger
Fast search
=> take the charter heritage as is, not as defined by organisational reasons

Page 4: Possibilities of Digital Analysis of Charter corpora

Online Corpus abolishes borders …

between repositories
and
between forms of representation

Page 5: Possibilities of Digital Analysis of Charter corpora

Research on set phrases

Vernacular dating clauses
• Latin model (Ulm, 29 March 1275):
dirre dinge iſt gezivch herre Marquart von Bleichen herre hartman von ſahſenhvſen vn herre tecke von annenhoven. Datum · IIIIo · kl · aprilis · anno dni · Mo · CCo · LXXVo.
• German model almost free from it:
Diz geſchach zehahberch an deme Ciſtage in der phingeſtwochen / do von gotteſ geb#vrte waren zwelfhundert Sibenzig vn f #vnf Jar

Page 6: Possibilities of Digital Analysis of Charter corpora

Dating Clauses

13th century:
• Germany (de Boor 1975)
- South-western model: dis geschach do man zalte von gotes gebúrte zwelf hundert und niun und niunzig jar.
- South-eastern model: ditz ist geschehen, do es waren von christes geburt tousent zwaihundert und darnach in dem niun unde niunzegisten jare.

Page 7: Possibilities of Digital Analysis of Charter corpora

In monasterium.net

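(: German-language charters with a non-empty tenor :)
for $u in //tenor[not(. = '')]/ancestor::text[.//lang_MOM = 'Deutsch']
(: closing part of the tenor (roughly the last 200 characters), where the dating clause is expected :)
let $dat := substring($u//tenor, string-length($u//tenor) - 200)
(: date_sort strictly between 13000000 and 14000001, apparently the 14th century if date_sort is encoded as YYYYMMDD :)
where number($u//date_sort) lt 14000001
  and number($u//date_sort) gt 13000000
order by $u//date_sort
return
  <dat>
    <wo>{$u/@b_name} {$u//issued/placeName/text()}</wo>
    <was>{$dat}</was>
  </dat>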
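For readers without an XQuery processor at hand, the same selection can be sketched in Python. This is only an illustration under stated assumptions: the element names (text, tenor, lang_MOM, date_sort, issued/placeName, @b_name) are taken from the query above, while the wrapper element corpus, the sample records and the function name closing_passages are invented here and say nothing about the real monasterium.net data.

```python
import xml.etree.ElementTree as ET

# Invented sample records; only the element names come from the query on the slide.
SAMPLE = """
<corpus>
  <text b_name="StiftA">
    <lang_MOM>Deutsch</lang_MOM>
    <date_sort>13450612</date_sort>
    <issued><placeName>Wien</placeName></issued>
    <tenor>placeholder text ... do man zalte von gotes geburte ... jar.</tenor>
  </text>
  <text b_name="StiftB">
    <lang_MOM>Latein</lang_MOM>
    <date_sort>13500101</date_sort>
    <issued><placeName>Passau</placeName></issued>
    <tenor>placeholder text ... Datum anno domini ...</tenor>
  </text>
  <text b_name="StiftC">
    <lang_MOM>Deutsch</lang_MOM>
    <date_sort>12990401</date_sort>
    <issued><placeName>Ulm</placeName></issued>
    <tenor>placeholder text ... jar.</tenor>
  </text>
  <text b_name="StiftD">
    <lang_MOM>Deutsch</lang_MOM>
    <date_sort>13600101</date_sort>
    <issued><placeName>Linz</placeName></issued>
    <tenor></tenor>
  </text>
</corpus>
"""


def closing_passages(xml_source, lower=13000000, upper=14000001, tail=200):
    """Return (date_sort, b_name, place, end of tenor) for German charters in range."""
    root = ET.fromstring(xml_source.strip())
    rows = []
    for text in root.iter("text"):
        tenor = (text.findtext(".//tenor") or "").strip()
        if not tenor:
            continue  # skip charters without a transcribed tenor
        if text.findtext(".//lang_MOM") != "Deutsch":
            continue  # keep German-language charters only
        raw_date = (text.findtext(".//date_sort") or "").strip()
        if not raw_date.isdigit():
            continue  # no usable sort date
        date_sort = int(raw_date)
        if not (lower < date_sort < upper):
            continue  # same bounds as the "where" clause of the query
        place = text.findtext(".//issued/placeName") or ""
        # The dating clause is expected near the end, so keep only the tail.
        rows.append((date_sort, text.get("b_name"), place, tenor[-tail:]))
    rows.sort(key=lambda row: row[0])  # mirrors "order by $u//date_sort"
    return rows


for row in closing_passages(SAMPLE):
    print(row)
```

Unlike the query, the sketch trims whitespace before testing for an empty tenor and keeps exactly the last 200 characters; both are simplifications made for the example.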

Page 8: Possibilities of Digital Analysis of Charter corpora

In monasterium.net

[Dd][aov] ([uv][ao]n|nach) (([Gg]ot{1,2}[ei]{0,1}[sz])|(([Cc]h|[cCkK])rist[eisz]*?)|([uv]n{1,2}s[ei]{0,1}r[ei]{0,1}[sz] [hH]er{1,2}[ei]{0,1}n)) ([Gge]*?[PBpb][uv][oe°]{0,1}ri{0,1}[td][hte]*?) (w[ao]^{0,1}[rzs][ien]*?)
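A minimal sketch of how such a pattern could be tried out in Python. Two things are assumptions of this example and not part of the slide: the caret in the last group is escaped and treated as a literal character (a quantifier on a bare ^ is not portable across regex engines), and the test clauses are normalised spellings made up for the illustration, with ſ written as s and without superscript letters. The function name is_waren_model is hypothetical.

```python
import re

# Pattern from the slide, split over several lines for readability.
# Assumed change: "^" is escaped so that it is read as a literal caret.
WAREN_MODEL = re.compile(
    r"[Dd][aov] ([uv][ao]n|nach) "
    r"(([Gg]ot{1,2}[ei]{0,1}[sz])"
    r"|(([Cc]h|[cCkK])rist[eisz]*?)"
    r"|([uv]n{1,2}s[ei]{0,1}r[ei]{0,1}[sz] [hH]er{1,2}[ei]{0,1}n)) "
    r"([Gge]*?[PBpb][uv][oe°]{0,1}ri{0,1}[td][hte]*?) "
    r"(w[ao]\^{0,1}[rzs][ien]*?)"
)


def is_waren_model(clause):
    """True if the clause contains a 'do von gotes geburt waren' type phrase."""
    return WAREN_MODEL.search(clause) is not None


# Normalised, partly invented test clauses (assumption of this sketch).
examples = [
    "do von gottes geburte waren zwelfhundert sibenzig und funf jar",
    "da nach Christes geburt waren",
    "do von unsers herren geburt was",
    "dis geschach do man zalte von gotes geburte zwelf hundert und niun und niunzig jar",
]
for clause in examples:
    print(is_waren_model(clause), clause)
```

The pattern expects the word order "do von gotes geburt waren"; a clause that opens with "do man zalte" is not matched and would need its own pattern.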

Page 9: Possibilities of Digital Analysis of Charter corpora

In monasterium.net

Page 10: Possibilities of Digital Analysis of Charter corpora

Results

13th century:
• 433 texts
• “zalt”-model: 24, all but 5 from the Chartularium Sangallense
• “waren”-model: 137, all but 15 from the south-eastern regions
14th century:
• 8354 texts
• “zalt”-model: 2478, 964 not from St. Gallen
• “waren”-model: 350, only 13 from St. Gallen
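To compare the two centuries despite the very different corpus sizes, the counts above can be turned into shares. The sketch below uses only the figures from this slide; the dictionary layout and the function name model_shares are choices made for the example.

```python
# Counts as given on the slide.
COUNTS = {
    "13th century": {"texts": 433, "zalt": 24, "waren": 137},
    "14th century": {"texts": 8354, "zalt": 2478, "waren": 350},
}


def model_shares(counts):
    """Share of each dating model per period, in percent (one decimal place)."""
    return {
        period: {
            model: round(100 * figures[model] / figures["texts"], 1)
            for model in ("zalt", "waren")
        }
        for period, figures in counts.items()
    }


for period, shares in model_shares(COUNTS).items():
    print(period, shares)
```

On these figures the “zalt”-model rises from about 5.5 % to about 29.7 % of the texts, while the “waren”-model falls from about 31.6 % to about 4.2 %; the remaining texts match neither pattern.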

Page 11: Possibilities of Digital Analysis of Charter corpora

Methods of Investigation

Already in use
• Simple word selection/word count (Tock, Brousseau, Parisse)
• Phrase statistics (Gervers/Margolin)
• Graphetic detail analysis (Fiebig)
• Hand identification by pattern analysis (Schomaker/Burgers)
• Named entity recognition (Stoyan/Schmidt)

Page 12: Possibilities of Digital Analysis of Charter corpora

Possible Programming

Testing/adapting existing algorithms
• Author identification tools
• Graphical variation tools
• Named Entity Recognition methods for clauses

to find the connections between charters that aren’t kept in the same archive/aren’t printed in the same edition:
• e.g.: Influence of recipient on the charters
• Spread of formula, regions of legal culture

Page 13: Possibilities of Digital Analysis of Charter corpora

Early medieval diplomatics

Add charters to the online corpora

Add information to the online charter corpora

Take text-analytic software into consideration

Ask your local computer scientist what he could help you with

Page 14: Possibilities of Digital Analysis of Charter corpora

Thank you for your attention

[email protected]