Transforming a century into a corpus: Experiences with Central and Southern Selkup texts

INEL Workshop 2016, 04-Nov-2016


Outline

• Introduction
• Selkup Spoken Language Corpus (SELSLC)
• Data
    • texts
    • speakers
• Workflow
    • FLEx
    • EXMARaLDA
• Problems


Introduction

• DFG project: Syntactic description of Central and Southern Selkup dialects: a corpus-based analysis (WA 3153/3-1)


[Figure labels: Northern Selkup, Southern Selkup]


data: texts

• 104 texts from 35 different speakers

• all texts are previously published:


    • Bajdak / Maksimova (2002, 2013, 2015)
    • Bajdak / Tuchkova (2004)
    • Bekker (1980)
    • Bykonia et al. (1996)
    • Dul’zon (1966a, 1966b)
    • Grigorovskiy (1879)
    • Katz (1988)
    • Kuz’mina (1967)
    • Kuz’mina (1977)
    • Morev et al. (1981)
    • Szabó (1966, 1967)
    • Tuchkova / Helimski (2010)
    • Tuchkova / Wagner-Nagy (2015)


data: dialect group of speaker


• Central: Narym, Tym, Parabel, Vasyugan

• Southern: Middle Ob, Upper Ob, Chaya, Lower Ket, Middle Ket, Upper Ket


data: texts per dialect

• Central Selkup texts: 37

• Southern Selkup texts: 66


data: genre of the texts


data: date of birth


data: date of recording


data: age at recording

• average age: 56 years


workflow


workflow: raw texts


workflow: transcribed texts


workflow: FLEx


| Nov 16 | Central | Southern | total |
|---|---|---|---|
| tokens | 11,372 | 15,276 | 26,648 |
| sentences | 1,729 (2,423) | 2,470 (3,755) | 4,199 (6,178) |
| | 71% | 66% | 68% |
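The percentages in the last row are consistent with dividing the first sentence figure by the figure in parentheses. The following is a minimal sketch of that arithmetic; reading the row this way is an assumption, since the slide does not label it, and all variable and function names are illustrative.

```python
# Sentence counts as given in the table: (first figure, figure in parentheses).
# Assumption: the percentage row is first / parenthesized; the slide does not say so.
sentences = {
    "Central": (1729, 2423),
    "Southern": (2470, 3755),
}


def share(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


shares = {group: share(part, whole) for group, (part, whole) in sentences.items()}

# The "total" column is the sum over both dialect groups.
total_part = sum(part for part, _ in sentences.values())
total_whole = sum(whole for _, whole in sentences.values())
shares["total"] = share(total_part, total_whole)

print(shares)  # {'Central': 71, 'Southern': 66, 'total': 68}
```

The summed figures (4,199 and 6,178) also match the total column of the table.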


workflow: EXMARaLDA


problems

• many different spellings of one and the same suffix / lexeme (no standard orthography)

• example: paldʼu ‘go’ -> 18 allomorphs

palʼdö (1), palʼcʼe (1), paldzʼi (1), palʼdʼü (2), paldu (1), palʼde (1), pald’ü (2), palʼdʼu (18), palʼdʼi (11), palʼdi (4), paldi (7), palʼdʼe (7)

(62) occurrences in Southern dialects, no occurrence in Central

variants: palʼcʼ (1), palʼčʼə (1), palʼčʼu (1), palč (1), paldʼzʼi (1), palʼdʼzʼi (1)

➔ Alatalo 2004 (687) -> palc’u [kam.]
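To make the scale of the problem concrete, the counts above can be collected under a single headword. This is only a sketch: the lookup table is hypothetical, and using paldʼu as the headword simply follows the example heading, not any decision documented in the slides.

```python
from collections import Counter

# Attested spellings and their occurrence counts, copied from the slide.
attested = {
    "palʼdö": 1, "palʼcʼe": 1, "paldzʼi": 1, "palʼdʼü": 2, "paldu": 1, "palʼde": 1,
    "pald’ü": 2, "palʼdʼu": 18, "palʼdʼi": 11, "palʼdi": 4, "paldi": 7, "palʼdʼe": 7,
    "palʼcʼ": 1, "palʼčʼə": 1, "palʼčʼu": 1, "palč": 1, "paldʼzʼi": 1, "palʼdʼzʼi": 1,
}

# Hypothetical lookup table: every attested spelling points to one headword.
HEADWORD = "paldʼu"
spelling_to_headword = {form: HEADWORD for form in attested}


def count_by_headword(counts, lookup):
    """Sum occurrence counts per headword; spellings missing from the lookup stay as they are."""
    totals = Counter()
    for form, n in counts.items():
        totals[lookup.get(form, form)] += n
    return totals


totals = count_by_headword(attested, spelling_to_headword)
print(len(attested), totals[HEADWORD])  # 18 62
```

The result reproduces the 18 spellings and 62 occurrences stated on the slide.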


• example: maːt ‘house, tent, at home’ -> 8 allomorphs

maːt (15|7), mat (64|18), maːd (9|5), mad (6|6), matə (4)

(104|40) occurrences in Southern|Central dialects

variants: madʼe (2), maːn (1), maːtə (2), matɨ (2), ma (2), mas (1)


• example: suffix -sɨ (past marker) -> 26 allomorphs

-sɨ (8|1), -saː (3|1), -sa (110|11), -se (4), -sä (1), -si (1), -sə (4), -s (181|1), -ss (1), -ssa (11), -ssü (1), -sse (1), -ssi (3), -ssaː (2), -za (134), -zaː (13), -zə (3), -ze (1), -zɨ (28), -z (1), -zü (1), -zo (1), -zʼa (4), -zʼaː (4), -c (2), -sʼa (2), -sʼi (3)

(520|16) -> occurrences in Southern|Central dialects

variants: -ša (1), -ʒa (1), -ʒe (1)


• dictionary entries -> which entry is the “basic wordform”?

• example: paqqə ‘dig’

    • Bykonia: paqqɨ (Ob.Sh), paqqɛːptɨ (Ob.Sh), paqqɛlgu (Ket), paqqəl (Ob.S, Ch, Ket), paqlešpu (Ob.Ch), paqəlešpu (Ob.Sh), paqɨl (Ob, Ket, Vas, Tym), paqəl (Ob.Ch, Ket), paqəlešpu (Ob.Sh)
    • Kuper/Pusztay: paqqəl
    • Helimski (Grigorovskij): paqqyl


• how do we solve it?

paqqɨ (Ob.Sh), paqqɛːptɨ (Ob.Sh), paqqɛlgu (Ket), paqlešpu (Ob.Ch), paqəlešpu (Ob.Sh), paqɨl (Ob, Ket, Vas, Tym)

    • -ptɨ -> causative
    • -l, -le -> inchoative
    • -gu -> iterative, reflexive, DRV
    • -špu -> imperfective verbal, habituative, durative

➔ which dialect determines the entry?

➔ which spelling do we use?
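One way to look for a common base is to strip the derivational suffixes listed above from each attested form and compare what remains. The sketch below is a naive longest-suffix stripper; it is not the workflow used in the project, the function name and the `min_stem` threshold are invented for illustration, and plain string matching is a strong simplification of Selkup morphology.

```python
import unicodedata


def nfc(text):
    """Normalize to NFC so that composed and decomposed spellings compare equal."""
    return unicodedata.normalize("NFC", text)


# Suffixes and glosses as listed on the slide.
SUFFIXES = {
    nfc("ptɨ"): "causative",
    nfc("le"): "inchoative",
    nfc("l"): "inchoative",
    nfc("gu"): "iterative, reflexive, DRV",
    nfc("špu"): "imperfective verbal, habituative, durative",
}


def segment(form, min_stem=3):
    """Strip known suffixes from the right, longest match first.

    Returns (remaining stem, suffixes in surface order). min_stem is an
    arbitrary guard against stripping a form down to nothing.
    """
    stem, found = nfc(form), []
    while True:
        candidates = [
            s for s in SUFFIXES
            if stem.endswith(s) and len(stem) - len(s) >= min_stem
        ]
        if not candidates:
            return stem, found
        suffix = max(candidates, key=len)
        stem = stem[: -len(suffix)]
        found.insert(0, suffix)


for form in ["paqqɨ", "paqqɛːptɨ", "paqqɛlgu", "paqlešpu", "paqəlešpu", "paqɨl"]:
    print(form, segment(form))
```

Even after stripping, the remaining stems still differ in vowel quality and consonant length, so the question of which spelling to use as the entry stays open.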


literature

• Bykonia, V. V. (2005). Sel’kupsko-russkiy dialektnyj slovar’. Tomsk.

• Helimski, E. (2007). Yuzhnosel’kupskij slovar’ N. P. Grigorovskogo. Hamburg.

• Kuper, Sh. and J. Pusztay (1993). Sel’kupskiy razgovornik (narymskiy dialekt).

• Bajdak, A. V. and N. P. Maksimova (2002). Didaktizacija original’nogo teksta. Sel’kupskij jazyk (uchebno-metodicheskij komplekt k uchebnomu modulju). Tomsk.

• Bajdak, A. V. et al. (2013). Sel’kupskie teksty. Sbornik annotirovannykh fol’klornykh i bytovykh tekstov jazykov obsko-enisejskogo jazykovogo areala 3. Tomsk. pp. 153-201.

• Bajdak, A. V. et al. (2015). Sel’kupskie teksty. Sbornik annotirovannykh fol’klornykh i bytovykh tekstov jazykov obsko-enisejskogo jazykovogo areala 4. Tomsk. pp. 108-149.

• Bajdak, A. V. and N. A. Tuchkova (2004). Episody "Eposa ob It’t’e" v chumil’kupskom dialektnom areale. Korennye narody Sibiri: problemy istoriografii, istorii, etnografii, lingvistiki. Tomsk. pp. 51-64.


• Bekker, É. G. (1980). Sel’kupskie teksty. Skazki narodov Sibirskogo Severa [Teksty i per.] 3. T. I. Porotova and A. P. Dul’zon. Tomsk. pp. 55-71.

• Bykonja, V. V., et al. (1996). Skazki narymskikh sel’kupov (kniga dlja chtenija na sel’kupskom jazyke s perevodami na russkij jazyk); k 400-letiju Naryma. Tomsk, AO Izdat. NTL.

• Dul’zon, A. P. (1966). Ketskie skazki. Tomsk, Izd. Univ.

• Dul’zon, A. P. (1966). “Sel’kupskie skazki”. Jazyki i toponimija Sibiri. Tomsk, Izd. Tomskogo Univ. pp. 97-158.

• Grigorovskij, N. P. (1879). Azbuka sjussogoj gulani: dlja innostrancev’ Narymskogo kraja. Kazan’.

• Kuz’mina, A. I. (1967). "Dialektologicheskie materialy po sel’kupskomu jazyku." Issledovanija po jazyku i fol’kloru 2: 267-329.

• Kuz’mina, A. I. (1976). K etymologii nazvanij mesjacev, storon sveta, zvjozd i sozvesdij v sel’kupskom jazyke. Jazyki i toponimija 4. Tomsk, Izd. Tomskogo Universiteta. pp. 71-86.


• Morev, J. A., et al. (1981). Sel’kupskie teksty. Skazki narodov Sibirskogo Severa [Teksty i per.] 4. T. I. Porotova, A. P. Dul’zon and T. I. Porotova. Tomsk, Izd-vo Tom. un-ta: 122-143.

• Szabó, L. (1966). 254-255

• Szabó, L. (1967). Selkup texts with phonetic introduction and vocabulary. The Hague, Mouton.

• Tuchkova, N. A., et al. (2010). O materialah A. I. Kuz’minoj po sel’kupskomu jazyku [=Über die selkupischen Sprachmaterialien von Angelina I. Kuz’mina]. Hamburg [Institut für Finnougristik/Uralistik].

• Tuchkova, N. A. and B. Wagner-Nagy (2015). sēl’d’e nūn qȫdi īt’t’e … (Itja-teksty). Tomsk, Izdat. Tomsk. Gos. Ped. Univ.


Thank you!
