Report from MPI Team, Roman Skiba and Peter Wittenburg, DOBES Workshop, Frankfurt, April 2003


Report from MPI Team

Roman Skiba, Peter Wittenburg

DOBES Workshop, Frankfurt, April 2003


Data types

• Tapes
• Audio, Video (DV-PAL, DV-NTSC, VHS, DAT, MD)
• other material: 8mm movies, reel to reel audio, slides, photos
• DMFs (mpeg1, mpeg2, wav)
• Metadata (IMDI-sessions, IMDI-corpusstructures)
• Session media
  • mpeg1, wav - for further processing
  • mpeg2 - for archiving
  • Html - as a container for text, pictures and photos (jpeg)
  • PDF - as a container for text, pictures and photos (jpeg)
• Info files (pdf, txt, html)
• Annotations (EAF, shoebox)


Statistics

Raw data: tapes, DMFs and other media.


Project/Language  Tapes    DMFs     Other data digitized, converted or delivered
AWETI             71       70       -
CHACO             17       17       -
KUIKURO           36       37       Slides
LACANDON          22       22       -
SA-MN             17 (+?)  17 (+?)  VCDs
SVAN              4        4        -
TSOVA-TUSH        5        5        -
UDI               2        2        -
TEOP              25       25       Lexicon, grammar
TOFA              42       42       PDF
TRUMAI            88       90       "Reel to reel", 8mm movies, slides, grammar
WAIMA             24       24       -
WICHITA           7        7        Lexicon examples
Total             360      372


Statistics II

Corpus units: metadata, media files, annotations.


Project/Language  IMDI-files  Sessions  Integrated IMDI  Integrated annotations
AWETI             35 (+?)     42 (+36)  35
KUIKURO           73          73        3                1
LACANDON          87          67        67
TEOP              31          30        11
TOFA              169         82        43               14
TRUMAI            189         181       187              34
TSOVA-TUSH        7           0         0
WICHITA           2           2         0
Total             593         513       346              49
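The totals row of the Statistics II table can be recomputed from the per-project figures. The sketch below is only a cross-check under stated assumptions: empty annotation cells are counted as 0, the AWETI "(+36)" sessions are added to the 42, and the open "(+?)" is ignored. The variable names are illustrative and not part of the original slides.

```python
# Per-project figures from the Statistics II slide:
# (IMDI-files, sessions, integrated IMDI, integrated annotations).
# Assumptions: blank annotation cells count as 0, AWETI sessions are
# taken as 42 + 36, and the unknown "(+?)" for AWETI is left out.
rows = {
    "AWETI": (35, 42 + 36, 35, 0),
    "KUIKURO": (73, 73, 3, 1),
    "LACANDON": (87, 67, 67, 0),
    "TEOP": (31, 30, 11, 0),
    "TOFA": (169, 82, 43, 14),
    "TRUMAI": (189, 181, 187, 34),
    "TSOVA-TUSH": (7, 0, 0, 0),
    "WICHITA": (2, 2, 0, 0),
}

# Sum each column across all projects.
totals = tuple(sum(column) for column in zip(*rows.values()))
print(totals)  # (593, 513, 346, 49)
```

Under these assumptions the column sums match the totals row on the slide.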


Digitizing problems

• Recording problems
  • due to non-continuous time code
  • due to long play mode
  • due to stills between moving pictures (!)
• Communication problems
  • Maarten handles all communication with great care
• Money problems (due to budget cuts we have to be more careful with expenses - less copying etc.)


Audio/Video Archiving

• many discussions with archivists, in particular about audio (Austrian/German audio/phonogram archive, EMELD)
• point at LREC meeting: MP3 and ATRAC (Minidisc) are not ideal, but are acceptable for listening to and normal analysis of speech (discussed type of reduction and effects)
• attitude now:
  • any MD/MP3 file is reformatted to PCM in the archive
  • strong recommendation to researchers to use 16 bit linear PCM HF
  • get the best quality you can - new devices such as DENON
  • what are slightly higher equipment costs in relation to the total budget?
  • miniaturization can be a problem
• DENON Recorder
  • 192 MB flash cards (or even more)
  • linear PCM 768 kbps: stereo = 16 min / mono = 32 min
  • MP3 (MPEG2 layer 2) 64 kbps: factor 12 => mono ~ 6 h
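The recording times quoted for the DENON recorder follow from card size divided by bitrate. The sketch below reproduces that arithmetic under two assumptions that are not stated on the slide: the 768 kbps figure is read as the rate per channel (16 bit at 48 kHz), and 1 MB is taken as 1,000,000 bytes. The function name is illustrative.

```python
def recording_minutes(card_megabytes, kbps_per_channel, channels=1):
    """Approximate recording time in minutes for a flash card.

    Assumes decimal units (1 MB = 1,000,000 bytes, 1 kbps = 1,000 bit/s)
    and ignores file-system and container overhead.
    """
    card_bits = card_megabytes * 1_000_000 * 8
    bits_per_second = kbps_per_channel * 1_000 * channels
    return card_bits / bits_per_second / 60


pcm_mono = recording_minutes(192, 768)               # linear PCM, one channel
pcm_stereo = recording_minutes(192, 768, channels=2)  # linear PCM, two channels
mp3_mono_hours = recording_minutes(192, 64) / 60      # 64 kbps compressed, mono

print(f"PCM mono:   {pcm_mono:.1f} min")
print(f"PCM stereo: {pcm_stereo:.1f} min")
print(f"MP3 mono:   {mp3_mono_hours:.1f} h")
```

The results (roughly 33 min, 17 min and 6.7 h) land close to the rounded figures on the slide, and the ratio 768 / 64 gives the quoted factor of 12.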


Video Digitization in the Field

• audio no problem
• video digitization at MPI was and is a success story
• but slow cycle time - therefore digitization in the field

[Diagram labels: DV-Camera; I-link; DV-encoding (3.4 MB/sec, 1 h = 20 GB, proprietary, limited sw support); conversion (Tsunami); MPEG1-encoding (1.5 Mbps, 1 h = 1 GB, to work with); good old mail; MPEG2 copy (~6 Mbps), MPEG1 copy (~1 Mbps), MPEG4 copy (0.5 - …), etc.; tests with MPEG-Camera not ok]

• MPEG2 widely accepted archive standard, various frontend codecs
• still compressed - new standard will come in future
• need your tapes (copies) and the MD file to create MPEG2 versions
• use camera in continuous mode !!!! then batch segmentation
• adapted workflows necessary


Access to Archive: short-term


Access to the DoBeS archive I

Current state

• Digital data transport via
  • Mail (DMF, session media)
  • FTP (all data) with password and user ID
  • Email (metadata, annotations, infos)
  • IMDI Browser (metadata, infos)


Access to the DoBeS archive II

Testing new ways

• Digital data transport via
  • IMDI Browser (all integrated data types), with password and user ID
  • HTML corpus (all data types), with password and user ID
• Remote access


Access to the DoBeS archive III

Future scenario

• Short term solution
  • To open all data types of a team for the IMDI Browser (media, annotations etc.)
• Long term solution
  • File access (user IDs and passwords) administrated by the teams


Access to Archive: long-term


Archive Access Single Person

the single person solution - the (almost) ideal world

all in one single personal box


Archive Access Single Institute

the single institute solution - the (almost) ideal world

all in one single big box for an institute


little more tricky - not all may access everything but one controlling instance

fast networks available


Archive Access SI+Web

the single institute solution with Internet Access

the (almost) ideal world

all in one single big box for all


much more tricky - not all may access everything

still one controlling instance, but can be faked

slow networks for video

control delegation necessary


Archive Access DOBES Goal


even more tricky - not all may access everything and everywhere?

several controlling instances - need trust mechanisms

control delegation even more necessary

stability of paths???

[Diagram labels: AILLA; SOAS; DOBES; ??]


DOBES Archive Access


[Diagram labels: resource domain (streaming servers, http servers); URID-ACL mapping; URID-Path mapping; client; URID/PID; URL+; resource; users & groups; management clients; check whether user is allowed to access resource; check on valid ticket]
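One possible reading of the diagram labels is a two-step flow: an access service checks the user's groups against a URID-ACL mapping and hands out a ticket, and the resource server later validates the ticket before resolving the URID to a path. The sketch below illustrates that reading only. The HMAC-signed ticket format, the time limit, and every name and value in it (`SECRET`, `ACL`, `PATHS`, `USER_GROUPS`, `issue_ticket`, `resolve`) are hypothetical and do not come from the slides or from any DOBES software.

```python
import hashlib
import hmac
import time

# All values below are hypothetical examples, not real archive data.
SECRET = b"demo-secret"  # key shared by ticket issuer and resource server
ACL = {"urid:demo/session1/video": {"team-a"}}            # URID -> allowed groups
PATHS = {"urid:demo/session1/video": "/archive/demo/session1.mpg"}  # URID -> path
USER_GROUPS = {"alice": {"team-a"}, "bob": {"guests"}}    # users & groups


def is_allowed(user, urid):
    """Check whether the user shares at least one group with the URID's ACL."""
    return bool(USER_GROUPS.get(user, set()) & ACL.get(urid, set()))


def _sign(payload):
    return hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()


def issue_ticket(user, urid, now=None, ttl=300):
    """Return a signed, time-limited ticket, or None if access is denied."""
    if not is_allowed(user, urid):
        return None
    start = time.time() if now is None else now
    payload = f"{user}|{urid}|{int(start + ttl)}"
    return f"{payload}|{_sign(payload)}"


def resolve(ticket, now=None):
    """Validate the ticket, then map the URID to a path; None on failure."""
    try:
        user, urid, expires, signature = ticket.split("|")
    except ValueError:
        return None
    if not hmac.compare_digest(signature, _sign(f"{user}|{urid}|{expires}")):
        return None  # ticket was altered or not issued by us
    current = time.time() if now is None else now
    if current > int(expires):
        return None  # ticket expired
    return PATHS.get(urid)


ticket = issue_ticket("alice", "urid:demo/session1/video", now=1000)
print(resolve(ticket, now=1100))
```

Keeping the ACL check at ticket issue time and only a signature check at the resource server is one design choice among several; it keeps the streaming and http servers free of user management, which matches the separation of boxes suggested by the labels.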


DOBES Archive Access

essentials

• online archive managers have write (delete) access (consistency, otherwise complex check-in & versioning system)
• question: who has read access rights?
  • researchers/archivist define access policy - incl. management???
  • access per usage request (temporary) or per person/group?
  • do we need person groups (team members, researchers, community members, …)?
  • access patterns per info type (MD, video, audio, annotations, others)
• as was stated - everyone has to accept CoC and copyright statement!
• what about logo and watermarking?


Collaborations of DOBES Archivist


Collaborations I

• DELAN (Digital Endangered Languages Archive Network): AILLA, DOBES, ELAR-SOAS, PARADISEC, …
  link to and support from UNESCO?
• joint web portal with links: AILLA?
  general information, eNEWS Archiv
  • Electronic Newsletter: DOBES
  • Electronic Preprint Server: LL?
  • Advice+FAQ: AILLA?
  • Training & Revitalization etc: SOAS
  • E & L, CoC: PARADISEC
  • Archive Access: ?
  • Long-term Storage: DOBES
• pressure group
• joint fund raising activities
• Adopt a Language activity ??


Collaborations II

• E-Meld
  • joint developers workshop
  • joint CV editor by MPI
  • perhaps joint lexicon tool - interest on both sides (start after Easter with real person power at MPI)
  • close exchange with Arizona group about Ontology (Terry & Scott)
  • joint international workshop on lexicon schemas and registries
• INTERA (Integrated European Language Resource Area)
  • integration of all metadata about all LR
  • automatic search for useful tools
• ECHO (European Cultural Heritage Online)
  • additional language resources from archives into MD pool
  • interoperability issues with domains such as Ethnology, …
• TYPOWEB (proposal to EU)
  • project to define an open distributed typology framework
  • inclusion of DOBES and SOAS teams as testers (if they like)
  • a number of excellent typologists, field linguists and 2 technology p
• LanguageWeb (proposal to EU): knowledge basis for lang tech
• CHaSE (proposal to EU): open tech framework for cultural heritage
• data-GRID initiatives (to come): network for fast data exchange


DOBES Training Course


Training Courses

• date 2-6 June
  • everyone is invited - in particular new teams
  • all new teams showed interest - want much practical stuff
  • planning now content - any comment is welcome
  • will distribute the new schedule soon
  • “old” teams are invited to present topics / experience reports / …
• open to SOAS teams
• will carry out training courses in Germany together with GBS (Nikolaus Himmelmann)
