Report from MPI Team
Roman Skiba, Peter Wittenburg
DOBES Workshop, Frankfurt, April 2003
Data types
• Tapes
  • Audio, Video (DV-PAL, DV-NTSC, VHS, DAT, MD)
  • other material: 8mm movies, reel-to-reel audio, slides, photos
• DMFs (mpeg1, mpeg2, wav)
• Metadata (IMDI sessions, IMDI corpus structures)
• Session media
  • mpeg1, wav - for further processing
  • mpeg2 - for archiving
  • HTML - as a container for text, pictures and photos (jpeg)
  • PDF - as a container for text, pictures and photos (jpeg)
• Info files (pdf, txt, html)
• Annotations (EAF, Shoebox)
Statistics
Raw data: tapes, DMFs and other media.
Project/Language   Tapes     DMFs      Other data digitized, converted or delivered
AWETI              71        70        -
CHACO              17        17        -
KUIKURO            36        37        Slides
LACANDON           22        22        -
SA-MN              17 (+?)   17 (+?)   VCDs
SVAN               4         4         -
TSOVA-TUSH         5         5         -
UDI                2         2         -
TEOP               25        25        Lexicon, grammar
TOFA               42        42        PDF
TRUMAI             88        90        "Reel to reel", 8mm movies, slides, grammar
WAIMA              24        24        -
WICHITA            7         7         Lexicon examples
Total              360       372
Statistics II
Corpus units: metadata, media files, annotations.
Project/Language   IMDI files   Sessions    Integrated IMDI   Integrated annotations
AWETI              35 (+?)      42 (+36)    35
KUIKURO            73           73          3                 1
LACANDON           87           67          67
TEOP               31           30          11
TOFA               169          82          43                14
TRUMAI             189          181         187               34
TSOVA-TUSH         7            0           0
WICHITA            2            2           0
Total              593          513         346               49
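As a quick cross-check of the two statistics tables, the column totals can be recomputed from the per-project rows. This is only a sketch using the figures as they appear in this transcript; treating blank annotation cells as 0, ignoring the open "(+?)" entries, and counting AWETI's "(+36)" sessions are assumptions. With these figures the tape, IMDI, session and annotation totals match the slides, while the DMF column sums to 362 rather than the 372 shown, which may reflect the open "(+?)" entries or a transcription slip.

```python
# Raw data table: project -> (tapes, DMFs); "(+?)" entries are ignored (assumption).
raw = {
    "AWETI": (71, 70), "CHACO": (17, 17), "KUIKURO": (36, 37),
    "LACANDON": (22, 22), "SA-MN": (17, 17), "SVAN": (4, 4),
    "TSOVA-TUSH": (5, 5), "UDI": (2, 2), "TEOP": (25, 25),
    "TOFA": (42, 42), "TRUMAI": (88, 90), "WAIMA": (24, 24),
    "WICHITA": (7, 7),
}

# Corpus units table: project -> (IMDI files, sessions, integrated IMDI,
# integrated annotations). Blank annotation cells are read as 0 (assumption);
# AWETI sessions include the "(+36)" noted on the slide (assumption).
corpus = {
    "AWETI": (35, 42 + 36, 35, 0), "KUIKURO": (73, 73, 3, 1),
    "LACANDON": (87, 67, 67, 0), "TEOP": (31, 30, 11, 0),
    "TOFA": (169, 82, 43, 14), "TRUMAI": (189, 181, 187, 34),
    "TSOVA-TUSH": (7, 0, 0, 0), "WICHITA": (2, 2, 0, 0),
}

raw_totals = tuple(sum(col) for col in zip(*raw.values()))
corpus_totals = tuple(sum(col) for col in zip(*corpus.values()))

print(raw_totals)     # (360, 362)
print(corpus_totals)  # (593, 513, 346, 49)
```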
Digitizing problems
• Recording problems
  • due to non-continuous time code
  • due to long play mode
  • due to stills between moving pictures (!)
• Communication problems
  • Maarten handles all communication with great care
• Money problems (due to budget cuts we have to be more careful with expenses - less copying etc.)
Audio/Video Archiving
• many discussions with archivists, in particular about audio (Austrian/German audio/phonogram archive, E-MELD)
• point at LREC meeting: MP3 and ATRAC (MiniDisc) are not ideal, but are acceptable for listening to and normal analysis of speech (discussed type of reduction and effects)
• attitude now:
  • any MD/MP3 file is reformatted to PCM in the archive
  • strong recommendation to researchers to use 16 bit linear PCM HF
  • get the best quality you can - new devices such as DENON
  • what are slightly higher equipment costs in relation to the total budget?
  • miniaturization can be a problem
• DENON Recorder
  • 192 MB flash cards (or even more)
  • linear PCM 768 kbps: stereo = 16 min / mono = 32 min
  • MP3 (MPEG2 layer 2) 64 kbps: factor 12 => mono ~ 6 h
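The recording times quoted for the DENON recorder can be reproduced with simple arithmetic. The sketch below assumes that 768 kbps is the per-channel rate (48 kHz x 16 bit), that 1 MB is 10^6 bytes, and that file-system overhead is ignored; none of this is stated on the slide, and the function name is purely illustrative.

```python
def recording_minutes(card_mb: float, kbps_per_channel: float, channels: int = 1) -> float:
    """Minutes of audio that fit on a card, ignoring any overhead."""
    card_bits = card_mb * 1_000_000 * 8                 # assumes 1 MB = 10^6 bytes
    bits_per_second = kbps_per_channel * 1_000 * channels
    return card_bits / bits_per_second / 60

pcm_mono = recording_minutes(192, 768)                  # linear PCM, one channel
pcm_stereo = recording_minutes(192, 768, channels=2)    # linear PCM, two channels
compressed_mono = recording_minutes(192, 64)            # 64 kbps compressed audio

print(f"PCM mono:     {pcm_mono:.0f} min")              # 33 min
print(f"PCM stereo:   {pcm_stereo:.0f} min")            # 17 min
print(f"64 kbps mono: {compressed_mono / 60:.1f} h")    # 6.7 h
```

Under these assumptions the results land slightly above the slide's 32 min, 16 min and ~6 h, and the ratio 768 / 64 gives the factor 12 mentioned there.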
Video Digitization in the Field
• audio no problem
• video digitization at MPI was and is a success story
• but slow cycle time - therefore digitization in the field

[Diagram labels: DV-Camera; I-link; DV-encoding (3.4 MB/sec, 1h = 20 GB, proprietary, limited sw support); conversion (Tsunami); MPEG1-encoding (1.5 Mbps, 1h = 1 GB, to work with); good old mail; MPEG2 copy (~6 Mbps), MPEG1 copy (~1 Mbps), MPEG4 copy (0.5 - …), etc.; tests with MPEG-Camera not ok]

• MPEG2 widely accepted archive standard, various frontend codecs
• still compressed - new standard will come in future
• need your tapes (copies) and the MD file to create MPEG2 versions
• use camera in continuous mode !!!! then batch segmentation
• adapted workflows necessary
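The storage figures on this slide follow from bitrate times duration. The sketch below converts the quoted rates into gigabytes per hour, assuming 1 GB = 10^9 bytes and counting only the stream itself; the helper name is illustrative. The raw arithmetic comes out below the slide's round "1h = 20 GB" and "1h = 1 GB" figures, which presumably include headroom or rounding (that reading is an assumption).

```python
def gb_per_hour(mbit_per_s: float) -> float:
    """Storage needed for one hour of a stream at the given bitrate."""
    bytes_per_hour = mbit_per_s * 1_000_000 * 3600 / 8
    return bytes_per_hour / 1_000_000_000       # assumes 1 GB = 10^9 bytes

rates_mbps = {
    "DV (3.4 MB/sec)": 3.4 * 8,                 # megabytes/s converted to megabits/s
    "MPEG2 (~6 Mbps)": 6.0,
    "MPEG1 (1.5 Mbps)": 1.5,
}

for name, rate in rates_mbps.items():
    print(f"{name}: {gb_per_hour(rate):.1f} GB per hour")
# DV (3.4 MB/sec): 12.2 GB per hour
# MPEG2 (~6 Mbps): 2.7 GB per hour
# MPEG1 (1.5 Mbps): 0.7 GB per hour
```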
Access to Archive
short-term
Access to the DoBeS archive I
Current state
• Digital data transport via
  • Mail (DMF, session media)
  • FTP (all data) with password and user ID
  • Email (metadata, annotations, info files)
  • IMDI Browser (metadata, info files)
Access to the DoBeS archive II
Testing new ways
• Digital data transport via
  • IMDI Browser (all integrated data types) with password and user ID
  • HTML corpus (all data types) with password and user ID
• Remote access
Access to the DoBeS archive III
Future scenario
• Short-term solution
  • Open all data types of a team to the IMDI Browser (media, annotations etc.)
• Long-term solution
  • File access (user IDs and passwords) administered by the teams
Access to Archive
long-term
Archive Access Single Person
the single person solution - the (almost) ideal world
all in one single personal box
Archive Access Single Institute
the single institute solution - the (almost) ideal world
all in one single big box for an institute
a little more tricky - not all may access everything, but one controlling instance
fast networks available
Archive Access SI+Web
the single institute solution with Internet Access
the (almost) ideal world
all in one single big box for all
• much more tricky - not all may access everything
• still one controlling instance, but can be faked
• slow networks for video
• control delegation necessary
Archive Access DOBES Goal
• even more tricky - not all may access everything and everywhere?
• several controlling instances - need trust mechanisms
• control delegation even more necessary
• stability of paths???

[Diagram labels: AILLA, SOAS, DOBES, ??]
DOBES Archive Access
[Diagram labels: client; management clients; users & groups; URID - ACL mapping; URID - path mapping; URID, PID, URL+; resource domain (streaming servers, http servers); resource; "check whether user is allowed to access resource"; "check on valid ticket"]
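The components named in this diagram (users and groups, a URID-to-ACL mapping, a URID-to-path mapping, and a ticket check in the resource domain) could interact roughly as sketched below. This is a minimal illustration, not the DOBES implementation: the class, method names, ticket format and example identifiers are all hypothetical.

```python
import secrets
from dataclasses import dataclass, field

@dataclass
class ArchiveAccess:
    groups: dict[str, set[str]]        # group name -> user names
    acl: dict[str, set[str]]           # URID -> groups allowed to read it
    paths: dict[str, str]              # URID -> current storage path
    tickets: dict[str, str] = field(default_factory=dict)   # ticket -> URID

    def request_ticket(self, user: str, urid: str):
        """Check whether the user may access the resource; issue a ticket if so."""
        allowed_groups = self.acl.get(urid, set())
        if not any(user in self.groups.get(g, set()) for g in allowed_groups):
            return None
        ticket = secrets.token_hex(8)  # toy ticket; a real one would be signed and expire
        self.tickets[ticket] = urid
        return ticket

    def resolve(self, ticket: str) -> str:
        """Resource-domain side: check the ticket, then map the URID to a path."""
        if ticket not in self.tickets:
            raise PermissionError("no valid ticket")
        return self.paths[self.tickets[ticket]]

archive = ArchiveAccess(
    groups={"team-a": {"alice"}},                      # hypothetical users and groups
    acl={"urid:0001": {"team-a"}},                     # hypothetical URID
    paths={"urid:0001": "/archive/team-a/session1.mpg"},
)
ticket = archive.request_ticket("alice", "urid:0001")
print(archive.resolve(ticket))
print(archive.request_ticket("bob", "urid:0001"))
```

Keeping the path mapping separate from the identifier means a file can be moved by updating one table entry, which is one way to approach the "stability of paths" concern raised earlier.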
DOBES Archive Access
essentials
• online archive managers have write (delete) access (consistency, otherwise complex check-in & versioning system)
• question: who has read access rights?
  • researchers/archivist define access policy - incl. management???
  • access per usage request (temporary) or per person/group?
  • do we need person groups (team members, researchers, community members, …)?
  • access patterns per info type (MD, video, audio, annotations, others)
• as was stated - everyone has to accept CoC and copyright statement!
• what about logo and watermarking?
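To make the open questions above concrete, the sketch below shows one possible shape of a read-access check that combines standing rights per group and info type with temporary per-request grants, and requires the CoC to be accepted first. The policy values, group rights and function name are invented for illustration; the slide deliberately leaves these decisions open.

```python
from datetime import datetime, timedelta

INFO_TYPES = {"MD", "video", "audio", "annotations", "others"}

# Standing rights per group (hypothetical example values, not a DOBES policy).
GROUP_POLICY = {
    "team members": set(INFO_TYPES),
    "researchers": {"MD", "annotations"},
    "community members": {"MD", "audio", "video"},
}

def may_read(user_groups, info_type, coc_accepted, now, temp_grants=()):
    """Return True if reading `info_type` is allowed.

    temp_grants: iterable of (info_type, expires_at) pairs representing
    temporary access given in response to a usage request.
    """
    if not coc_accepted:               # everyone has to accept the CoC first
        return False
    if any(info_type in GROUP_POLICY.get(g, set()) for g in user_groups):
        return True
    return any(t == info_type and now < expires for t, expires in temp_grants)

now = datetime(2003, 4, 1, 12, 0)
grant = [("video", now + timedelta(days=7))]   # one-week grant after a usage request

print(may_read(["researchers"], "annotations", True, now))           # True
print(may_read(["researchers"], "video", True, now))                 # False
print(may_read(["researchers"], "video", True, now, grant))          # True
print(may_read(["team members"], "video", False, now))               # False
```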
Collaborations of DOBES Archivist
Collaborations I
• DELAN (Digital Endangered Languages Archive Network): AILLA, DOBES, ELAR-SOAS, PARADISEC, …
  link to and support from UNESCO?
• joint web portal with links (AILLA?): general information, eNEWS archive
  • Electronic Newsletter: DOBES
  • Electronic Preprint Server: LL?
  • Advice + FAQ: AILLA?
  • Training & Revitalization etc.: SOAS
  • E & L, CoC: PARADISEC
  • Archive Access: ?
  • Long-term Storage: DOBES
• pressure group
• joint fund raising activities
• Adopt a Language activity ??
Collaborations II
• E-MELD
  • joint developers workshop
  • joint CV editor by MPI
  • perhaps joint lexicon tool - interest on both sides (start after Easter with real person power at MPI)
  • close exchange with Arizona group about Ontology (Terry & Scott)
  • joint international workshop on lexicon schemas and registries
• INTERA (Integrated European Language Resource Area)
  • integration of all metadata about all LR
  • automatic search for useful tools
• ECHO (European Cultural Heritage Online)
  • additional language resources from archives into MD pool
  • interoperability issues with domains such as Ethnology, …
• TYPOWEB (proposal to EU)
  • project to define an open distributed typology framework
  • inclusion of DOBES and SOAS teams as testers (if they like)
  • a number of excellent typologists, field linguists and 2 technology p
• LanguageWeb (proposal to EU): knowledge basis for lang tech
• CHaSE (proposal to EU): open tech framework for cultural heritage
• data-GRID initiatives (to come): network for fast data exchange
DOBES Training Course
Training Courses
• date: 2-6 June
• everyone is invited - in particular new teams
• all new teams showed interest - want much practical stuff
• content is being planned now - any comment is welcome
• will distribute the new schedule soon
• “old” teams are invited to present topics / experience reports / …
• open to SOAS teams
• will carry out training courses in Germany together with GBS (Nikolaus Himmelmann)