Pacific and Regional Archive for Digital Sources in Endangered Cultures
PARADISEC background, current structures, and thoughts on international collaborations
Linda Barwick, University of Sydney
DELAMAN workshop, MPI Nijmegen, 29 November 2004




PARADISEC structure
CIs: Cliff Goddard, Hugh de Ferranti
CIs: Steve Bird, Nick Evans, Cathy Falk, Janet Fletcher, John Hajek
CIs: Andrew Pawley, John Bowden, Malcolm Ross, Alan Rumsey
Project Manager (Metadata guru): Nick Thieberger
Audio Archiving Unit: Director: Linda Barwick; Audio: Frank Davey; Project Liaison: Amanda Harris
Store account - web interface: Stuart Hungerford
CIs: William Foley, Allan Marett, Jane Simpson

PARADISEC rationale
• prioritises Asia-Pacific region materials not otherwise catered for;
• provides a rational framework for prioritising and managing University research recordings using international archival formats and standards;
• implements IP arrangements tailored to University needs and practices;
• involves researchers in specialist description of resources;
• streamlines consortium processes to salvage important recordings and make them available for research in a timely and cost-effective way

Research applications
• Making Australian research available internationally
• Fieldwork - use for elicitation and documentation, and for language learning in preparation for fieldwork
• Return of materials to communities
• Digital tools for optimal transcription and analysis
• Comparative studies - historical recordings give time depth for area language and music studies
• Better understanding of diversity - data from some languages only in older recordings
• Incorporation of primary data in presentations and, ultimately, publications

Staged approach
• Metadata - 1623 records, to make resources discoverable even if not yet digitised
• PIs and content metadata need to be assigned before digitisation (some refinement during process)
• Repository - 807 items digitised to date, some complex e.g. fieldnotes (page images) or transcripts accompanying tapes

Metadata November 2004
• 1623 records in the metadata repository with data from 24 countries in Asia-Pacific (Australia, Chile, Cook Islands, Fiji, French Polynesia, Hong Kong, Indonesia, India, Japan, Korea, Lao, Malaysia, Federated States of Micronesia, Myanmar (Burma), New Zealand, Palau, Papua New Guinea, Reunion, Singapore, Solomon Islands, Taiwan, Tonga, Vanuatu, Vietnam)

Metadata OLAC harvest

Repository contents
• Repository totals 26 November 2004
• total files: 2582
• total items: 807
• total size: 1.0TB
• total hours audio: 627.3 hours
• file types: .wav, .mp3 (1040); .tif (179), .jpg (46), .pdf (34), .txt (3), .rtf (8), .xml (32)
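The totals above support a few rough averages. The following is a minimal sketch using only the figures on this slide. It assumes that "1.0TB" means 10^12 bytes, and the per-hour figure overstates the audio alone because the total size also covers page images and other non-audio files.

```python
# Figures taken from the repository totals slide (26 November 2004).
TOTAL_FILES = 2582
TOTAL_ITEMS = 807
TOTAL_SIZE_BYTES = 1.0e12    # assumption: "1.0TB" read as decimal terabytes
TOTAL_AUDIO_HOURS = 627.3

files_per_item = TOTAL_FILES / TOTAL_ITEMS
hours_per_item = TOTAL_AUDIO_HOURS / TOTAL_ITEMS
gb_per_audio_hour = TOTAL_SIZE_BYTES / 1e9 / TOTAL_AUDIO_HOURS

print(f"files per item: {files_per_item:.1f}")
print(f"audio hours per item: {hours_per_item:.2f}")
print(f"GB stored per audio hour: {gb_per_audio_hour:.2f}")
```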

Repository Collections
Bradley (5hr), Capell (9hr)*, Corris (6hr), Crowther (2hr), Donohue (3hr), Dutton (266hr), Fedden (7hr), Foley (23hr), Gardner (56hr), Kartomi (2hr)*, Laycock (29hr), Lawton (3hr), McElhanon (41hr)
McIntyre (10hr), Margetts (17hr), Rumsey (17hr)*, San Roque (1hr), Sam (4hr)*, Tepano (19hr), Thieberger (39hr), Toulmin (35hr), Voorhoeve (33hr)*, Wurm (2)*, Evans (Hons thesis), Thieberger (PhD thesis)
* Ingestion ongoing November 2004

AC1 AM2 AM3 AM4 AR1 BE1 CLV1 DB1 DG3 DL1 KM1 LS1 LSR1 MC1 MC2 MD1 MK2 MT1 NT1 NT2 NT3 NT4 RL1 SAW2 SF1 TD1 TT1 WF1 WS1

PARADISEC Repository Languages November 2004
PAPUA N. GUINEA: Abau, Ambonese Pidgin, Angoram (Kanduanuin), Angoram (Moim dialect), Aomie, Arapesh, Arifama, Aunalei, Auwim, Awomo, Ba, Balawaia, Barai, Baruga, Barupu (Warapu), Be'anivia, Biage, Bibo, Binandere, Bodinumu, Boera, Boine, Boku, Boridi, Bouxula, BratMomire, Buin, Burum, Chimba, Chirima, Daga, Darava, Dawawa, Dedua, Dima, Dimadima, Dina, Doga, Domu, Doromu, Doura, Efogi, Efogi Dialects, Emo, Enivilogo, Fore, Fuyugey, Gabadi, Ginuman, Gwedena, Herei, Hiae Motu, Hiri Motu, Hube, Hula, I'ai, Ikega, Ioma, Isaka (Krisa), Kaipi, Kairi, Kambot, Kanga, Karama, Karawari Lg (Ambinwari), Karukaru, Kâte, Kinalaknga, Kimi, Kiriwina, Koiari, Koita, Koitabu, Kokila, Kokoro, Komba, Kopar, Koriki, Koriko, Kosorong, Kovai, Kovio, Kubuirubu, Kuman, Kumukio, Kuni, Kunimaipa, Kwale, Laimodo, Mada'a, Magi, Mâgobineng, Magore, Maisin, Maiwa, Managalas, Manam, Manubara, Manumu, Mapei, Mapena, Mari, Maria, Mekeo, Melpa, Mian, Mid-Wahgi, Migabac, Mindik, Miniafa, Mogoni, Mom, Mor, Motu, Muhiang Arapesh, Nabak, Naga, Namanadza, Naoro, Nara, New Ireland Pidgin, Ngala, Nomu, Notu, Ondoro, One (Onne), Onjab, Ono, Opao, Orokaiva, Orokolo, Ouma, Paiwa, Police Motu, Porome, Qld Pidgin, Rabuka, Raepa Tati, Saliba, Samo, Sene, Sepik Tok Pisin, Sialum, Sinaugoro, Sona, Suau, Suku, Surai, Taboro, Tairuma, Tauade, Tobo, Tok Pisin, Tolai, Uberi, Ubir, Ubir Gonjoe, Vesilogo, Vioribaiwa, Wamora, Wangun, Wiga, Wosera, Yele, Yewudu, Yimas, Yoba
COOK ISLANDS: Rarotongan, Pukapuka
FRENCH POLYNESIA: Tahitian
CHILE: Rapa Nui
PALAU: Palauan
SOLOMONS: Babatana, Ririo, Ruviana, Varese, Lau, Santa Cruz
INDONESIA: Asmat, Brat, Hatam, Inanwatan, Manikion, Moi, Ningrum, Sahu, Sebyar, Tinam, Todahe, Tok Pisin, Yahadian
INDIA: Rajbangsi
NEW CALEDONIA: Dehu
VANUATU: South Efate, Bislama, Lelepa
FIJI: Lauan
TONGA: Tongan

Regional links
• Institute of Papua New Guinea Studies
• Vanuatu Kaljoral Senta
• Archive of Maori and Pacific Music, U. Auckland
• University of Hawai’i
• New Caledonia - Tjibaou Cultural Centre
• Indonesia - UIN, Jakarta
• Malaysia - Universiti Malaya
• Rapa Nui - Museo antropologico P. Sebastian Englert
• Micronesia - Historical Preservation Office, Yap

Audio Ingest
• Initially ingested as raw WAV on AudioCube 5 Dell 670 workstations running Wavelab (2005 will add remote Pyramix workstations)
• Masters 24-bit 96 kHz Broadcast WAV Format (uncompressed audio with encapsulated metadata)
• Some lower rate if digital original (e.g. 16-bit 48 kHz from DAT)
• WAV > BWF by Quadriga software
• derivatives produced by batch processing - CD-audio quality (16-bit, 44.1 kHz) and mp3 quality (128 kbps)
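The storage cost of each format on this slide follows directly from sample rate, bit depth and channel count. The sketch below works through that arithmetic. It assumes stereo recordings (the slide does not state a channel count), reads the mp3 figure as 128 kbps, and ignores file headers and embedded metadata; the function names are illustrative only.

```python
def pcm_bytes_per_hour(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Size in bytes of one hour of uncompressed PCM audio (headers ignored)."""
    bytes_per_sample = bit_depth // 8
    return sample_rate_hz * bytes_per_sample * channels * 3600


def bitrate_bytes_per_hour(bits_per_second: int) -> int:
    """Size in bytes of one hour of audio encoded at a constant bit rate."""
    return bits_per_second * 3600 // 8


CHANNELS = 2  # assumption: stereo; the slide does not give the channel count

sizes = {
    "24-bit 96 kHz master": pcm_bytes_per_hour(96_000, 24, CHANNELS),
    "16-bit 48 kHz (DAT original)": pcm_bytes_per_hour(48_000, 16, CHANNELS),
    "16-bit 44.1 kHz CD-quality copy": pcm_bytes_per_hour(44_100, 16, CHANNELS),
    "mp3 copy at 128 kbps (assumed)": bitrate_bytes_per_hour(128_000),
}

for label, size in sizes.items():
    print(f"{label}: {size / 1e9:.2f} GB per hour")
```

Under these assumptions a master takes roughly 36 times the space of its mp3 derivative, which is why only the derivatives are practical for delivery over billed links.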

Digital preservation
• “Azoulay” server partitioned for working files and archive partition for sealed masters - current capacity 750GB (>3TB in 2005)
• Sealed masters archived to 100GB data tapes on University of Sydney LTO Mass Data Storage System (high-low watermark script) - duplicate data tapes kept at 2 locations on campus
• Sealed masters mirrored to APAC national Store facility (Canberra) nightly - nearline storage
• Password-protected online access to Store facility
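The slide mentions a high-low watermark script without describing it. As a hedged illustration of the general technique (not the script actually used), the sketch below picks files to move to tape once disk use passes a high watermark and stops once projected use falls to a low watermark. The thresholds, the `StoredFile` type and the least-recently-accessed ordering are all assumptions made for the example.

```python
from typing import List, NamedTuple


class StoredFile(NamedTuple):
    name: str
    size_gb: float
    last_access: float  # e.g. a POSIX timestamp; smaller means older


def select_for_migration(files: List[StoredFile], capacity_gb: float,
                         high: float = 0.9, low: float = 0.7) -> List[StoredFile]:
    """Pick files to migrate once disk use exceeds the high watermark.

    Least recently accessed files are chosen first, and selection stops as
    soon as projected use is at or below the low watermark.
    """
    used = sum(f.size_gb for f in files)
    if used <= high * capacity_gb:
        return []  # still under the high watermark: nothing to migrate
    chosen = []
    for f in sorted(files, key=lambda f: f.last_access):
        if used <= low * capacity_gb:
            break
        chosen.append(f)
        used -= f.size_gb
    return chosen


# Hypothetical contents of a 750GB partition (the capacity quoted on the slide).
disk = [
    StoredFile("a.wav", 300, last_access=1),
    StoredFile("b.wav", 250, last_access=2),
    StoredFile("c.wav", 150, last_access=3),
]
print([f.name for f in select_for_migration(disk, capacity_gb=750)])
```

The gap between the two thresholds keeps the migration from re-triggering after every small ingest.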

PDSC data flow
[Figure: PDSC data flow diagram]

Networking
• Main campuses (University of Sydney, University of Melbourne, Australian National University) connected by GrangeNet (next generation research network, 10Gbps connections)
• Pay subscription, not traffic costs
• Satellite campus UNE connected by AARnet (Australian research and education network - currently billed traffic cost, 155Mbps connection)
• Both with connections to APAN community (Asia Pacific Advanced Networks) - potential for linking to regional and international R&E networks - potential traffic costs an issue
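To put the two link speeds in perspective, the following sketch estimates how long it would take to move a collection the size of the current repository (1.0TB) over each link. It assumes the full nominal line rate is available and ignores protocol overhead, so real transfers would be slower.

```python
def transfer_seconds(size_bytes: float, link_bits_per_second: float) -> float:
    """Ideal transfer time: payload bits divided by nominal link rate."""
    return size_bytes * 8 / link_bits_per_second


COLLECTION_BYTES = 1.0e12  # assumption: "1.0TB" read as decimal terabytes

links = {
    "10Gbps research network link": 10e9,
    "155Mbps satellite campus link": 155e6,
}

for label, rate in links.items():
    hours = transfer_seconds(COLLECTION_BYTES, rate) / 3600
    print(f"{label}: {hours:.1f} hours")
```

On these idealised numbers the faster link moves the collection in minutes, while the 155Mbps link needs most of a day and would also incur billed traffic.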


Storage
• Australian Partnership for Advanced Computing National Facility Mass Data Storage System - Hierarchical Storage Manager system
• Funded by consortium of Australian higher education bodies
• Tape robot system - can handle 1.2PB
• PARADISEC will add 2-3TB per year once satellite ingest commissioned
• Current horizon of facility 2008 - project PARADISEC collection up to 9TB by then
• Will need to apply to host material/share data from other DELAMAN collections
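The growth figures above can be checked with a simple linear projection. This is a sketch under stated assumptions: it starts from the 1.0TB repository total reported for November 2004, applies the quoted 2-3TB per year, and tries both three and four years of ingest before the 2008 horizon, since the slide does not say when satellite ingest begins.

```python
def project_size_tb(current_tb: float, growth_tb_per_year: float, years: int) -> float:
    """Linear projection: current holdings plus constant yearly growth."""
    return current_tb + growth_tb_per_year * years


CURRENT_TB = 1.0  # repository total reported for 26 November 2004

for years in (3, 4):  # assumption: three or four years of ingest before 2008
    low = project_size_tb(CURRENT_TB, 2.0, years)
    high = project_size_tb(CURRENT_TB, 3.0, years)
    print(f"after {years} years: {low:.0f}-{high:.0f} TB")
```

The 9TB figure on the slide sits inside the three-year range and at the bottom of the four-year range, and either way is a small fraction of the 1.2PB the tape system can handle.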

Streaming
• GrangeNet streaming server currently in trial mode - only available within network
• Soon to have automatic copying of main collection to streaming server
• Foresee higher demand for access when scaled streaming access to excerpts available; but also greater resources needed to mount and manage
• Will depend on researchers’ provision of timecoded transcripts/glosses
• Access and authentication protocols yet to be developed
• Testbed for citation/integration into e-publications
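Excerpt-level access depends on timecodes because a start and end time are all that is needed to cut a citable segment from a recording. The sketch below illustrates this with Python's standard `wave` module; it is not the streaming server's method, and `extract_excerpt` and `make_silence` are hypothetical helper names. It works on plain PCM WAV data held in memory.

```python
import io
import wave


def extract_excerpt(wav_bytes: bytes, start_s: float, end_s: float) -> bytes:
    """Return a new WAV file (as bytes) holding the frames from start_s to end_s."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        total = src.getnframes()
        # Clamp both timecodes to the bounds of the recording.
        start_frame = min(max(0, int(start_s * rate)), total)
        end_frame = min(max(start_frame, int(end_s * rate)), total)
        src.setpos(start_frame)
        frames = src.readframes(end_frame - start_frame)
    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        dst.setparams(params)    # frame count in the header is corrected on close
        dst.writeframes(frames)
    return out.getvalue()


def make_silence(seconds: float, rate: int = 44_100) -> bytes:
    """Build a mono 16-bit silent WAV in memory, used here only as sample input."""
    out = io.BytesIO()
    with wave.open(out, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * int(seconds * rate))
    return out.getvalue()


# Cut the segment between 0.5 s and 1.5 s, as a transcript timecode pair might specify.
clip = extract_excerpt(make_silence(2.0), 0.5, 1.5)
with wave.open(io.BytesIO(clip), "rb") as r:
    print(r.getnframes() / r.getframerate(), "seconds")
```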


Software
• Initial metadata database in FileMaker Pro 6 with periodic XML dumps for OLAC static harvesting
• Currently being ported to MySQL/PHP to allow dynamic harvesting and other functionality
• Python software for managing repository and website (Stuart Hungerford, ANU)
• Developing Java-based geographic search interface (TimeMap)
• All based on Open Source tools
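A periodic XML dump amounts to serialising each database record into elements that a harvester can read. The sketch below shows the idea with Python's standard library and the Dublin Core element namespace. It is an illustration only: the `record` wrapper element, the field selection and the sample values are hypothetical, and a real OLAC static repository would follow OLAC's own schema, which is not reproduced here.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)

FIELDS = ("identifier", "title", "creator", "language", "date")


def record_to_xml(record: dict) -> ET.Element:
    """Turn one metadata record into a <record> element with dc:* children."""
    root = ET.Element("record")  # hypothetical wrapper element, not OLAC's own
    for field in FIELDS:
        if field in record:
            child = ET.SubElement(root, f"{{{DC_NS}}}{field}")
            child.text = str(record[field])
    return root


example = {  # hypothetical record; the identifier and values are made up
    "identifier": "XX1-001",
    "title": "Example field recording",
    "creator": "A. Researcher",
    "language": "Tok Pisin",
    "date": "2004-11-26",
}

xml_text = ET.tostring(record_to_xml(example), encoding="unicode")
print(xml_text)
```

With a static dump a harvester only sees changes when the file is regenerated, which is the limitation the move to dynamic harvesting addresses.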

Implications
• Implementations will change over time - foundation for cooperation must be agreements and alignment of strategic objectives
• Minimal shared standards needed on formats, ethics, description, rights - what else?
• Possibility of staged modular approach
  • federated discovery platform
  • proof-of-concept pilot studies/trials
  • targeted data sets for exchange
  • dark hosting/mirroring
  • tools development and testing

Issues
• Transnational projects - how to identify and coordinate international funding opportunities?
• Projections of international traffic & storage charges - funding implications
• Sustainability of our collections - how to cost overheads and source long-term funding commitments
• DELAMAN governance and administration structures? How to resource and support without duplication/reinventing the wheel, adding to administrative burden?
• How to involve all stakeholders (including local/national bodies of originating communities)?

APAN Bangkok 2005
• E-science workshop: Toward a semantic web for digital data archives (convenor V. Balaji, Princeton)
• Immense quantities of digital data and images are now archived and publicly available through the web. These include domain-specific data archives, covering such domains as weather and climate, seismology and geophysics, astronomy and particle physics, as well as images and digital copies of non-textual human cultural production. Describing, cataloguing, searching and locating information within digital data and image archives is one of the grand technological challenges of the semantic web era. This session will draw together participants from diverse fields of science and the humanities to share their experience on metadata, standards and techniques for access to large digital archives.
• Tentative titles of presentations:
  • 1) The Hierarchical Data Format for EOS (HDF-EOS), Richard Ullman, NASA Goddard Space Flight Center (Invited)
  • 2) Metadata Requirements for Global Climate Models, V. Balaji, NOAA Geophysical Fluid Dynamics Laboratory
  • 3) DELAMAN?? Remote presentation…

PARADISEC gratefully acknowledges support from:
• Partner Universities (Sydney, Melbourne, ANU, UNE)
• Australian Research Council LIEF scheme
• Australian Partnership for Sustainable Repositories (SORRT testbed)
• Australian Partnership for Advanced Computing
• GrangeNet
• ANU Internet Futures


Contact us
• http://www.paradisec.org.au
[email protected] (Director)
[email protected] (Project Manager)

Relevant URLs
• PARADISEC website http://paradisec.org.au/
• PARADISEC repository login http://store.apac.edu.au/cgi-bin/pdsc-v3.0.cgi/login
• PARADISEC streaming trial http://paradisec.org.au/streamingtrial.html
• Transcript page image trial http://www.austehc.unimelb.edu.au/~gavan/lana/hdms.htm
• TimeMap digitiser tool proof of concept http://acl.art.usyd.edu.au/TMDigitiser/