Arctos/TACC Collaboration Chris Jordan Texas Advanced Computing Center

Upload: chesna

Post on 21-Jan-2016



TRANSCRIPT

Page 1: Arctos/TACC Collaboration Chris Jordan Texas Advanced Computing Center

Arctos/TACC Collaboration

Chris Jordan

Texas Advanced Computing Center

Page 2: Arctos/TACC Collaboration Chris Jordan Texas Advanced Computing Center

Arctos: A 15-year history

MVZ: 1995 - Hired Stan Blum to develop relational data model (following modeling by Assoc. Systematic Collections).

MVZ: 1997 - Hired John Wieczorek to implement model (desktop application) using Sybase and Versata. Partial implementation (e.g., no loans).

UAM: 1998-2000 - John W. migrated mammal data to Oracle, set up Versata.

UAM: 2002 - Dusty McDonald replaced Versata with ColdFusion, implemented full model (first web-based instance, aka Arctos).

MSB: 2003 – Joined Arctos at UAM (first multi-hosting instance).

MVZ and MCZ: 2005-2007 - Implemented separate instances of Arctos at Berkeley and Harvard (MVZ: first Postgres, then Oracle).

MVZ: 2009 - Moved hosting of data to Alaska (Virtual Private Database version).

Page 3: Arctos/TACC Collaboration Chris Jordan Texas Advanced Computing Center

Major repositories using the Arctos database: (34 collections of specimens or observations, 1.3M records)

Page 4: Arctos/TACC Collaboration Chris Jordan Texas Advanced Computing Center

TACC and TeraGrid

10-year history of Research Cyberinfrastructure

Supercomputing, Visualization and Storage

Supported by NSF to provide research resources

TACC expansion of Data-focused support

1 Petabyte dedicated online disk

10 Petabytes offline archive

National network of replication resources

Page 5: Arctos/TACC Collaboration Chris Jordan Texas Advanced Computing Center

Data Diversity at TACC

Image Collections (Natural History, Art, etc)

Structured Data (Economics, Public Health)

BioMolecular Data (DNA, RNAseq, etc)

Physical Sciences/Simulation Data

Geographic data (Climate, Disaster Preparedness)

Integrated Infrastructure Supports Diverse Collections

Page 6: Arctos/TACC Collaboration Chris Jordan Texas Advanced Computing Center

Arctos is…

A versatile online collections management system

Cataloged Items (ID, attributes, parts, etc.; batch uploading, downloading, editing; encumbrances)

Localities & Collecting Events (mapping, media, history)

Transactions (loans, accessions, borrows, permits; email reminders)

Usage (publications, projects, sponsors, GenBank)

Curatorial (object tracking, parts, condition, relations, etc.)

Determination history (identification, georef, attributes)

Page 7: Arctos/TACC Collaboration Chris Jordan Texas Advanced Computing Center

Breadth of Data in Arctos

Fish, amphibians, reptiles, mammals, birds and bird eggs/nests, plants, arthropods, fossils, molluscs

Specimens and observations

Media (images, audio)

Publications, fieldnotes

Arctos is constantly evolving to incorporate new kinds of data, e.g.:

Better representation of non-publication documents (fieldnotes, correspondence)

Cultural collections (art, anthropology...)

Nearly all that is known about an object (or observation) can be included in Arctos.

Page 8: Arctos/TACC Collaboration Chris Jordan Texas Advanced Computing Center

Arctos/TACC Partnership

Arctos hosts web/database resources

TACC hosts media collections

Images, Recordings, etc

Simple workflows for automated generation of thumbnails, JPG versions, MP3s, OCR

Replication policies automatically replicate to various storage locations

Images directly served from TACC to browsers
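The derivative-generation and replication steps on this slide could be planned along the following lines. This is only a minimal sketch under stated assumptions: the slides do not describe the actual TACC workflow, so the file extensions, the derivative labels (`thumb`, `web`, `ocr`), the site names, and both function names are hypothetical and chosen purely for illustration. The sketch only computes which files would be produced and where replicas would go; it does not convert or copy anything.

```python
from pathlib import PurePosixPath

# Hypothetical rules: which derivatives to produce for each original file type.
# Labels, extensions, and site names below are illustrative assumptions only.
DERIVATIVE_RULES = {
    ".tif": [("thumb", ".jpg"), ("web", ".jpg"), ("ocr", ".txt")],
    ".wav": [("web", ".mp3")],
}
REPLICA_SITES = ("site-a", "site-b")  # placeholder storage locations


def plan_derivatives(original: str) -> list[str]:
    """List the derivative file paths to generate for one original."""
    path = PurePosixPath(original)
    rules = DERIVATIVE_RULES.get(path.suffix.lower(), [])
    return [str(path.with_name(f"{path.stem}_{label}{ext}")) for label, ext in rules]


def plan_replication(original: str, sites: tuple[str, ...] = REPLICA_SITES) -> list[str]:
    """List one replica target per storage location for the original."""
    return [f"{site}:{original}" for site in sites]


if __name__ == "__main__":
    for target in plan_derivatives("herbarium/sheet001.tif"):
        print(target)
    for target in plan_replication("herbarium/sheet001.tif"):
        print(target)
```

Keeping the rules in a plain dictionary means a new media type can be supported by adding one entry, without touching the planning functions.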

Page 9: Arctos/TACC Collaboration Chris Jordan Texas Advanced Computing Center

Arctos/TACC History

Initial work with UAF Herbarium in 2008

Brought on MVZ Collections in 2009

Ongoing work on web audio, OCR

New collections from UAF, UNM, others

Currently >300,000 digital objects under management

Support >100,000 downloads of original scans each year

Page 10: Arctos/TACC Collaboration Chris Jordan Texas Advanced Computing Center

Advantages for Collections

Lower cost and management overhead

Highly reliable, large-scale infrastructure

No scalability issues

Longer-term partnerships promote technical collaboration to add capabilities over time

Provides built-in “Data Management Plan”

Page 11: Arctos/TACC Collaboration Chris Jordan Texas Advanced Computing Center

Long-Term Sustainability

TACC plan is to be a permanent research data resource

Arctos will evolve over time but the collections have permanent value

Infrastructure foundation is stable

Agency funding future is uncertain

Develop diverse funding sources and models to support robust, long-term operation

Page 12: Arctos/TACC Collaboration Chris Jordan Texas Advanced Computing Center

Ongoing Efforts

Expansion of storage resources at TACC (~10PB online disk)

Greater engagement in data management activities

Working with BRC, ADBC awards and associated data

iPlant Data/Genetic resources – link to specimen records?

Page 13: Arctos/TACC Collaboration Chris Jordan Texas Advanced Computing Center

Thanks for your Time

Steffi Ickert-Bond, UAF

Gordon Jarrell, UNM

Carla Cicero, MVZ

Michelle Koo, MVZ

Dusty McDonald, Arctos