DSpace at ILRI: A semi-technical overview of “CGSpace”

A semi-technical overview of “CGSpace”: DSpace at ILRI. Alan Orth. KAINET Open Data and Open Science Workshop, Nairobi, Kenya, 18 June 2015

Upload: ilri

Post on 30-Jul-2015

TRANSCRIPT

Page 1: DSpace at ILRI: A semi-technical overview of “CGSpace”

A semi-technical overview of “CGSpace”

DSpace at ILRI

Alan Orth

KAINET Open Data and Open Science Workshop

Nairobi, Kenya, 18 June 2015

Page 2: DSpace at ILRI: A semi-technical overview of “CGSpace”

History of DSpace at ILRI

● 2009: ILRI launches Mahider (“repository” in Amharic)

● 2010: Other CGIAR centers and programs join our platform and share hard / soft costs

● 2011: Rebranded as “CGSpace”

● 2015: 9 CGIAR centers, ~50,000 items, ~250k hits/month

Page 3: DSpace at ILRI: A semi-technical overview of “CGSpace”

“CGSpace” in June, 2015

Page 4: DSpace at ILRI: A semi-technical overview of “CGSpace”

How we use DSpace

● Content people embedded in each department help capture results (presentations, papers, brochures, etc)

● Primary location for institutional outputs!

● No posting PDFs on corporate website!

● Integrate with website and blogs via RSS feeds

● Direct ALL traffic to DSpace!

● For data sets, videos, etc we make a metadata-only accession with a link to eg YouTube

Page 5: DSpace at ILRI: A semi-technical overview of “CGSpace”

DSpace hierarchies

● Communities, sub-communities, and collections

● Tempting to model after organization hierarchy!

● (we did)

● … but organization hierarchies change!

Page 6: DSpace at ILRI: A semi-technical overview of “CGSpace”

Mostly organized by output type now...

Page 7: DSpace at ILRI: A semi-technical overview of “CGSpace”

Metadata

● Standard Dublin Core is available

● No AGROVOC

● You can create custom controlled vocabularies in arbitrary namespaces, eg: cg.subject.ilri

Page 8: DSpace at ILRI: A semi-technical overview of “CGSpace”

Custom metadata in ILRI report

Not AGROVOC!

Page 9: DSpace at ILRI: A semi-technical overview of “CGSpace”

“Discovery” facets

● Context-aware metadata summaries

● Side effect: helps spot metadata inconsistencies!

● … Open Access, Open access, open Access, etc.
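The same kind of check that the facets make visible can be sketched offline. This is a minimal illustration, not part of DSpace: the function name `find_inconsistent_values` and the sample list are hypothetical, and it only catches variants that differ by letter case or whitespace.

```python
from collections import defaultdict


def find_inconsistent_values(values):
    """Group metadata values that differ only by case or extra whitespace.

    Returns a dict mapping the normalized form to the sorted list of
    spellings found, keeping only forms with more than one spelling.
    """
    groups = defaultdict(set)
    for value in values:
        # Collapse runs of whitespace and ignore letter case when comparing.
        key = " ".join(value.split()).casefold()
        groups[key].add(value)
    return {
        key: sorted(variants)
        for key, variants in groups.items()
        if len(variants) > 1
    }


# Hypothetical access-status values, as they might appear in a facet.
sample = ["Open Access", "Open access", "open Access", "Limited Access"]
variants = find_inconsistent_values(sample)
```

Here `variants` holds one entry for the three spellings of "open access", which is the list an editor would then clean up in the repository.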

Page 10: DSpace at ILRI: A semi-technical overview of “CGSpace”

Search engine optimization (SEO)

Help Google Scholar consume your content!

● XML sitemaps

● Consistent domain name, eg: cgspace.cgiar.org

● Persistent links for resources

● Website speed and HTTPS both a plus

● Sign up for Google Webmaster Tools to submit sitemap, control indexing, see stats, etc

Page 11: DSpace at ILRI: A semi-technical overview of “CGSpace”

Sitemap view in Google Webmaster Tools

Page 12: DSpace at ILRI: A semi-technical overview of “CGSpace”

Importance of persistent links

● Website addresses change…

● mahider.ilri.org -> cgspace.cgiar.org

● But resources stay the same!

http://hdl.handle.net/10568/67073

● “Handle” service from handle.net

● Everything under prefix 10568 is CGSpace

● Default DSpace handle prefix is 123456789!

Page 13: DSpace at ILRI: A semi-technical overview of “CGSpace”

dc.identifier.uri specifies an item’s persistent uniform resource identifier (URI)
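A handle URL such as the one stored in dc.identifier.uri splits into a prefix (the repository) and a suffix (the item). The sketch below is only an illustration of that structure: the helper names `parse_handle` and `uses_default_prefix` are hypothetical, and the check for 123456789 assumes you want to flag repositories still running on the default prefix.

```python
from urllib.parse import urlparse

# Default prefix a DSpace install uses until a real one is registered.
DEFAULT_DSPACE_PREFIX = "123456789"


def parse_handle(url):
    """Split a hdl.handle.net URL into (prefix, suffix)."""
    parsed = urlparse(url)
    if parsed.netloc != "hdl.handle.net":
        raise ValueError(f"not a handle.net URL: {url}")
    prefix, _, suffix = parsed.path.lstrip("/").partition("/")
    if not prefix or not suffix:
        raise ValueError(f"handle is missing a prefix or suffix: {url}")
    return prefix, suffix


def uses_default_prefix(url):
    """Return True if the handle still uses the unregistered default prefix."""
    return parse_handle(url)[0] == DEFAULT_DSPACE_PREFIX


prefix, suffix = parse_handle("http://hdl.handle.net/10568/67073")
```

For the CGSpace example, `prefix` is "10568" and `suffix` is "67073"; the domain in front of the handle can change without affecting either part.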

Page 14: DSpace at ILRI: A semi-technical overview of “CGSpace”

Getting data INTO DSpace

● Day-to-day submission is manual, by a small army of editors

● One-time batch uploads of items from other systems in CSV format (InMagic!)

● OAI-PMH for metadata only

● OAI-ORE for metadata + bitstreams (eg, from another DSpace or Sharepoint, etc)

● SWORD (haven't tried)

● REST API (DSpace 5+, haven't tried)
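For the one-time CSV batch uploads, a small script can reshape records exported from a legacy system into a metadata CSV. This is a rough sketch under stated assumptions: the `id` and `collection` columns, the "+" marker for new items, and the "||" separator for repeated values follow DSpace's batch metadata CSV convention as we understand it, so check them against the documentation for your DSpace version. The function name, the sample record, and the collection handle "10568/1" are hypothetical.

```python
import csv
import io


def build_batch_csv(records, collection_handle):
    """Render legacy records as CSV text for a batch metadata import."""
    fieldnames = ["id", "collection", "dc.title", "dc.date.issued", "cg.subject.ilri"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow({
            "id": "+",  # assumed marker for "create a new item"
            "collection": collection_handle,
            "dc.title": record["title"],
            "dc.date.issued": record["issued"],
            # Assumed separator for multiple values in one field.
            "cg.subject.ilri": "||".join(record["subjects"]),
        })
    return buffer.getvalue()


# Hypothetical record, standing in for an export from another system.
legacy_records = [
    {"title": "Example brochure", "issued": "2015-06-18",
     "subjects": ["LIVESTOCK", "DAIRYING"]},
]
csv_text = build_batch_csv(legacy_records, "10568/1")
```

Writing the file through the `csv` module rather than by string concatenation keeps titles that contain commas or quotes correctly escaped.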

Page 15: DSpace at ILRI: A semi-technical overview of “CGSpace”

Getting data OUT OF DSpace

● REST API for structured JSON or XML

● OAI-PMH for metadata

● OAI-ORE for metadata + bitstreams (PDFs, etc)

● RSS feeds for websites / blogs

● XML sitemaps for search engines*

*Google discontinued the use of OAI for discovering site content in 2008! http://googlewebmastercentral.blogspot.com/2008/04/retiring-support-for-oai-pmh-in.html
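As an illustration of the OAI-PMH route, the sketch below builds a `ListRecords` request URL and pulls identifiers and titles out of a response. The verb, the `oai_dc` metadata prefix, and the XML namespaces come from the OAI-PMH standard; the endpoint path `/oai/request`, the helper names, and the embedded sample response are assumptions made for the example, not data fetched from CGSpace.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def extract_records(xml_text):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        title = record.findtext(".//dc:title", namespaces=NS)
        records.append((identifier, title))
    return records


# Illustrative response, trimmed to the elements the parser reads.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:cgspace.cgiar.org:10568/67073</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example title</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
""".strip()

# Assumed endpoint path; check where your DSpace exposes OAI-PMH.
url = list_records_url("https://cgspace.cgiar.org/oai/request")
records = extract_records(SAMPLE_RESPONSE)
```

A real harvester would fetch `url` over HTTP and follow `resumptionToken` elements to page through large result sets; both are left out here to keep the sketch self-contained.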

Page 16: DSpace at ILRI: A semi-technical overview of “CGSpace”

CCAFS website, driven by Drupal + DSpace APIs

Page 17: DSpace at ILRI: A semi-technical overview of “CGSpace”

“Latest outputs” on project blog populated via RSS, links to CGSpace
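A "latest outputs" list like this can be built by reading the titles and links from a feed. The sketch below assumes an RSS 2.0 feed (`channel` / `item` / `title` / `link`); the function name, the sample feed text, and the second handle are placeholders for illustration only.

```python
import xml.etree.ElementTree as ET


def latest_outputs(rss_text, limit=5):
    """Return up to `limit` {title, link} dicts from an RSS 2.0 feed."""
    root = ET.fromstring(rss_text)
    outputs = []
    for item in root.findall("./channel/item")[:limit]:
        outputs.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        })
    return outputs


# Placeholder feed; a real one would be fetched from the repository.
SAMPLE_FEED = """
<rss version="2.0">
  <channel>
    <title>Example collection</title>
    <item>
      <title>Example output A</title>
      <link>http://hdl.handle.net/10568/67073</link>
    </item>
    <item>
      <title>Example output B</title>
      <link>http://hdl.handle.net/10568/0</link>
    </item>
  </channel>
</rss>
""".strip()

newest = latest_outputs(SAMPLE_FEED, limit=1)
```

Because the links are handles, the blog keeps pointing at the right items even if the repository's domain name changes.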

Page 18: DSpace at ILRI: A semi-technical overview of “CGSpace”

Open source workflow on GitHub

https://github.com/ilri/DSpace

Page 19: DSpace at ILRI: A semi-technical overview of “CGSpace”

Skills needed in your organization

Besides content people(!)...

● Prioritize Linux systems administration experience (Tomcat, httpd, PostgreSQL, DNS, SSH, git)

● General: computer science background

● Web developers a diverse bunch...

● Java development experience doesn't hurt

Page 20: DSpace at ILRI: A semi-technical overview of “CGSpace”

Extra considerations

● Item mapping

● Maintenance tasks (background batch jobs)

● Backups of assetstore and PostgreSQL!

● Altmetrics tracks social media mentions

● Separate production / development environments

● CGSpace server is $80/month

● ~20GB of PDFs, ~8GB of Solr data

Page 21: DSpace at ILRI: A semi-technical overview of “CGSpace”

Getting help

● “DSpace Tech” mailing list

● “dspace” tag on StackOverflow website

● [email protected]