Data Stewardship and the Decentralized Web

DANIELLE ROBINSON, PhD

Co-Executive Director at Code for Science & Society

@daniellecrobins @codeforsociety

Code for Science & Society

Supporting open source in the public interest

Code for Science & Society

Civic tech + Scholarly research + New media + Open source + Equity, support, inclusion

= CS&S community

Sharing experiences

Bringing:

- Knowledge of decentralized computing, data collection & management

Seeking:

- Better understanding of needs, challenges of your community

What is the future of data stewardship?

- Bringing together leaders, stakeholders

- Design a cooperative data preservation network

- Push for ‘FAIR’ and save libraries money

Adam Brock

https://www.flickr.com/photos/adambrock/

1. Data on the web

2. A new model of data stewardship

3. Prototyping decentralized preservation

4. Reimagine data on the web

Across domains, data live online

Early work of a writer

Government data

Newspaper archives

Your family photos

Scientific data

Data transparency: Inconsistent practices across domains

Many data publishing options

https://www.ohsu.edu/xd/education/library/data/share-and-archive/index.cfm

Siloed info: centralized gatekeepers control access

Doc Searls

https://www.flickr.com/photos/docsearls/

https://imgflip.com/memegenerator/Picard-Wtf

http://som.csudh.edu/fac/lpress/history/arpamaps/

Distributed beginnings

Clark Boyd

https://medium.com/@clarkboyd?source=post_header_lockup

Web centralization

Image courtesy of Beaker Browser

Web centralization

It’s easier to manage and monetize a silo


“We embed values into our technology whether we are aware of it or not”

- Stephen Whitmore (@noffle)

Digital Democracy

See also the work of Safiya Noble

https://blog.datproject.org/2018/03/05/css-community-call-03-2018/

http://www.digital-democracy.org/

In the centralized web

We trust the server to locate objects, not to change them

Silos are the natural state

Data may be in multiple silos

Today’s web relies upon

URLs to identify location of objects

Ability to change information without changing location

Aggregating content for discovery

Today’s web lacks

Persistent identifiers

Transparent change log

Links between silos

“The internet is a terribly unstable way to keep information available”

- Laurie Allen, Penn Libraries' Assistant Director for Digital Scholarship

“Federal data ≅ website”

https://www1.ncdc.noaa.gov/pub/data/

Why are federal data ≅ webpages?

To find an object online:

1. Discover the link

2. Link still works

3. Trust the info at the link

https://www.slideshare.net/shefw/save-the-data-the-role-of-librarians-in-datarescue-collaborations

Why are federal data ≅ webpages?

https://www1.ncdc.noaa.gov/pub/data/annualreports

https://www.slideshare.net/shefw/save-the-data-the-role-of-librarians-in-datarescue-collaborations

https://www1.ncdc.noaa.gov/pub/data/annual

M. Klein, several papers and talks, links at end

Link rot: When links fail

Content drift: When referenced content is changed

Link rot + content drift = Reference rot
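
The distinction between link rot and content drift can be illustrated with a small sketch. This is only one way to frame it, under the assumption that a fingerprint (here a SHA-256 digest) of the content was recorded at citation time. The function names `digest` and `classify_reference` are hypothetical and do not come from the talk or from M. Klein's work.

```python
import hashlib
from typing import Optional


def digest(content: bytes) -> str:
    """Return a SHA-256 hex digest used as a fingerprint of the content."""
    return hashlib.sha256(content).hexdigest()


def classify_reference(cited_digest: str, current_content: Optional[bytes]) -> str:
    """Classify a cited link by comparing today's content to the cited fingerprint.

    current_content is None when the link no longer resolves.
    (Hypothetical helper for illustration only.)
    """
    if current_content is None:
        return "link rot"          # the link fails
    if digest(current_content) != cited_digest:
        return "content drift"     # the link works, but the content changed
    return "intact"


# Fingerprint recorded when the reference was made (made-up sample content).
cited = digest(b"annual report, 2017 edition")

print(classify_reference(cited, None))
print(classify_reference(cited, b"annual report, 2018 edition"))
print(classify_reference(cited, b"annual report, 2017 edition"))
```

Either failure mode leaves the reader unable to see what was originally cited, which is why the two are grouped together as reference rot.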

The Internet is broken

and we are using it to access and distribute all of human knowledge

¯\_(ツ)_/¯

its all about Rock (:

The web is being reimagined

https://www.flickr.com/photos/michellerocks/

What’s important to you?

romana klee

https://www.flickr.com/photos/nauright/

1. Data on the web

2. A new model of data stewardship

3. Prototyping decentralized preservation

4. Reimagine data on the web

Preservation starts here

“Sharing research data is not well understood, incentivized, or accessible”

Daniella Lowenberg

Research Data Specialist

Product Manager of @uc3dash

California Digital Library

https://medium.com/@UC3CDL/we-are-talking-loudly-and-no-one-is-listening-a108248693f7 / csv

screenshot from https://peerj.com/preprints/2588/

and preserving

^

https://medium.com/@UC3CDL/we-are-talking-loudly-and-no-one-is-listening-a108248693f7

Preservation requires custody

seagen

https://www.flickr.com/photos/seagen/

Centralized model requires custody to provide access

Image courtesy of Beaker Browser

Web accessible objects

Via Agency

Is custody required?

#WOCinTech Chat

https://www.flickr.com/photos/wocintechchat/

“Preservation in place… Bring preservation services to the content”

- Stephen Abrams

Preservation without Possession

California Digital Library

https://figshare.com/articles/Preservation_without_possession_Content-addressable_identifiers_for_post-custodial_preservation/5844369
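
A rough sketch of why a content-addressable identifier makes custody optional: if the identifier is derived from the bytes themselves, anyone can check whether a copy held elsewhere matches, without that copy ever moving. The code below is an illustration under that assumption, not the scheme described in the cited talk. The `sha256:` prefix, the function names, and the holder names are all hypothetical.

```python
import hashlib
from typing import Dict, List


def content_address(content: bytes) -> str:
    """Derive an identifier from the content itself (hypothetical 'sha256:' scheme)."""
    return "sha256:" + hashlib.sha256(content).hexdigest()


def verified_holders(identifier: str, copies: Dict[str, bytes]) -> List[str]:
    """Return the holders whose copy matches the identifier, in sorted order."""
    return sorted(
        name for name, data in copies.items()
        if content_address(data) == identifier
    )


# Made-up dataset and made-up holder names, for illustration only.
dataset = b"station,year,temp_c\nA1,2017,14.2\n"
identifier = content_address(dataset)

copies = {
    "library-a": dataset,
    "library-b": dataset,
    "mirror-c": dataset + b"A1,2018,99.9\n",  # altered copy
}

print(verified_holders(identifier, copies))
```

Because verification depends only on the identifier and the bytes, the preserving institution does not need to be the one that serves the data.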

Sharing data and costs

Cooperative of trusted entities


SangyaPundir / www.force11.org/group/fairgroup/fairprinciples

https://commons.wikimedia.org/w/index.php?title=User:SangyaPundir&action=edit&redlink=1

www.force11.org/group/fairgroup/fairprinciples

Leverage existing infrastructure

Peter Miller

Visions are nice!

https://www.flickr.com/photos/pmillera4/

Now let’s get real

vladeb

https://www.flickr.com/photos/28122162@N04/

1. Data on the web

2. A new model of data stewardship

3. Prototyping decentralized preservation

4. Reimagine data on the web

Multiple decentralized approaches

BTC Keychain / Danilo / http://www.ala.org/tools/future/trends/blockchain /

https://gist.github.com/mafintosh/bd9e6d350ebf02441c9707c5f799d05b

Blockchain Peer-to-peer

https://www.flickr.com/photos/btckeychain/

https://www.flickr.com/photos/chutzpah72/

Data stored at central location, accessed by independent users

Image courtesy of Beaker Browser

Centralized “hub and spoke” model

Data persistently identified, networked ability to scale

Decentralized models


Peer-to-peer public technology

https://github.com/mafintosh/bws-2017

What’s Dat?

Persistent identifiers

+

Network of peers

https://github.com/datproject/docs/blob/master/papers/dat-paper.pdf

Dat + scholarly data =

- Automate preservation, versioning

- Find data across storage locations

- Spread cost burden across network

- Foundational links between silos
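
To make "automate versioning" concrete, here is a toy append-only change log in which every entry records a hash of the entry before it, so the history of a dataset can be checked for tampering. This is a simplified illustration of the general idea only. It is not Dat's actual data structure or API, and the `ChangeLog` class and its field names are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class ChangeLog:
    """Toy append-only log: each entry commits to the one before it."""
    entries: List[dict] = field(default_factory=list)

    @staticmethod
    def _hash(record: dict) -> str:
        # Hash every field except the entry's own hash.
        body = {k: v for k, v in record.items() if k != "entry_hash"}
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def append(self, content: bytes, note: str) -> int:
        """Record a new version of the content and return its version number."""
        previous = self.entries[-1]["entry_hash"] if self.entries else ""
        record = {
            "version": len(self.entries) + 1,
            "content_hash": hashlib.sha256(content).hexdigest(),
            "note": note,
            "previous": previous,
        }
        record["entry_hash"] = self._hash(record)
        self.entries.append(record)
        return record["version"]

    def is_consistent(self) -> bool:
        """Check that no entry was edited or removed after the fact."""
        previous = ""
        for record in self.entries:
            if record["previous"] != previous:
                return False
            if record["entry_hash"] != self._hash(record):
                return False
            previous = record["entry_hash"]
        return True


# Made-up sample data, for illustration only.
log = ChangeLog()
log.append(b"sample,reads\nS1,100\n", "initial upload")
log.append(b"sample,reads\nS1,100\nS2,250\n", "added sample S2")
print(log.is_consistent())
```

A log like this gives the transparent change history that URLs alone do not: each version stays addressable, and a silent edit to an earlier entry breaks the chain.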

俍宏葉

Reimagine data preservation

https://www.flickr.com/photos/117010613@N04/

It’s all about TRUST


… and I trust LIBRARIES


Eran Sandler

Building a prototype

https://www.flickr.com/photos/ogimogi/

Dr. Dannise V. Ruiz-Ramos describes sea star genome annotation pipeline

Start with data creation

https://scholar.google.com/citations?user=Zgwjck8AAAAJ&hl=en

Dat in the Lab lessons:

Leverage existing workflows

Automate data versioning, preservation

Link researchers to library

Now linking libraries to each other

https://blog.datproject.org/tag/science/

Prototype: CDL - IA - SDSC

CDL’s DASH corpus (<5 TB)

Copied to IA and SDSC

Deal with technical hurdles (S3)

Next: Monitoring dynamic information

Every institution contributes

Storage, bandwidth

Metadata on their collection

Commitment to preserve their collection

to the network

Any user can access

Information on library collections

History of objects

Whole or partial data sets

from the network

1. Data on the web

2. A new model of data stewardship

3. Prototyping decentralized preservation

4. Reimagine data on the web

What’s important to you?

http://www.liveoncelivewild.com

Discussion:

● What are the data types that your organization is responsible for?

● How are those data created, stored, used? When do they come to you?

● Who interacts with data? How do they interact with it?

● How are equity, justice addressed (or not) in data stewardship plans?

● What are your concerns around long term preservation of data?

https://library.auraria.edu/d2pproject/about

The Data to Policy Project (D2P) is an initiative to engage students with their community’s needs through course-based assignments, which culminate in data-driven policy proposals to local governments and agencies.

Cool project alert!

Thank you to the Western States Government Information Conference Planning Committee

DANIELLE ROBINSON, PhD

Co-Executive Director at Code for Science & Society

@daniellecrobins @codeforsociety
