Data Stewardship and the Decentralized Web
DANIELLE ROBINSON, PhD
Co-Executive Director at Code for Science & Society
@daniellecrobins @codeforsociety
Code for Science & Society
Supporting open source in the public interest
Code for Science & Society
Civic tech + Scholarly research + New media + Open source + Equity, support, inclusion
= CS&S community
Sharing experiences
Bringing:
- Knowledge of decentralized computing, data collection & management
Seeking:
- Better understanding of needs, challenges of your community
What is the future of data stewardship?
- Bringing together leaders, stakeholders
- Design a cooperative data preservation network
- Push for ‘FAIR’ and save libraries money
Adam Brock
1. Data on the web
2. A new model of data stewardship
3. Prototyping decentralized preservation
4. Reimagine data on the web
Across domains, data live online
Early work of a writer
Government data
Newspaper archives
Your family photos
Scientific data
Data transparency: Inconsistent practices across domains
Many data publishing options
https://www.ohsu.edu/xd/education/library/data/share-and-archive/index.cfm
Siloed info, centralized gatekeepers control access
Doc Searls
https://imgflip.com/memegenerator/Picard-Wtf
http://som.csudh.edu/fac/lpress/history/arpamaps/
Distributed beginnings
Web centralization
Image courtesy of Beaker Browser
Web centralization
It’s easier to manage and monetize a silo
Image courtesy of Beaker Browser
“We embed values into our technology whether we are aware of it or not”
- Stephen Whitmore (@noffle)
Digital Democracy
See also the work of Safiya Noble
https://blog.datproject.org/2018/03/05/css-community-call-03-2018/
In the centralized web
We trust the server to locate objects, not to change them
Silos are the natural state
Data may be in multiple silos
Today’s web relies upon
URLs to identify location of objects
Ability to change information without changing location
Aggregating content for discovery
Today’s web lacks
Persistent identifiers
Transparent change log
Links between silos
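A minimal sketch of what a persistent, location-independent identifier could look like, assuming a SHA-256 digest of the bytes serves as the identifier. The function name `content_id` and the sample CSV rows are made up for illustration; they are not taken from the talk or from any particular tool.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Return a hex SHA-256 digest used here as a content-derived identifier."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical dataset bytes (illustrative values only).
original = b"station,year,temp_c\nA1,2017,14.2\n"
edited = b"station,year,temp_c\nA1,2017,14.3\n"

# The same bytes held in another silo get the same identifier.
copy_in_other_silo = bytes(original)
print(content_id(original) == content_id(copy_in_other_silo))

# Any change to the content yields a different identifier.
print(content_id(original) == content_id(edited))
```

Unlike a URL, an identifier of this kind says nothing about where the object lives, so it can link copies across silos and makes silent edits detectable.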
“The internet is a terribly unstable way to keep information available”
- Laurie Allen, Penn Libraries' Assistant Director for Digital Scholarship
“Federal data ≅ website”
https://www1.ncdc.noaa.gov/pub/data/
Why are federal data ≅ webpages?
To find an object online:
1. Discover the link
2. Link still works
3. Trust the info at the link
https://www.slideshare.net/shefw/save-the-data-the-role-of-librarians-in-datarescue-collaborations
Why are federal data ≅ webpages?
https://www1.ncdc.noaa.gov/pub/data/annualreports
https://www.slideshare.net/shefw/save-the-data-the-role-of-librarians-in-datarescue-collaborations
M. Klein, several papers and talks, links at end
Link rot: When links fail
Content drift: When referenced content is changed
Link rot + content drift = Reference rot
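The two failure modes can be told apart if a digest of the content was recorded when the reference was made. The sketch below assumes that setup; `classify_reference` and its return labels are hypothetical names, and the fetch step is left out so that `None` stands in for a link that no longer resolves.

```python
import hashlib
from typing import Optional


def classify_reference(recorded_digest: str, fetched: Optional[bytes]) -> str:
    """Classify a reference given the digest recorded at citation time.

    `fetched` is the content retrieved today, or None if the link failed.
    """
    if fetched is None:
        return "link rot"  # the link no longer resolves
    if hashlib.sha256(fetched).hexdigest() != recorded_digest:
        return "content drift"  # the link works, but the content changed
    return "intact"


# Illustrative content captured when the reference was first made.
cited = b"Annual report, 2016 edition"
digest_at_citation = hashlib.sha256(cited).hexdigest()

print(classify_reference(digest_at_citation, None))
print(classify_reference(digest_at_citation, b"Annual report, revised"))
print(classify_reference(digest_at_citation, cited))
```

Without a recorded digest, content drift is invisible to the reader: the link still works, so nothing signals that the referenced object is no longer the one that was cited.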
The Internet is broken
and we are using it to access and distribute all of human knowledge
¯\_(ツ)_/¯
its all about Rock (:
The web is being reimagined
1. Data on the web
2. A new model of data stewardship
3. Prototyping decentralized preservation
4. Reimagine data on the web
Preservation starts here
“Sharing research data is not well understood, incentivized, or accessible”
- Daniella Lowenberg, Research Data Specialist, Product Manager of @uc3dash, California Digital Library
https://medium.com/@UC3CDL/we-are-talking-loudly-and-no-one-is-listening-a108248693f7 / csv
screenshot from https://peerj.com/preprints/2588/
and preserving
^
Preservation requires custody
seagen
Centralized model requires custody to provide access
Image courtesy of Beaker Browser
Web accessible objects
Via Agency
Is custody required?
#WOCinTech Chat
“Preservation in place… Bring preservation services to the content”
- Stephen Abrams, Preservation without Possession, California Digital Library
https://figshare.com/articles/Preservation_without_possession_Content-addressable_identifiers_for_post-custodial_preservation/5844369
Sharing data and costs
Cooperative of trusted entities
Image courtesy of Beaker Browser
SangyaPundir / www.force11.org/group/fairgroup/fairprinciples
www.force11.org/group/fairgroup/fairprinciples
Leverage existing infrastructure
1. Data on the web
2. A new model of data stewardship
3. Prototyping decentralized preservation
4. Reimagine data on the web
Multiple decentralized approaches
BTC Keychain / Danilo / http://www.ala.org/tools/future/trends/blockchain /
https://gist.github.com/mafintosh/bd9e6d350ebf02441c9707c5f799d05b
Blockchain Peer-to-peer
Data stored at central location, accessed by independent users
Image courtesy of Beaker Browser
Centralized “hub and spoke” model
Data persistently identified, networked ability to scale
Decentralized models
Image courtesy of Beaker Browser
Peer-to-peer public technology
https://github.com/mafintosh/bws-2017
What’s Dat?
Persistent identifiers
+
Network of peers
https://github.com/datproject/docs/blob/master/papers/dat-paper.pdf
Dat + scholarly data =
- Automate preservation, versioning
- Find data across storage locations
- Spread cost burden across network
- Foundational links between silos
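To make "automate versioning" concrete, here is a minimal sketch of an append-only, hash-chained change log. It illustrates the general idea of a tamper-evident version history and is not Dat's actual data structure or API; the class `ChangeLog`, its fields, and the sample notes are all hypothetical.

```python
import hashlib
import json
from typing import Dict, List


def _digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def _entry_id(prev: str, content: str, note: str) -> str:
    # The entry identifier covers the previous entry's id, chaining the log.
    body = json.dumps({"prev": prev, "content": content, "note": note}, sort_keys=True)
    return _digest(body.encode("utf-8"))


class ChangeLog:
    """Append-only version history (illustrative, not a real library)."""

    def __init__(self) -> None:
        self.entries: List[Dict[str, str]] = []

    def append(self, data: bytes, note: str) -> str:
        prev = self.entries[-1]["id"] if self.entries else ""
        content = _digest(data)
        entry = {"prev": prev, "content": content, "note": note,
                 "id": _entry_id(prev, content, note)}
        self.entries.append(entry)
        return entry["id"]

    def verify(self) -> bool:
        """Recompute every link; any edited or reordered entry breaks the chain."""
        prev = ""
        for entry in self.entries:
            if entry["prev"] != prev:
                return False
            if entry["id"] != _entry_id(entry["prev"], entry["content"], entry["note"]):
                return False
            prev = entry["id"]
        return True


log = ChangeLog()
log.append(b"v1 of dataset", "initial deposit")
log.append(b"v2 of dataset", "corrected units")
print(log.verify())
```

Because each entry commits to the one before it, a peer holding a copy of the log can check the full history of an object without trusting the server that delivered it.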
It’s all about TRUST
Image courtesy of Beaker Browser
… and I trust LIBRARIES
Image courtesy of Beaker Browser
Dr. Dannise V. Ruiz-Ramos describes sea star genome annotation pipeline
Start with data creation
Dat in the Lab lessons:
Leverage existing workflows
Automate data versioning, preservation
Link researchers to library
Now linking libraries to each other
https://blog.datproject.org/tag/science/
Prototype: CDL - IA - SDSC
CDL’s DASH corpus (<5 TB)
Copied to IA and SDSC
Deal with technical hurdles (S3)
Next: Monitoring dynamic information
Every institution contributes
Storage, bandwidth
Metadata on their collection
Commitment to preserve their collection
to the network
Any user can access
Information on library collections
History of objects
Whole or partial data sets
from the network
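The contribute-and-access model on the two slides above can be sketched as a shared index that maps a content identifier to the institutions holding a copy. This is an assumption about how such a cooperative might be modeled, not a description of the prototype; `CooperativeIndex`, `announce`, `holders`, `at_risk`, and the `min_copies` threshold are hypothetical, and the identifiers are placeholders.

```python
from collections import defaultdict
from typing import Dict, List, Set


class CooperativeIndex:
    """Tracks which institutions hold which objects (illustrative only)."""

    def __init__(self) -> None:
        self._holders: Dict[str, Set[str]] = defaultdict(set)

    def announce(self, institution: str, content_id: str) -> None:
        """An institution commits to preserving a copy of an object."""
        self._holders[content_id].add(institution)

    def holders(self, content_id: str) -> List[str]:
        """Any user can ask where an object is held."""
        return sorted(self._holders.get(content_id, set()))

    def at_risk(self, min_copies: int = 2) -> List[str]:
        """Objects held by fewer institutions than the assumed threshold."""
        return sorted(cid for cid, held in self._holders.items()
                      if len(held) < min_copies)


index = CooperativeIndex()
index.announce("CDL", "dataset-a")  # placeholder identifiers
index.announce("IA", "dataset-a")
index.announce("SDSC", "dataset-a")
index.announce("CDL", "dataset-b")

print(index.holders("dataset-a"))
print(index.at_risk())
```

A real network would key the index on content-derived identifiers rather than names, so that every holder can verify it has the same bytes.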
1. Data on the web
2. A new model of data stewardship
3. Prototyping decentralized preservation
4. Reimagine data on the web
Discussion:
● What are the data types that your organization is responsible for?
● How are those data created, stored, used? When do they come to you?
● Who interacts with data? How do they interact with it?
● How are equity, justice addressed (or not) in data stewardship plans?
● What are your concerns around long term preservation of data?
https://library.auraria.edu/d2pproject/about
The Data to Policy Project (D2P) is an initiative to engage students with their community’s
needs through course-based assignments, which culminate in data-driven policy
proposals to local governments and agencies.
Cool project alert!
Thank you to the Western States Government Information
Conference Planning Committee