Beyond Preservation: Situating Archaeological Data in Professional Practice

Post on 06-Jul-2015

DESCRIPTION

I presented this lecture at the German Archaeological Institute (DAI) in Berlin on Nov. 6, 2014 (see: http://www.dainst.org/termin/-/event-display/ogNX4Gtxkd87/342513). The lecture focuses on how archaeological data fit into professional practice. It looks at scholarly communications, government policies toward the sciences and humanities, and professional reward structures. The lecture then shows examples of how Open Context publishes archaeological data, including editorial processes to promote data quality and relate contributed data to the 'Web of Data' using Linked Open Data methods. Research applications of Open Context and linked archaeological data include the Digital Index of North American Archaeology (DINAA) project (see: http://ux.opencontext.org/blog/archaeology-site-data/) and a data integration study exploring the development and dispersal of animal husbandry economies in Epipaleolithic - Chalcolithic Anatolia (see: http://dx.doi.org/10.1371/journal.pone.0099845). The lecture concludes by arguing that archaeologists need to invest more intellectually in the method and theory of modeling and creating data. It also looks at how concepts and expectations of publishing static artifacts need to be revised (using techniques like version control) to enable continued and more transparent revision of data to fix problems, implement new standards, and meet new research goals.

TRANSCRIPT

Eric C. Kansa (@ekansa)
UC Berkeley D-Lab & Open Context
2014-2015 Harvard Center for Hellenic Studies & German Archaeological Institute Research Fellow

Beyond Preservation: Situating Archaeological Data in Professional Practice

Unless otherwise indicated, this work is licensed under a Creative Commons Attribution 3.0 License <http://creativecommons.org/licenses/by/3.0/>

Data Sharing as Publication
• Started in 2007
• Open data (mainly CC-BY)
• Archiving by California Digital Library
• Part of a broader reform movement in scholarly communications

Introduction

Visions for Digital Data in Archaeology
1. “Optimizing the status quo”
2. Opportunity for fundamentally better ways to conduct and communicate research

Digital Data in Archaeology
1. Why discuss data?
2. Data in (bad) institutional contexts
3. Open Context's approach
4. Need for more & wider intellectual investment

Data source: Arif Jinha (2010). Article 50 million: an estimate of the number of scholarly articles in existence. Learned Publishing, 23 (3), 258-263. DOI: 10.1087/20100308.

Image Source: http://www.cs.cmu.edu/~comar/open-science/

Paper and paper-like digital files (PDFs) do not scale well:
● Discovery
● Reuse

Image Credit: Wikimedia Commons (Public Domain) http://commons.wikimedia.org/wiki/File:Archives_entreprises.jpg

Image Credit: Wikimedia Commons (CC-BY-SA) http://commons.wikimedia.org/wiki/File:BigData_2267x1146_white.png

Lots of investment in “Big Data”
● Corporate
● Government
● 'STEM' academia

Image Credit: 'gin soak' (CC-BY-NC-ND) https://www.flickr.com/photos/gin_soak/2215398726

Structured Data – Creativity
1. New forms of communication
2. New forms of collaboration
3. New research opportunities

'Mash-ups' (informal integrations): Open Context & Arachne

Experiment in open, distributed post-publication peer-review

Text-mining literature to identify references to ancient places

2010 (renewed 2012) Google Digital Humanities Awards: with Elton Barker, Leif Isaksen, Kate Byrne, Nick Rabinowitz

Project limited to public domain (pre-1920) resources

Commercial interests and public policy

Conditions of academic labor

Neoliberalism: (Loosely associated ideologies / assumptions / interests)

Source: The Occasional Pamphlet - Harvard University (http://blogs.law.harvard.edu/pamphlet/2013/01/29/why-open-access-is-better-for-scholarly-societies/)

Neoliberalism: Taylorism, “Audit Culture” and fierce job/grant competition

Data contributions don’t count!

Image Credit: Wikimedia Commons (Public Domain) http://en.wikipedia.org/wiki/Frederick_Winslow_Taylor#mediaviewer/File:Frederick_Winslow_Taylor_crop.jpg

Ironies of data: Publications counted as data, but data don’t count!

☹Frowns at

Many researchers (esp. junior scholars) lack academic freedom

My Precious Data

Image Credit: “Lord of the Rings” (2003, New Line), All Rights Reserved Copyright

Data sharing as compliance

Need more carrots!
1. Citation, credit, intellectually valued
2. Research outcomes (new insights from data reuse!)

Adapt Academic Taylorism:
● DataCite (metadata, citation for datasets)
● Alt-metrics (social media, view counts, download counts, etc.)

Make data count!

Publishing Workflow

Improve / Enhance
1. Consistency
2. Context (intelligibility, interoperability)

Digital Index of North American Archaeology (DINAA)
1. Rich metadata (cultures, chronology, site-types)
2. Reduced precision location data (site security, legal)
3. Data modeling challenges (using GeoJSON-LD, CIDOC-CRM, event models)
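
One way to picture "reduced precision location data" is to snap each site's coordinates to the center of a coarse grid cell before release. The sketch below is only an illustration under stated assumptions: the 0.25 degree cell size, the function names, and the sample record are all hypothetical and are not taken from DINAA.

```python
import json
import math


def coarsen(value: float, cell_degrees: float) -> float:
    """Snap a coordinate to the center of the grid cell that contains it."""
    return (math.floor(value / cell_degrees) + 0.5) * cell_degrees


def reduce_precision(feature: dict, cell_degrees: float = 0.25) -> dict:
    """Return a copy of a GeoJSON Point feature with a coarsened location."""
    lon, lat = feature["geometry"]["coordinates"]
    return {
        "type": "Feature",
        "geometry": {
            "type": "Point",
            "coordinates": [coarsen(lon, cell_degrees), coarsen(lat, cell_degrees)],
        },
        "properties": {
            **feature["properties"],
            # Tell data users that the point is approximate by design.
            "location_precision_note": f"snapped to a {cell_degrees} degree grid",
        },
    }


# Hypothetical site record: the identifier and coordinates are invented.
site = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [-84.3127, 35.9642]},
    "properties": {"site_id": "EXAMPLE-001", "site_type": "hypothetical"},
}

public_site = reduce_precision(site)
print(json.dumps(public_site, indent=2))
```

Recording the precision note alongside the geometry keeps the generalization visible to anyone who reuses the record, so the coarse point is not mistaken for a surveyed location.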

Using site file data to examine the impacts of sea level rise

In 100 years, 19,676 sites will be covered!

Digital Index of North American Archaeology (DINAA)
1. ~ 500,000 site records curated by state officials
2. Key (Linked Data!) reference for N. American archaeology
3. PIs/Co-PIs: David G. Anderson, Joshua Wells, Eric Kansa, Sarah Kansa, Stephen Yerka

Stable Web URI: Reference this to disambiguate between “Alexandria” (Egypt) and other places called “Alexandria” (many of which are also ancient)

Pelagios: Heat map of museum collections, archives, databases referencing places in Pleiades (PIs Leif Isaksen, Elton Barker)

Web of Data (2011)

Need Archaeology on the Map

Contributions should not be isolated from other communities

Linked Data: Annotations to community vocabularies are part of the Open Context editorial process
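
As a rough picture of what such an annotation step could look like, the sketch below links a contributor's local terms to shared concept URIs and lists the terms that still need an editor's attention. It is not Open Context's implementation: the vocabulary, the dataset URI, and the `annotate` function are hypothetical, and the `example.org` URIs are placeholders. Only the SKOS `closeMatch` predicate is a real, published identifier.

```python
import json

# Real SKOS predicate for "these two concepts are close enough to interchange".
SKOS_CLOSE_MATCH = "http://www.w3.org/2004/02/skos/core#closeMatch"

# Hypothetical mapping from contributor terms to placeholder concept URIs.
VOCABULARY = {
    "sheep": "http://example.org/vocab/taxon/ovis-aries",
    "goat": "http://example.org/vocab/taxon/capra-hircus",
}


def annotate(dataset_uri, local_terms):
    """Build annotation records and collect terms that still need review."""
    annotations, unmatched = [], []
    for term in local_terms:
        key = term.strip().lower()  # tolerate stray spaces and capitalization
        if key in VOCABULARY:
            annotations.append({
                "subject": f"{dataset_uri}#term-{key}",
                "predicate": SKOS_CLOSE_MATCH,
                "object": VOCABULARY[key],
            })
        else:
            unmatched.append(term)
    return annotations, unmatched


annotations, unmatched = annotate(
    "http://example.org/datasets/fauna-1", ["Sheep", "goat ", "ovicaprid"]
)
print(json.dumps(annotations, indent=2))
print("Needs editorial review:", unmatched)
```

Keeping the unmatched terms as a separate list reflects the editorial character of the work: a term without an obvious counterpart is a question for a person, not something to map automatically.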

“I just started using an Excel spreadsheet that has sort of slowly gotten bigger and bigger over time with more variables or columns…I've added …color coding…I also use…a very sort of primitive numerical coding system, again, that I inherited from my research advisers…So, this little book that goes with me of codes which is sort of odd, but …we all know that a 14 is a sheep.” (CCU13)
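
The quote describes a codebook that lives in a notebook rather than with the data. A minimal sketch of making such a codebook explicit and machine-checkable follows. Only the pairing of 14 with sheep comes from the quote; every other code, the sample values, and the `decode` function are invented for illustration.

```python
# Hypothetical codebook. Only 14 -> "sheep" comes from the quote above;
# the remaining entries are invented for illustration.
CODEBOOK = {14: "sheep", 15: "goat", 20: "cattle"}


def decode(codes, codebook):
    """Translate numeric codes to labels and report codes the codebook lacks."""
    decoded, unknown = [], set()
    for code in codes:
        if code in codebook:
            decoded.append(codebook[code])
        else:
            decoded.append(None)  # leave a visible gap instead of guessing
            unknown.add(code)
    return decoded, sorted(unknown)


labels, unknown_codes = decode([14, 14, 20, 99], CODEBOOK)
print(labels)
print("Undocumented codes:", unknown_codes)
```

Reporting undocumented codes, rather than silently dropping them, surfaces the places where knowledge that "we all know" has not yet been written down.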

Need to do more than “Optimize the Status Quo”

Raw Data Can Be Unappetizing

Sometimes data is better served cooked

Large scale data sharing & integration for exploring the origins of farming. Funded by EOL / NEH

1. 300,000 bone specimens
2. Complex: dozens, up to 110 descriptive fields
3. 34 contributors from 15 archaeological sites
4. More than 4 person years of effort to create the data!

7000 BC (many pigs, cattle)

7500 BC (sheep + goat dominate, few pigs, few cattle)

6500 BC (few pigs, mixing with wild animals?)

8000 BC (cattle, pigs, sheep + goats)

• Not a neat model of progress to adopt a more productive economy. Very different, sometimes piecemeal adoption in different regions.

Arbuckle BS, Kansa SW, Kansa E, Orton D, Çakırlar C, et al. (2014) Data Sharing Reveals Complexity in the Westward Spread of Domestic Animals across Neolithic Turkey. PLoS ONE 9(6): e99845. doi:10.1371/journal.pone.0099845

Easy to Align
1. Animal taxonomy
2. Skeletal elements
3. Sex determinations
4. Side of the animal
5. Fusion (bone growth, up to a point)

Hard to Align (poor modeling, recording)
1. Tooth wear (age)
2. Fusion data
3. Measurements

Despite common research methods!!

“Under the hood” exposure and reuse attempts critical! Fundamental method & theory issues in data modeling!

Investing in Data is a Continual Need
1. Data and code co-evolve. New visualizations, analysis may reveal unseen problems in data.
2. Data and metadata change routinely (revised stratigraphy requires ongoing updates to data in this analysis)
3. Problems, interpretive issues in data (and annotations) keep cropping up.
4. Is publishing a bad metaphor implying a static product?
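
If data keep changing after release, each revision needs a transparent record of what changed. The sketch below compares two releases of a dataset keyed by record ID and produces a simple changelog. It is an assumption-laden illustration, not a description of any existing tool: the record IDs, field names, values, and the `diff_releases` function are all hypothetical.

```python
def diff_releases(old: dict, new: dict) -> dict:
    """Compare two releases keyed by record ID and summarize what changed."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = {}
    for record_id in sorted(set(old) & set(new)):
        # Keep only the fields whose values differ, as (before, after) pairs.
        fields = {
            field: (old[record_id].get(field), new[record_id].get(field))
            for field in sorted(set(old[record_id]) | set(new[record_id]))
            if old[record_id].get(field) != new[record_id].get(field)
        }
        if fields:
            changed[record_id] = fields
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical records: IDs, fields, and values are invented.
release_1 = {
    "bone-1": {"taxon": "sheep", "phase": "Level III"},
    "bone-2": {"taxon": "pig", "phase": "Level III"},
}
release_2 = {
    "bone-1": {"taxon": "sheep", "phase": "Level IV"},  # stratigraphy revised
    "bone-2": {"taxon": "pig", "phase": "Level III"},
    "bone-3": {"taxon": "cattle", "phase": "Level IV"},
}

changelog = diff_releases(release_1, release_2)
print(changelog)
```

A record-level changelog of this kind is roughly what a version control system provides for code, which is why release cycles may be a closer fit for data than a one-time publication.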

Data sharing as publication

Data sharing as open source release cycles?

Data sharing as publication AND data sharing as open source release cycles

Go beyond Optimization of the Status Quo

More to data than 'compliance'

Data require intellectual investment, methodological and theoretical innovation.

New professional roles needed, but who will pay for it?

Thank you!

Special Thanks!

Harvard Center for Hellenic Studies & the German Archaeological Institute (DAI)
