Life Cycle Models & Principles
Jake Carlson
Associate Professor of Library Science
Data Services Specialist
Purdue University Libraries
What will be Covered
• An introduction to terms and concepts relating to data lifecycles.
• An understanding of the purpose of lifecycle models.
• Coverage of some life cycle models and principles, and how they may relate to each other.
• An introduction to ICPSR’s lifecycle model, as a loose framework for this workshop.
Data Science
• “Data science enables the creation of data products.”
• “We're increasingly finding data in the wild, and data scientists are involved with gathering data, massaging it into a tractable form, making it tell its story, and presenting that story to others.”
– Loukides, M. (2011) What is Data Science? http://radar.oreilly.com/2010/06/what-is-data-science.html
Data Curation
• “…the active and on-going management of data through its lifecycle of interest and usefulness to scholarly and educational activities.” - UIUC GSLIS http://cirss.lis.illinois.edu/CollMeta/dcep.html
• “… the value-added activities and features that stewards of content engage in to make the content useful.” - Nancy McGovern, ICPSR
What is a Lifecycle?
The continuous sequence of changes undergone by an organism from one primary form, as a gamete, to the development of the same form again. http://www.dictionary.com
Graphic: http://insected.arizona.edu/manduca/Mand_cycle.html
Data Lifecycles
Primer on Data Management
http://www.dataone.org/sites/all/documents/DataONE_BP_Primer_020212.pdf
Why Use Life Cycle Models?
• Help define and explain complex processes (graphically).
• Help to identify important components, roles, responsibilities, milestones, etc.
• Demonstrate connections and relationships between parts and the whole.
• Provide a framework to develop services and support.
Limitations of Lifecycle Models
• “All models are wrong, but some are useful” - George E.P. Box, Statistician, 1976
– Models generally reflect the interests, perspectives (and biases) of the agencies that created them.
– Models mask complexity.
– Models tend to overlook heterogeneity / diversity.
– Models are often presented as orderly and linear.
– Models depict the ideal.
Aspects of Lifecycle Models
• Subject Based
– Scholarly Communication
– Research
– Data
– Curation
• Source Based
– Individual
– Organizational
– Community
Scholarly Communication Lifecycles
Gettysburg College Library
Graphic: http://www.gettysburg.edu/library/research/guides/scientific_information/index.dot
Research Lifecycles
Loughborough University Library (UK)
Graphic: http://www.lboro.ac.uk/services/library/research/
Scholarly Communication Lifecycles
Microsoft Research
Graphic: http://research.microsoft.com/en-us/news/features/zentity-052009.aspx
Research Lifecycle: Project
The Research360 Project will develop technical and human infrastructure for research data management at the University of Bath…
Focus in particular on issues and challenges that arise from private sector partnerships and research collaborations;
http://blogs.bath.ac.uk/research360/about/
Research Lifecycles: Specialized
Cross-Cultural Surveys
Institute for Social Research
Graphic: http://ccsg.isr.umich.edu/intro.cfm
Research Lifecycle: Funding
Wayne State University, Division of Research
Graphic: http://spa.wayne.edu/grant/
Connecting Research & Data Lifecycles
“How JISC is Helping Researchers”
http://www.jisc.ac.uk/whatwedo/campaigns/res3/jischelp.aspx
Data Lifecycles
Chuck Humphrey (2006) “e-Science and the lifecycles of Research”
http://datalib.library.ualberta.ca/~humphrey/lifecycle-science060308.doc
A Data Curation Profile contains:
Information about an individual data set, including its data lifecycle.
Current management practice.
Unmet needs.
http://datacurationprofiles.org
Individual Data Lifecycles are Unique
Individual Data Lifecycles can be Complex
Data Lifecycle Model: UVA
Data Mining
Data Curation & Preservation
Publication Rights & Restrictions
DMP Consulting
Grant Writing & Planning
DM Planning
Metadata & Documentation
Data Processing
HPC/Visualization
Tool Development
Data Storage
Data Search
Image: University of Virginia Libraries Scientific Data Consulting Group: http://dmconsult.library.virginia.edu/
Data Lifecycle Model for ICPSR
1. Proposal and Planning
2. Project Start Up
3. Data Collection
4. Data Analysis
5. Preparing Data for Sharing
6. Deposit
ICPSR’s Guide to Social Science Data Preparation and Archiving:
http://www.icpsr.umich.edu/icpsrweb/content/deposit/guide/
Common Elements in Data Lifecycle
• Collect / Generate
• Process
• Analyze
• Finalize / Summarize for Publication
Curation Lifecycle
Neil Beagrie (2004) “The Continuing Access and Digital Preservation Strategy for the UK Joint Information Systems Committee (JISC)” D-Lib Magazine.
http://www.dlib.org/dlib/july04/beagrie/07beagrie.html
Curation Lifecycle: DCC
http://www.dcc.ac.uk/resources/curation-lifecycle-model
OAIS Reference Model: Preservation
ICPSR Pipeline Process
http://staging.icpsr.umich.edu/icpsralpha/content/datamanagement/lifecycle/oais.html
Deposit
Inputs – Materials to Deposit:
• Data
• Documentation
• Data Form (Description)
Outputs – SIP:
• Deposited Files
• Metadata from the Deposit
• Signed Deposit Form
Ingest
Actions:
• Processing Plan
• Assign a Study Number
• Formatting for Access and Preservation
Outputs – AIP:
• Data
• Documentation
• Set Up Files
• Processing History
Archival Storage
Actions:
• Migrations
• Checking integrity - checksums
• Making, storing and synching redundant copies at various locations
Outputs – Curated AIP
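The integrity check on redundant copies can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not ICPSR's actual procedure: the function names `sha256_of` and `check_fixity` are hypothetical, and the choice of SHA-256 is an assumption (the slide only says "checksums"). Only the standard library is used.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large data files do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(recorded_digest, copies):
    """Compare each redundant copy against the digest recorded at ingest.

    Returns a dict mapping each path (as a string) to True when the copy
    still matches, and False when it is missing or has changed.
    """
    results = {}
    for copy in copies:
        path = Path(copy)
        results[str(path)] = path.is_file() and sha256_of(path) == recorded_digest
    return results
```

In this sketch the digest would be recorded once when the file enters archival storage, and `check_fixity` would be rerun on a schedule against every storage location; any `False` entry signals a copy that needs to be restored from a good one.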
Data Management
Actions:
• Populating
• Maintaining
• Making the descriptive information accessible
Outputs:
• Compliant Metadata
Access
Actions:
• Data set is indexed, searchable and made available.
Outcome – DIP:
• Data and document files
• Bibliography file
• Study description file
• Terms of use file
• File Manifest
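The package contents listed for each pipeline step can be captured as a simple completeness check. The sketch below is illustrative only: SIP, AIP and DIP are the OAIS Submission, Archival and Dissemination Information Packages, the component lists are taken from the slides above, and the names `REQUIRED_CONTENTS` and `missing_components` are hypothetical rather than part of any ICPSR system.

```python
# Package contents as listed on the slides. The dictionary and the helper
# below are hypothetical names used for illustration only.
REQUIRED_CONTENTS = {
    "SIP": {"deposited files", "deposit metadata", "signed deposit form"},
    "AIP": {"data", "documentation", "setup files", "processing history"},
    "DIP": {
        "data and document files",
        "bibliography file",
        "study description file",
        "terms of use file",
        "file manifest",
    },
}


def missing_components(package_type, components):
    """Return the sorted list of required components a package lacks."""
    required = REQUIRED_CONTENTS[package_type]
    return sorted(required - set(components))
```

For example, calling `missing_components("AIP", ["data", "documentation"])` would report that the set up files and processing history still need to be added before the package is complete.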
Common Elements in Curation Lifecycle
• Deposit / Ingest
• Storage
• Document / Describe
• Discover / Access / Use
• Manage
• Preserve
Lifecycle Models & Data Services
• Need for developing your organizational model – based on community models and informed by individual lifecycles.
• Need for alignment between data lifecycles and curation lifecycles – informed by research and scholarly communication lifecycles.
Alignment Between Lifecycles
Graphic labels:
Data lifecycle: Proposal Development & DMP; Project Start-up; Data Collection & File Creation; Data Analysis; Preparing Data for Sharing
Curation lifecycle: Ingest; Data Mgmt; Archival Storage; Access
Related lifecycles: Research; Scholarly Communication
Example of Lifecycle Alignment
Image: Green, Ann G., and Myron P. Gutmann. (2007). “Building Partnerships Among Social Science Researchers, Institution-based Repositories, and Domain Specific Data Archives.” OCLC Systems and Services: International Digital Library Perspectives, 23: 35-53.