Managing sensitive data in your repository

Natasha Simons

Sharing Health-y and Sensitive Data: Challenges and Solutions Workshop, Perth, 3 September 2015

What is a data repository?

A research data repository is a managed environment capable of storing and sharing (largely) digital data. The data repository supports the process of curating, preserving, and sharing research data.

What kinds of data repositories are there?

Are repositories for open data only?

Yes and no: it depends on the repository's purpose and scope.

Repositories can support data that is:

1. Open access only

2. Mediated access only

3. Closed/private only

Most data repositories are a combination of 1 & 2

Are there health data repositories?

Yes, many!

http://www.nlm.nih.gov/NIHbmic/nih_data_sharing_repositories.html

What’s the point of data repositories?

Data repositories assist researchers and the research community to:

1. Support data sharing, data discovery & reuse, data preservation

2. Comply with publisher requirements

3. Comply with funder requirements

4. Comply with institutional or govt policy requirements

5. Support institutional goals

Illustration credit: Ainsley Seago. doi:10.1371/journal.pbio.1001779.g001

Can sensitive data be managed in a repository?

Yes!

Ask:

• Can the raw data be (de-identified and) made completely open? Or will access be restricted? Mediated?

• What licence should be applied to enable data reuse?

• What metadata elements, links (e.g. to publications) and identifiers (e.g. DOIs, ORCIDs) will aid discovery and reuse of the data?

Source: http://www.slideshare.net/WLSA_ORG/wh2014-workshop-health-data-consortium

Can sensitive data be managed in a repository?

Also ask:

• Can a citation element be added to support attribution and reuse tracking?

• Who or what will be the point of contact for the data?

• Is the data subject to other conditions, e.g. release only after an embargo period?
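
The questions on these two slides can be read as a checklist of fields for a repository record. The following is a minimal Python sketch of that idea, not a schema from any particular repository: the class name, field names, DOI, ORCID, contact address and dates are all hypothetical placeholders.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import List, Optional


class Access(Enum):
    """The three access types listed earlier in the slides."""
    OPEN = "open"
    MEDIATED = "mediated"
    CLOSED = "closed"


@dataclass
class DatasetRecord:
    """Hypothetical record layout; every field name here is an assumption."""
    title: str
    access: Access                      # open, mediated or closed?
    licence: str                        # licence applied to enable reuse
    contact: str                        # point of contact for the data
    doi: Optional[str] = None           # identifier for the dataset
    creator_orcids: List[str] = field(default_factory=list)
    related_publications: List[str] = field(default_factory=list)
    citation: Optional[str] = None      # supports attribution and reuse tracking
    embargo_until: Optional[date] = None

    def is_downloadable(self, today: date) -> bool:
        """Only open data is directly downloadable, and only once any embargo has passed."""
        if self.access is not Access.OPEN:
            return False
        return self.embargo_until is None or today >= self.embargo_until


# Placeholder values only; none of these identifiers refer to a real dataset.
record = DatasetRecord(
    title="Example de-identified cohort dataset",
    access=Access.OPEN,
    licence="CC BY 4.0",
    contact="data-manager@example.org",
    doi="10.1234/example",
    creator_orcids=["0000-0000-0000-0000"],
    citation="Author, A. (2015). Example de-identified cohort dataset. doi:10.1234/example",
    embargo_until=date(2016, 1, 1),
)
```

In this sketch, mediated and closed datasets are never directly downloadable; a request would go to the contact recorded on the record instead.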

What’s really challenging?

“Having longitudinal data on individuals is a part of many observational designs, and is needed for research into outcomes, efficacy and many mechanistic studies. Most repositories thus have longitudinal observations. To build such a database you need some way to link observations on the same identified person. Therefore most repositories contain personally identified data, but, because of privacy concerns, they often release only de-identified data. Difficulties in the de-identification process can cause some data to be omitted in a dataset. A lack of direct identifiers in a data collection or federation could prevent linking of data for some patients.”

From: Wade, T. Traits and Types of Health Data Repositories. Health Information Science and Systems 2014, 2:4. doi:10.1186/2047-2501-2-4 http://www.hissjournal.com/content/2/1/4
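
One common way to ease the tension described in the quote, offered here as an illustration rather than something the source prescribes, is to replace the direct identifier with a keyed pseudonym so that observations on the same person still link after the identifier is removed. The following is a minimal Python sketch using the standard library's hmac module. The record layout, field names and key are hypothetical, and real de-identification would also have to handle indirect identifiers such as dates of birth or postcodes.

```python
import hashlib
import hmac


def pseudonym(person_id: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym; the same person and key always give the same value."""
    return hmac.new(secret_key, person_id.encode("utf-8"), hashlib.sha256).hexdigest()


def de_identify(records, secret_key):
    """Drop the direct identifier and add a pseudonymous link_id to each record."""
    released = []
    for rec in records:
        out = {k: v for k, v in rec.items() if k != "person_id"}
        out["link_id"] = pseudonym(rec["person_id"], secret_key)
        released.append(out)
    return released


# Hypothetical key and made-up observations, for illustration only.
key = b"hypothetical-key-held-by-the-data-custodian"
visits = [
    {"person_id": "P001", "visit": 1, "systolic_bp": 130},
    {"person_id": "P002", "visit": 1, "systolic_bp": 118},
    {"person_id": "P001", "visit": 2, "systolic_bp": 124},
]
released = de_identify(visits, key)
```

The two visits for "P001" share a link_id in the released data, so the longitudinal link survives, while anyone without the key cannot recompute the mapping from person_id to link_id.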

Small group exercise

Discovering sensitive health data in repositories

Acknowledgement

Australian National Data Service is funded by the Commonwealth under the NCRIS Program

31 August, 2015