RELU CALL PI meeting, 12 October, 2005

RELU Data Support Service RELU-DSS

Louise Corti, UK Data Archive

Themes and data

• RELU themes:

– A: The Integration of Land and Water Use

– B: The Environmental Basis of Rural Development

– C: Sustainable Food Chains

– D: Economic and Social Interactions with the Rural Environment

• the Programme is both using and creating a variety of data sources

• disparate types of data: social, environmental and biological

• an estimated 80 or so datasets from Call 1 (8 major research projects and smaller scoping studies)

RELU data types

• Social data (people based)

– Micro (survey): household or individual level attributes; behaviour, attitudes and opinions

– Business/company: farm level data

– Aggregated: UK Census (e.g. small area statistics); retail statistics; health indicators

• GIS/spatial data: geographically referenced environmental databases

– Ordnance Survey

– road networks

– settlement

RELU data types continued

• water quality, landfill, air quality, emission levels

• soil data, e.g. mineral composition

• ecological data, animal and bird distributions

• agricultural census

• climate and meteorological data

• river flow data

• biochemical data relating to foods/habitats

RELU Data Support Service

• set up to help oversee and implement the Programme's Data Management Policy and Data Management Plan

• provides a support service for RELU researchers and staff to gain information and guidance on issues surrounding longer-term data sharing and preservation

• a joint support service run by:

– the ESRC/JISC-supported UK Data Archive (UKDA) at Essex

– the NERC-supported Centre for Ecology & Hydrology (CEH)

• funded initially for one year (1 January 2005 to 31 December 2005), supporting one FTE and outreach activities; continuation at some level is expected

RELU Data Management Policy

www.relu.ac.uk/about/data.htm

• builds on existing ESRC and NERC mandatory data policies

• enhances the capabilities for interdisciplinarity and thus improves the ability of the research community to:

– apply learning from one field to another

– combine different methodological approaches and sources of information

– cross-fertilise ideas and concepts

– understand scientific, technological and environmental problems in their social and economic contexts

Policy principles

• publicly funded research data are a valuable, long term resource

• to ensure maximum research exploitation, data must be managed effectively from day one

• researchers must collect data in a way that ensures longer-term sharing, and manage their data effectively during the life of a project

• RELU funds will support data management through the life of the project

• data must be made available by researchers for archiving: ESRC and NERC supported data centres provide long-term, post-project data management

Longer-term data sharing

• data centres/archives make (selected) data available to other bona fide researchers

• safeguards to protect the interests of the original collector, who may retain Intellectual Property Rights

• preserve data using up-to-date curation systems and keep pace with technology and data trends

• provide resource discovery and user support services

• provide access to ‘enhanced’ data, e.g. combined data, exemplars etc.

RELU DSS Activities I

• set up a data management advisory and support service for Call 1 and 2 award holders and for Call 2 and 3 applicants and successful award holders:

– 1 FTE (8 staff) in place across ESDS and CEH

– web presence established: http://relu.esds.ac.uk/

– external email list established: [email protected]

– regular cross-site DSS staff meetings held

– regular communication with the Programme Director’s Office (ongoing)

• provide guidance to the PMG and data sub-group on data management issues and on longer-term costing for ongoing support of RELU projects’ data management and ultimate archiving

RELU DSS Web pages

[Screenshot of the RELU-DSS web pages]

RELU DSS Activities II

• provide a web-based information portal offering expert guidance on data management issues and searchable, structured information about RELU data:

– guidance on key issues in data management

– draft booklet, Guidance on Data Management, circulated

– web pages and brochure

– metadatabase of data being created within the programme

• embarked on a limited programme of outreach and training aimed at RELU award holders

RELU DSS Activities III

• identifying and cataloguing key external data sources for RELU projects, with the aim of facilitating access where required

– clear need to provide structured information about, and pointers to, third-party data sources

– compiled a spreadsheet of some 400 sources that have been mentioned or requested by RELU researchers

– adding contact information, costs, licensing restrictions etc.

– the RELU Programme Office may help to facilitate programme-level or, ideally, longer-term community access

Existing 3rd party datasets

• Research Council data centres

– NERC centres, e.g. CEH (land cover and HOST)

– Rothamsted (BBSRC experimental samples of crops and soils)

– Economic and Social Data Service (e.g. ESRC Health and Lifestyle Survey)

– EDINA/UK Borders (boundary data for administrative areas)

• Public/private research institutes

– Macaulay: soils and derived data; climate; land cover; land capability data

• Department for Environment, Food and Rural Affairs (DEFRA), e.g. Farm Business Survey

• Scottish Executive Environment and Rural Affairs Department (SEERAD)

• Environment Agency (EA)

• National Soil Resources Institute

• Met Office

DATA CENTRES

ESRC/JISC Economic and Social Data Service

Data for research and teaching purposes, used in all sectors and across many different disciplines

• official agencies, mainly central government

• individual academics (research grants)

• market research agencies

• public records/historical sources

• links to UK census data

• qualitative and quantitative data

• international statistical time series

• access to international data via links with other data archives worldwide

• in-house history data service (AHDS)

• 4,000+ datasets in the collection

• 200+ new datasets added each year

• 6,500+ orders for data per year

• 18,000+ datasets distributed worldwide per year

ESRC/JISC Data Centre

• national data archiving and dissemination service, running since 1 January 2003

www.esds.ac.uk

• jointly supported by:

– Economic and Social Research Council

– Joint Information Systems Committee

• partners:

– UK Data Archive (UKDA), Essex

– Manchester Information and Associated Services (MIMAS), Manchester

– Cathie Marsh Centre for Census and Survey Research (CCSR), Manchester

– Institute of Social and Economic Research (ISER), Essex

ESDS Holdings

The large-scale government surveys

• General Household Survey

• Labour Force Survey

• Health Survey for England/Wales/Scotland

• Family Expenditure Survey

• British Crime Survey

• Family Resources Survey

• National Food Survey/Expenditure and Food Survey

• ONS Omnibus Survey

• Survey of English Housing

• British Social Attitudes

• National Travel Survey

• Time Use Survey

Benefits of the large-scale government datasets

• good quality data

– produced by experienced research organisations

– usually nationally representative, with large samples

– good response rates

– very well documented

• continuous data

– allows comparison over time

– data are largely cross-sectional

• hierarchical data

– individual and household levels

– intra-household differences

– household effects on individuals

[Figure: percentage of women aged 18-49 cohabiting, 1979-2000, General Household Survey]

Search on ‘Environmental’

200+ datasets found

Types of qualitative data

• diverse data types: in-depth interviews; semi-structured interviews; focus groups; oral histories; mixed methods data; open-ended survey questions; case notes/records of meetings; diaries/research diaries

• multimedia: audio, video, photos and text (the most common being interview transcriptions)

• formats: digital, paper, analogue audio-visual

• data structures differ across ‘document types’

International data providers

• International Monetary Fund

• OECD

• United Nations

• World Bank

• Eurostat

• International Labour Organisation

• UK Office for National Statistics

• freely available to UK HE/FE: data licensing costs are paid by the ESRC

• datasets delivered over the web via Beyond 20/20

Databanks cover:

• economic performance and development

• trade, industry and markets

• employment

• demography, migration and health

• governance

• human development

• social expenditure

• education

• science and technology

• land use and the environment

ESDS: Online access to data and user guides

• web pages

– easy-to-navigate format

– web catalogue with variable-level searching

– subject browsing and major series

– free web access to online documentation: PDF user guides and forms

• registration

– one-off registration with user ID/password

– online account management and “Shopping Basket” ordering

– data are freely available for the majority of users

– one-stop Athens authentication

• data download and online browsing

– web download in various software formats: SPSS, STATA, tab-delimited, Word

– Nesstar: online data analysis and visualisation

– ESDS International online system

– ESDS Qualidata online browsing system

NERC Data Centres

• NERC’s data holdings are a core asset

• a network of 7 Designated Data Centres is responsible for managing NERC-funded data and implementing the NERC Data Policy

• central directory: the NERC Metadata Gateway

• e-Science-funded NERC Data Grid under development

NERC Designated Data Centres

• Antarctic Environmental Data Centre: Responsible for all NERC's data from the Antarctic, regardless of discipline

• British Atmospheric Data Centre: Responsible for atmospheric sciences data

• British Oceanographic Data Centre: Responsible for marine data

• National Geosciences Information Service: Responsible for geosciences data

• National Water Archive: Responsible for NERC's hydrological data and for the Government's National River Flow Archive

• Environmental Information Centre: Responsible for all other NERC terrestrial and freshwater data

• NERC Earth Observation Data Centre: Responsible for NERC’s non-discipline-related remotely sensed data of the surface of the Earth acquired by satellite and airborne sensors

NERC Data Centre Holdings

• The NERC MetaData Gateway simultaneously searches the catalogues of data held at several of the NERC designated data centres.

QA and Data Management Plans

Data Management Plan

• proforma to complete (Section 3 of the Project Communication and Data Management Plan)

• highlighting data management and custody issues at an early stage

• providing a basis for quality assurance within the Programme

• providing a basis from which award holders and the Programme Director can report and monitor project and overall RELU Programme progress

Information required from plan

• requirements for access to existing datasets

• details of new and derived datasets to be produced

• quality assurance of data

• formats and standards

• data description and documentation

• ethical, legal issues and IPR resolution

• data back-up procedures, security

• archiving data (for Research Council data archives)

• data management representative

RELU-DSS helps support these areas
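As an illustration only, a project could keep a draft of its plan as simple section/text pairs and check that every area listed above has been filled in before submission. The section keys below are paraphrased from this slide, and the function name `missing_sections` is hypothetical; this is a sketch, not the official RELU proforma.

```python
# Areas a RELU data management plan is expected to cover (paraphrased from the slide).
REQUIRED_SECTIONS = [
    "access to existing datasets",
    "new and derived datasets",
    "quality assurance",
    "formats and standards",
    "description and documentation",
    "ethical, legal and IPR issues",
    "back-up and security",
    "archiving",
    "data management representative",
]


def missing_sections(plan: dict[str, str]) -> list[str]:
    """Return the required sections that are absent or left blank in a draft plan."""
    return [s for s in REQUIRED_SECTIONS if not plan.get(s, "").strip()]


draft = {
    "new and derived datasets": "Household survey; interview transcripts",
    "archiving": "   ",  # whitespace only, so it still counts as missing
}
print(missing_sections(draft))
```

The check only tests that something has been written in each section; whether the content is adequate is a matter for the Programme's QA.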

Data management

• award holders will be required to provide full metadata, together with a description of the datasets their project generates. Metadata is the information necessary to interpret, understand and use a given dataset without reference to the original data collector

• agree the technical arrangements for data management and archiving, including:

– the final archiving destination for project datasets

– formats for supply of data

– licence agreements, IPR etc.

RELU awards database

Quality control and data management issues

• Survey data

• Qualitative data

• Environmental data

Characteristics of a “good” archived research collection

• a life-cycle approach taken

• accurate data, well organised and labelled files

• appropriate measurement of key concepts

• supporting data/documentation deposited to a standard that would enable them to be used by a third party:

– …created

– major stages of research recorded

– research/measurement instruments documented

• data that can be stored in user-friendly “dissemination” formats, but can also be archived in a future-proof “preservation” format

• consent, confidentiality & copyright resolved

ESDS: Supporting documentation

• used to produce the catalogue record and user guide:

– funding application

– questionnaire/interview schedules

– description of methodology (details of sample design, response rate, etc.)

– “codebook” (variable names, variable descriptions, code names and variable formatting information)

– technical report describing the research project

– communication with informants on confidentiality

– coding schemes/themes

– end of award report

– software description/versions used

– bibliographies, resulting publications

– code used to create derived variables or check data (e.g. SPSS, STATA or SAS “command files”)

• Anything that adds insight or aids understanding and secondary usage

Standardised description (metadata) fields taken from DDI specification for social science datasets
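In the DDI (Data Documentation Initiative) Codebook specification, each variable is described by a `var` element carrying a label (`labl`) and, for categorical variables, one `catgry` per code with its value (`catValu`) and label. The sketch below uses Python's standard `xml.etree.ElementTree` to build such a fragment for the `p1sex` variable from the labelling guidance. It is a sketch only: it omits namespaces and the surrounding `codeBook`/`dataDscr` structure, and its output has not been validated against the DDI schema. The helper name `ddi_var` is hypothetical.

```python
import xml.etree.ElementTree as ET


def ddi_var(name: str, label: str, categories: dict[int, str]) -> ET.Element:
    """Build a DDI Codebook-style <var> fragment for one categorical variable."""
    var = ET.Element("var", name=name)
    ET.SubElement(var, "labl").text = label
    for value, cat_label in categories.items():
        catgry = ET.SubElement(var, "catgry")
        ET.SubElement(catgry, "catValu").text = str(value)
        ET.SubElement(catgry, "labl").text = cat_label
    return var


p1sex = ddi_var(
    "p1sex",
    "gender of person 1",
    {1: "male", 2: "female", -8: "don't know", -9: "not answered"},
)
print(ET.tostring(p1sex, encoding="unicode"))
```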

Survey data - variables

Labelling of survey data

• all variables should be named. Variable names should not exceed 8 characters where possible, as the most common format for disseminating data is SPSS

• all variables should be labelled. Labels should be brief (preferably < 80 characters), but precise and always make explicit the unit of measurement for continuous (interval) variables. Where possible, all variable labels should reference the question number (and if necessary questionnaire). For example, the variable q11bhexc might have the label “q11b: hours spent taking physical exercise in a typical week”. This gives the unit of measurement and a reference to the question number (q11b), so the user can quickly and easily cross-reference to it
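The naming and labelling rules above can be checked automatically before deposit. A minimal sketch, assuming the variable names and labels have already been read into Python (for example, exported from the statistical package). The `q<number><letter>: ` prefix check encodes the question-reference convention from the q11bhexc example as a hypothetical house rule, not a UKDA requirement.

```python
import re

MAX_NAME_LEN = 8    # keep names to 8 characters where possible
MAX_LABEL_LEN = 80  # labels preferably shorter than 80 characters
# Hypothetical convention: labels open with the question number, e.g. "q11b: ..."
QUESTION_REF = re.compile(r"^q\d+[a-z]*:\s")


def check_variable(name: str, label: str) -> list[str]:
    """Return a list of naming/labelling problems for one variable (empty if none)."""
    problems = []
    if len(name) > MAX_NAME_LEN:
        problems.append(f"{name}: name is longer than {MAX_NAME_LEN} characters")
    if not label.strip():
        problems.append(f"{name}: variable has no label")
        return problems
    if len(label) >= MAX_LABEL_LEN:
        problems.append(f"{name}: label is {len(label)} characters long")
    if not QUESTION_REF.match(label):
        problems.append(f"{name}: label does not reference a question number")
    return problems


print(check_variable("q11bhexc", "q11b: hours spent taking physical exercise in a typical week"))
print(check_variable("exercisehours", ""))
```

The first call returns an empty list; the second reports both an over-long name and a missing label.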

Labelling of survey data II

• for categorical variables, all codes (values) should be given a brief label (preferably < 60 characters). For example, p1sex (gender of person 1) might have these value labels: 1 = male, 2 = female, -8 = don’t know, -9 = not answered

• where possible, all such labelling should be created and supplied to the UKDA as part of the data file itself. This is expected for data supplied in one of the three major statistical packages: SPSS, STATA or SAS.
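A companion check for categorical variables: every code that occurs in the data should have a value label, and each label should stay under 60 characters. The sketch below uses the p1sex labels from the example; the function name is hypothetical, and reading the labels out of an SPSS, STATA or SAS file is not shown.

```python
def check_value_labels(values, value_labels, max_len=60):
    """Return (codes found in the data without a label, codes whose label is too long)."""
    unlabelled = sorted({v for v in values if v not in value_labels})
    too_long = sorted(code for code, text in value_labels.items() if len(text) >= max_len)
    return unlabelled, too_long


P1SEX_LABELS = {1: "male", 2: "female", -8: "don't know", -9: "not answered"}

observed = [1, 2, 2, 1, -9, 3]  # 3 is a stray code with no label
print(check_value_labels(observed, P1SEX_LABELS))
```

Here the stray code 3 is reported as unlabelled, which would need either a label or a correction to the data before deposit.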

QA survey data: validation checks

Computer-aided surveys (CAPI, CATI or CAWI)

• these are the most accurate way of gathering survey data, but the software (e.g. Blaise) and hardware (e.g. a laptop for every interviewer) may be beyond project resources

• computer-aided surveys allow as many logical checks as possible, on question routing and on responses, to be built in at the point of data creation

Non-computer-aided surveys

• less control over initial responses, but checks can be performed:

– at the point of data entry/transcription, if “data entry” software is used; however, there are few cheap data entry packages around

– after entry: data are entered without checks directly into a spreadsheet-style interface (e.g. Excel worksheet, SPSS data view), and validation checks are performed afterwards via command files in statistical packages or Visual Basic code in Excel or Access
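Checks run after data entry usually take the form of SPSS, STATA or SAS command files or VBA; the sketch below expresses the same two kinds of check (a range check and a routing check) in Python instead. The variable names, the codes (1 = yes, 2 = no) and the valid ranges are all hypothetical.

```python
def validate(record: dict) -> list[str]:
    """Return a list of validation errors for one keyed-in questionnaire record."""
    errors = []
    age = record.get("age")
    # Range check: hypothetical adult sample aged 16 to 110.
    if age is None or not 16 <= age <= 110:
        errors.append(f"age out of range: {age}")
    # Routing check: 'hours' should only be answered when employed == 1 (yes).
    employed, hours = record.get("employed"), record.get("hours")
    if employed == 2 and hours is not None:
        errors.append("hours answered but respondent is not employed")
    if employed == 1 and (hours is None or not 0 <= hours <= 100):
        errors.append(f"hours missing or out of range: {hours}")
    return errors


def validate_all(records: list[dict]) -> dict:
    """Map each record id to its errors, listing only records that fail."""
    return {r["id"]: errs for r in records if (errs := validate(r))}


records = [
    {"id": 1, "age": 34, "employed": 1, "hours": 37},
    {"id": 2, "age": 41, "employed": 2, "hours": 20},
    {"id": 3, "age": 12, "employed": 2},
]
print(validate_all(records))
```

Reporting errors by record id, rather than silently correcting them, lets each case be checked against the paper questionnaire.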

An example of data seemingly untouched by the human eye:

Originating error in text variables:

Occupation: ‘sole trader’; Description of occupation: ‘purveyor of seafood’

Propagated error in derived numeric variables:

• the respondent was coded under the Standard Industrial Classification (SIC) code relating to food retailers: 52.2, Retail sale of food, beverages and tobacco in specialised stores

Identifiers

‘Direct’ and ‘indirect’ identifiers may threaten confidentiality

• Direct identifiers may have been collected as part of the survey administration process and include names, addresses (including postcode information), telephone numbers etc.

• Indirect identifiers are variables containing information that, when linked with other publicly available sources, could result in a breach of confidentiality. These could include geographical information, workplace/organisation, educational institution or occupation

Quantitative data

• Remove the identifier from the dataset

• Aggregate/reduce the precision of a variable, e.g. record the year of birth rather than the day, month and year; record the postcode sector (first 3 or 4 characters) rather than the full postcode

• Bracket a coded (categorical) variable, e.g. aggregate SOC codes up to ‘minor group’ codes by removing the terminal digit

• Generalise the meaning of a nominal (string) variable

• Restrict the upper or lower ranges of a continuous variable (top- or bottom-coding)
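Several of these techniques are simple field-level transformations. The sketch below applies them to one hypothetical record: direct identifiers are dropped, date of birth becomes year of birth, the postcode is cut back to its sector (taken here as the outward code plus the first character of the inward code), a 4-digit SOC unit-group code is reduced to its 3-digit minor group, and income is top-coded. The field names and the income cap are assumptions, and real disclosure control also needs judgement about combinations of variables across the whole dataset.

```python
from datetime import date

DIRECT_IDENTIFIERS = {"name", "address", "telephone"}  # hypothetical field names


def postcode_sector(postcode: str) -> str:
    """'CO4 3SQ' -> 'CO4 3': outward code plus first character of the inward code."""
    compact = postcode.replace(" ", "").upper()
    outward, inward = compact[:-3], compact[-3:]  # the inward code is always 3 characters
    return f"{outward} {inward[0]}"


def soc_minor_group(soc_unit_group: str) -> str:
    """Remove the terminal digit: 4-digit unit group -> 3-digit minor group."""
    return soc_unit_group[:-1]


def top_code(value: float, cap: float) -> float:
    """Restrict the upper range of a continuous variable."""
    return min(value, cap)


def anonymise(record: dict, income_cap: float = 100_000) -> dict:
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    out["birth_year"] = out.pop("dob").year
    out["postcode"] = postcode_sector(out["postcode"])
    out["soc"] = soc_minor_group(out["soc"])
    out["income"] = top_code(out["income"], income_cap)
    return out


respondent = {
    "name": "A. Farmer", "address": "1 Example Lane", "telephone": "01234 567890",
    "dob": date(1962, 5, 17), "postcode": "CO4 3SQ", "soc": "5111", "income": 250_000,
}
print(anonymise(respondent))
```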

Online access to data

NESSTAR:

• browse detailed information (metadata) about these data sources, including links to other sources

• do simple data analysis and visualisation on microdata

• bookmark analyses

• download the appropriate subset of data in one of a number of formats (e.g. SPSS, Excel)

• data must be ‘perfect’: 100% labelled

Derived and aggregated products

• permission to share and IPR are the main issues

• a range of parties may have an interest: owners, funders, data gatherers, employers, other stakeholders, etc.

• all original source information must be recorded

Transcribing qualitative data

• integrate transcription into the ongoing research and budget accordingly

• full transcriptions or summaries

• costs and benefits of:

– self-transcription

– internal team transcription

– external transcription

• full transcriptions should have:

– consistent layout

– speaker tags

– line breaks

– header with identifier/other details

– checks for errors
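Consistent layout is easiest to guarantee if transcripts are generated from a template. A minimal sketch, assuming each interview is held as a list of (speaker, utterance) turns; the header fields, speaker codes and layout are illustrative and are not an ESDS Qualidata standard.

```python
def format_transcript(interview_id: str, details: dict[str, str],
                      turns: list[tuple[str, str]]) -> str:
    """Render a transcript with a header block, speaker tags and a blank line between turns."""
    header = [f"Interview ID: {interview_id}"]
    header += [f"{field}: {value}" for field, value in details.items()]
    body = [f"{speaker.upper()}: {text.strip()}" for speaker, text in turns]
    return "\n".join(header) + "\n\n" + "\n\n".join(body) + "\n"


t = format_transcript(
    "INT01",
    {"Date": "2005-06-01", "Interviewer": "IV1"},
    [("iv", "How long have you farmed here?"), ("resp", " Since 1978. ")],
)
print(t)
```

Stray whitespace around each utterance is stripped so that every turn starts with its speaker tag at the beginning of a line.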

Qualitative data: identifiers removed

• Scheme devised – different for each dataset

• Ideally should reflect any pseudonyms used in publications

• Confidentiality respected

• Anonymisation?

• Problems of anonymisation:

– applied too weakly

– applied too strongly

– timing

– potential for distortion

• User undertakings

• Appropriate and sympathetic
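A dataset-specific replacement scheme can be applied consistently with a lookup table. The sketch below makes a single regular-expression pass, trying longer names first so that a full name is replaced before a forename on its own. The names and replacement tags are invented. Automated replacement only catches names listed in the scheme, so transcripts still need to be read by eye to avoid anonymisation that is too weak or too strong.

```python
import re


def pseudonymise(text: str, scheme: dict[str, str]) -> str:
    """Replace each real name in the scheme with its pseudonym in one pass."""
    if not scheme:
        return text
    names = sorted(scheme, key=len, reverse=True)  # longest first: "John Smith" before "John"
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b")
    return pattern.sub(lambda m: scheme[m.group(0)], text)


scheme = {"John Smith": "[Farmer A]", "John": "[Farmer A]", "Ashby": "[village 1]"}
print(pseudonymise("John Smith farms near Ashby. John said he started young.", scheme))
```

A single pass avoids re-replacing text inside a pseudonym that has already been substituted.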

Qualitative research, e.g. a set of in-depth interviews

• data list: a list of the contents of the research collection

• acts as a point of entry for secondary users

• for qualitative data: an Excel template of interviewee/case study characteristics
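The data list can be kept as a spreadsheet and exported to CSV for deposit. A sketch using Python's standard `csv.DictWriter`; the column names are a hypothetical selection of interviewee characteristics, not the UKDA Excel template itself.

```python
import csv
import io

FIELDS = ["interview_id", "pseudonym", "sex", "age_band", "occupation", "region", "interview_date"]


def write_data_list(rows: list[dict], fh) -> None:
    """Write one row per interview; missing characteristics are left blank."""
    writer = csv.DictWriter(fh, fieldnames=FIELDS, restval="")
    writer.writeheader()
    writer.writerows(rows)


buf = io.StringIO()
write_data_list([
    {"interview_id": "INT01", "pseudonym": "[Farmer A]", "sex": "male",
     "age_band": "45-54", "occupation": "dairy farmer", "region": "South West"},
], buf)
print(buf.getvalue())
```

When writing to a real file rather than an in-memory buffer, open it with `newline=""`, as the `csv` module documentation recommends.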

Back-up and security

• digital, paper and audio media are fragile, and digital media are even easier to change, copy or delete!

• a good back-up procedure will protect against a range of mishaps, such as:

– accidental changes to data

– accidental deletion of data

– loss of data due to media or software faults

– virus infections and hackers

– catastrophic events (such as fire or flood)

• use version control

• back up frequently and retain off-site copies

• consider storage conditions, fireproofing etc.
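A minimal sketch of one back-up step, using only the standard library. Each back-up is a timestamped copy, so earlier versions are kept rather than overwritten, and it is verified against a SHA-256 checksum of the original. The destination directory is an assumption. For off-site protection it would need to sit on separate media or a remote store, and this function does not handle that.

```python
import hashlib
import shutil
from datetime import datetime
from pathlib import Path


def sha256(path: Path) -> str:
    """Checksum a file in chunks so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup(src, backup_dir) -> Path:
    """Copy src into backup_dir under a timestamped name and verify the copy."""
    src, backup_dir = Path(src), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%dT%H%M%S%f")
    dest = backup_dir / f"{src.stem}_{stamp}{src.suffix}"
    shutil.copy2(src, dest)  # copy2 also preserves file timestamps
    if sha256(src) != sha256(dest):
        raise OSError(f"checksum mismatch after copying to {dest}")
    return dest
```

For example, `backup("survey_wave1.sav", "/mnt/backup_drive/relu")` would write something like `survey_wave1_20051012T093000123456.sav`; both paths are hypothetical.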

ESDS in-house processing

• in-house data processing

– ‘cleaning up’ research data

– collating documentation received from depositor

– repairing minor errors

– meeting users’ expectations

– cannot undertake major processing tasks unless the data are destined for publication in online systems

Environmental Data

Example: LOCAR Programme

• to better understand the hydrological, physical, chemical and biological processes operating in lowland catchments

• to improve modelling to support the integrated management of lowland catchment systems

• to create a database

– £7.75 million

– three catchments

– 12 research projects

– field programme

Objectives of the LOCAR Data Centre

• acquire major datasets

• provide data to LOCAR scientists

• establish standards for data definition and exchange

• receive data and model output from scientists

• publish appropriate data at the end of the programme

• ensure long-term security and availability of LOCAR data

Datasets from NERC

• River Network

• DTM

• Land Cover

• HOST

• Daily Mean Flows

• Rainfall

• Ground Water Level

• Keyworth Borehole Archive Records

• Wellmaster Borehole data

• Geological maps

Raingauges

• Automatic raingauges

– 0.2 mm tipping bucket, hourly

• Manual raingauges

– checking the automatic gauges

• Rainwater collector

– rainwater chemistry samples

Level and Flow

• Water levels

– deep boreholes

• Flow

– EA gauging stations

– ultrasonic Doppler flow meter
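With a 0.2 mm tipping-bucket gauge, each tip of the bucket represents 0.2 mm of rain, so an hourly total is the number of tips in that hour multiplied by 0.2 mm. The sketch below assumes the logger output has already been read into a list of tip timestamps (real loggers may instead report counts per interval). Hours with no tips between the first and last tip are filled in as 0.0 mm.

```python
from collections import Counter
from datetime import datetime, timedelta

TIP_DEPTH_MM = 0.2  # rainfall represented by one tip of the bucket


def hourly_rainfall(tip_times: list[datetime]) -> dict[datetime, float]:
    """Convert individual tip timestamps into hourly rainfall totals in mm."""
    if not tip_times:
        return {}
    counts = Counter(t.replace(minute=0, second=0, microsecond=0) for t in tip_times)
    totals = {}
    hour, last = min(counts), max(counts)
    while hour <= last:
        totals[hour] = round(counts.get(hour, 0) * TIP_DEPTH_MM, 1)
        hour += timedelta(hours=1)
    return totals


tips = [datetime(2005, 10, 12, 9, 5), datetime(2005, 10, 12, 9, 40),
        datetime(2005, 10, 12, 9, 59), datetime(2005, 10, 12, 11, 1)]
for hour, mm in hourly_rainfall(tips).items():
    print(hour.strftime("%Y-%m-%d %H:00"), mm)
```

The totals are rounded to one decimal place to remove floating-point noise from the multiplication.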

Water Quality

• Temperature

• Conductivity

• Dissolved oxygen

• pH

• Turbidity

• River level

• Automatic water sampler

Ecology

• Salmon counts

• Smolt counts

• Redd counts

• Fish surveys

• River Habitat Surveys

• Plant surveys (Mean Trophic Rank)

• Diatom surveys

• Chironomid exuviae

• Macroinvertebrate surveys
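Automatic water-quality instruments can record implausible values, so a first QC pass typically flags readings outside plausible limits before they are loaded. The sketch below uses hypothetical parameter names, and every limit in it is an illustrative assumption. Real limits would be set per site and instrument, and range checks would normally be combined with spike and drift checks.

```python
# Illustrative plausibility limits for a river water-quality sonde (assumed values).
LIMITS = {
    "temperature_c": (-1.0, 30.0),
    "ph": (0.0, 14.0),
    "dissolved_oxygen_pct": (0.0, 200.0),
    "conductivity_us_cm": (0.0, 2000.0),
}


def qc_flags(reading: dict[str, float]) -> dict[str, str]:
    """Flag each parameter in one reading as 'ok', 'missing' or 'out_of_range'."""
    flags = {}
    for param, (low, high) in LIMITS.items():
        value = reading.get(param)
        if value is None:
            flags[param] = "missing"
        elif low <= value <= high:
            flags[param] = "ok"
        else:
            flags[param] = "out_of_range"
    return flags


print(qc_flags({"temperature_c": 11.2, "ph": 15.3, "dissolved_oxygen_pct": 96.0}))
```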

Soil Moisture

• Neutron probe

– soil water content

– radioactive source

– manual

• Profile probe

– 6 shallow depths

– dielectric constant

– automatic

Soil Water Potential

• Tensiometers

– puncture tensiometers (shallow, manual)

– purgeable tensiometers (shallow, automatic)

– equitensiometers (deeper, automatic)

– deep jacking tensiometers (depths up to 60 m)

• Soil water chemistry

– suction samplers

Set up Tasks

• hardware and software requirements

• create dictionaries

• load site and instrument data

• format conversion facilities

• methods

• QC

• meet with 3rd party suppliers

• load 3rd party & NERC data

• liaise with CSTs and PIs

• website

Operational Tasks

• receive and load:

– field data

– data from researchers

• maintenance

• data dissemination

• develop software

• meetings with:

– researchers

– CSTs

– data managers

• attend workshops, seminars and the annual science meeting

• report to the steering committee

Access to datasets

• build a metadata database

• build a thesaurus of terms

• provide a web-based search tool

• later, provide web access to the datasets

Searching for metadata on the web

• Search:
  – by keyword
  – by project
  – detailed search
  – by theme

• Description of selected dataset:
  – Title
  – Abstract
  – Contact
  – Extent
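The search options and dataset description above map naturally onto a small metadata record with a filter function. This is an illustrative sketch only: the field names follow the slide (Title, Abstract, Contact, Extent, plus project and theme), while the DatasetRecord, search and describe names and the keyword matching rule are assumptions, not the DSS web tool.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    title: str
    abstract: str
    contact: str
    extent: str       # geographic or temporal extent, free text in this sketch
    project: str
    theme: str        # RELU theme letter, "A" to "D"
    keywords: set[str] = field(default_factory=set)


def search(records, keyword=None, project=None, theme=None):
    """Return records matching every criterion given; None means 'any'."""
    results = []
    for r in records:
        if keyword is not None:
            kw = keyword.lower()
            text = f"{r.title} {r.abstract}".lower()
            if kw not in text and kw not in {k.lower() for k in r.keywords}:
                continue
        if project is not None and r.project != project:
            continue
        if theme is not None and r.theme != theme:
            continue
        results.append(r)
    return results


def describe(r: DatasetRecord) -> str:
    """Format the fields shown for a selected dataset."""
    return f"Title: {r.title}\nAbstract: {r.abstract}\nContact: {r.contact}\nExtent: {r.extent}"
```

A "detailed search" then amounts to passing several criteria at once, for example a keyword restricted to a single theme.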


Ethical and legal issues


Up front

• issues of consent and confidentiality that affect archiving should be included in the project management plan and addressed before data collection starts

• longer-term rights management should be in place and IPR issues considered

• unless a waiver on deposition has been agreed, researchers should not make commitments to informants which preclude archiving their data


Consent for archiving

• anonymity and privacy of research participants should be respected

• explicit ‘informed’ consent should be gained

• information for research participants should be clear and coherent and include:
  – purpose of research
  – what is involved in participation
  – benefits and risks
  – storage and access to data
  – usage of data (current and future uses)
  – withdrawal of consent at any time
  – Data Protection & Copyright Acts

• N.B. Additional measures are needed when participants are unable to consent through incapacity or age

• reflect needs and views of all

• works in practice


Legal issues in data preparation

• ‘Duty of confidentiality’

• Law of Defamation

• Data Protection Act 1998 and EU Directive

• Copyright Act 1988

• Freedom of Information


Duty of Confidentiality

• disclosure of information may constitute a breach of confidentiality and possibly a breach of contract

• not governed by an Act of Parliament
• not necessarily in writing
• can be a legal contractual obligation

• exemptions are:
  – relevant police investigations or proceedings
  – disclosure by court order
  – ‘public interest’, as defined by the courts
  – ethical obligations in cases of disclosure of child abuse


Law of Defamation

• a defamatory statement is one which may injure the reputation of another person, company or business


Data Protection Act 1998

• eight principles:
  – Fairly and lawfully processed
  – Processed for limited purposes
  – Adequate, relevant and not excessive
  – Accurate
  – Not kept longer than necessary
  – Processed in accordance with the data subject's rights
  – Secure
  – Not transferred to countries without adequate protection

• allows for secondary use of data for research purposes under certain conditions


Options for preserving confidentiality

• anonymisation

• consent to archive at the time of field work

• researcher contacts informants retrospectively

• user undertakings

• in exceptional circumstances: permission to use, or closure of material
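Anonymisation, the first option above, often begins with dropping direct identifiers and coarsening detailed ones. The sketch below is illustrative only: the field names (name, address, telephone, respondent_id, postcode) are assumed, the keyed hash produces pseudonyms rather than guaranteeing anonymity, and whether coarsening a postcode to its outward code is enough depends on the disclosure risk of the whole dataset.

```python
import hashlib
import hmac

# Assumed names of fields that directly identify a participant.
DIRECT_IDENTIFIERS = {"name", "address", "telephone"}


def anonymise(record: dict, secret_key: bytes) -> dict:
    """Return a copy of a survey record with identifiers removed or coarsened."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Replace the respondent ID with a keyed hash, so records can still be
    # linked across files without exposing the original ID.
    if "respondent_id" in out:
        digest = hmac.new(secret_key, str(out["respondent_id"]).encode(), hashlib.sha256)
        out["respondent_id"] = digest.hexdigest()[:12]
    # Coarsen a full UK postcode to its outward code: the inward code is
    # always the last three characters, e.g. "CO4 3SQ" -> "CO4".
    if "postcode" in out:
        compact = str(out["postcode"]).replace(" ", "").upper()
        out["postcode"] = compact[:-3]
    return out
```

The secret key would need to stay with the depositor; anyone holding it could regenerate the pseudonyms from the original IDs.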


Copyright Act 1988

• developed for the broadcasting industry, not research!

• protection of author’s rights

• multiple copyrights apply:
  – automatically assigned to the speaker
  – researcher holds the copyright in the sound recording of an interview; obtain written assignment of copyright from the interviewee, or an oral agreement (licence) to use
  – employer holds the copyright in research data (obtain copyright clearance from employer)

• copyright lasts for 70 years after the end of the year in which the author dies

• copying work is an infringement unless it is for the purposes of research, private study, criticism or review, or reporting current events, and the use can be regarded as 'fair dealing'

• seek legal advice on problem issues
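The term rule on this slide is simple arithmetic: protection ends on 31 December of the 70th year after the year of death. The sketch below models only that rule as stated; it is not legal advice, and the function names are hypothetical.

```python
from datetime import date

COPYRIGHT_TERM_YEARS = 70  # term as stated on the slide


def copyright_expires(death_year: int) -> date:
    """Last day of protection: end of the 70th year after the year of death."""
    return date(death_year + COPYRIGHT_TERM_YEARS, 12, 31)


def in_copyright(death_year: int, on: date) -> bool:
    """True if the work is still protected on the given date."""
    return on <= copyright_expires(death_year)
```

For an author who died in 1950, protection runs to 31 December 2020 and the work is free to copy from 1 January 2021.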


Freedom of Information

• Freedom of Information Act 2000

A statutory right for individuals and organisations to request information held by public authorities. FOI specifically excludes environmental information, which is covered by …

• Environmental Information Regulations 2004

• Enables individuals and organisations to obtain environmental information held by public authorities….

Many RELU data sets will fall under the EIRs


What is the legislation?

• Statutory rights of access to information

• Apply to public authorities – BBSRC, ESRC, NERC and the universities are public authorities

• Anyone, anywhere can request a copy of any information you hold, including data sets

• Not all information has to be released

• Must respond to most requests within 20 working days
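Both the FOI Act and the EIRs count the response time in working days, so a deadline is easiest to work out by stepping through the calendar. This is a sketch under stated assumptions: the count starts on the first working day after receipt, weekends are skipped, and bank holidays must be supplied by the caller because none are built in. The function name is hypothetical.

```python
from datetime import date, timedelta


def response_deadline(received: date, working_days: int = 20,
                      holidays=frozenset()) -> date:
    """Date of the Nth working day after receipt, skipping weekends and given holidays."""
    day = received
    counted = 0
    while counted < working_days:
        day += timedelta(days=1)
        # weekday(): Monday is 0, Saturday 5, Sunday 6
        if day.weekday() < 5 and day not in holidays:
            counted += 1
    return day
```

For a request received on Wednesday 12 October 2005, with no holidays, this gives Wednesday 9 November 2005.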


Exemptions – information protected by law

• Don’t panic: not all information has to be made available under FOI & EIRs

• FOI & EIRs provide a number of exemptions that can be applied to the release of information

• The presumption is that information will be made available unless there is good reason to withhold it (a public interest test).

• Exemptions protect scientific output, commercial business and personal information (through the Data Protection Act)

• Exemptions can be complex and difficult to apply. If in doubt, ask….


RELU data management responsibilities

• In supporting RELU award holders, the RELU DSS will:

  – provide advice and guidance on project data management through a web site, a help desk, visits and workshops
  – officially sign off the projects’ Data Management Plan
  – provide a web-database of RELU data being collected by award holders
  – assist with finding out about accessing third-party data sources
  – provide advice on assembling metadata and depositing data with ESRC and NERC Data Centres

• RELU award holders are expected to:

  – read and sign up to the Programme's Data Management Policy
  – complete the Data Management Plan
  – consult the DSS website and contact DSS staff if clarification is needed
  – be responsive to requests for information from the DSS


Future visions

Supporting cross-disciplinary research by:

• providing better resource discovery, e.g. to cross-search environmental and social science databases

• from a data resource point of view, providing guidance on ways of integrating existing data by exemplars:
  – tools
  – methods
  – interpretation
  – visualisation
  – confidentiality and disclosure
  – providing more web-enabled data
  – encouraging e-science applications, e.g. ‘grid-enabling’ data


RELU-DSS

• The DSS will provide support to RELU award holders (Calls 1 and 2) and Call 3 applicants through a telephone and email help desk, a web portal and a series of training events.

• relu.esds.ac.uk

• Email: [email protected]

• Tel: 01206 872974