ISO/IEC JTC1/SC32/WG2 N1952

MRC Data Support Service Standalone Registry

Introduction

The 'Stand Alone' registry is a distribution of the full ISO11179-compliant MRC DSS Study and Dataset metadata database, packaged for use by individual units for a small number of studies, both in internal operations and for the preparation of content sets for the central registry. The package includes a client-side add-in for Microsoft Excel that supports the registration of tabular datasets, and functions for the basic registration of SAS and SPSS Data Dictionary Files.

The registry

The registry software comprises an eXist XML database with extensions to support XSLT 2.0, XQuery scripts, XSL Transforms and supporting XML configuration files. It allows a unit to register variables, datasets and reference documents, and to associate these items with a general record that describes the intent and status of the study to which they relate, and with subject matter classification schemes that help users access this content.

The plug-in

The plug-in for Microsoft Excel 2007 allows a user to access search, annotation and registration capabilities from within the spreadsheet software. A user can invoke the tool to define a blank spreadsheet from standard variable types already within the registry, or create new variable definitions to document the data collection that has been planned. Furthermore, any existing tabular data that can be imported and manipulated in Excel may be annotated by the creation of new variable definitions or by reference to existing ones already defined in the local registry.

Further compatible tools for bulk processing of SAS and SPSS data dictionary content, and extensions to the registry for particular local requirements can be made available on request.

Installation

System requirements

Hardware

The base requirements for the metadata registry are modest for units with fewer than 10,000 variables: we recommend a Pentium-class 2GHz processor with 4GB RAM and 10GB of free hard-disk space as appropriate for a dedicated installation, and the registry will also run on larger systems as a virtual machine. For over 10,000 variables, we recommend a minimum of 8GB RAM and Core or Xeon processors.

The installer requires the use of a monitor and keyboard. A 'headless' install script for Linux servers without a monitor is available on request.

Operating System

The metadata registry should run on any UNIX or modern Windows system for which Java 1.6 is available. We have used it without problems with Windows XP, Windows 7, Windows Server 2008, OS X 10.5.X and 10.6.X, and Ubuntu 9.X and 10.X.

Since the database runs in a Java virtual machine, we have found Linux or OS X make more efficient use of resources than Windows for medium to large datasets.

For access to the content from a remote machine, ports 8080 and 8443 should be open. These are the default ports and can be changed.
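As a rough way to confirm that the registry ports are reachable from a remote machine, the following Python sketch attempts a TCP connection to each port. Only the port numbers 8080 and 8443 come from the text above; the helper names `port_is_open` and `check_registry_ports` are illustrative assumptions and are not part of the registry distribution.

```python
import socket

# Default registry ports named in the text above.
DEFAULT_PORTS = (8080, 8443)


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False


def check_registry_ports(host: str, ports=DEFAULT_PORTS) -> dict:
    """Map each port number to whether it accepted a connection."""
    return {port: port_is_open(host, port) for port in ports}
```

Calling `check_registry_ports` with the server's host name from the remote machine gives a quick indication of whether a firewall is blocking either port. A successful connection only shows that something is listening, not that it is the registry.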

Software

The installer expects the Java Developer Kit (JDK) to be installed on the workstation or server. It may be obtained and installed from http://www.oracle.com/technetwork/java/javase/downloads/index.html

The plug-in is for Excel 2007 on Windows operating systems only and may require the installation of .NET programmability support.

For power- and programmer- users, we recommend the oXygen XML editor from http://www.oxygenxml.com/.

Details of prerequisite installation are given in Appendix 1.

Installation

The registry component and the add-in are available from (web address here) as an IzPack packaged installer. The file is in the order of 100MB, depending upon the default datasets packaged within it, so it may take a little while to download on a slow connection. If the JDK is already installed and the appropriate environment variables are configured (see Appendix 1), the installer can be started simply by double-clicking on the icon. However, many Windows machines lack the environment variables required to invoke Java automatically, in which case the command

java -jar omr-setup-X.X.Xdev-revXXXX.jar

will start installation.
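Where it is unclear whether Java can be found on a machine, a short check before running the installer may save time. The Python sketch below looks for a `java` executable under `JAVA_HOME` first and then on the `PATH`, and builds the equivalent of the command above. The helper names `find_java` and `installer_command` are hypothetical, and the jar file name is the placeholder used in this document rather than a real release name.

```python
import os
import shutil
from typing import List, Mapping, Optional


def find_java(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Locate a java executable: JAVA_HOME/bin first, then the PATH."""
    java_home = env.get("JAVA_HOME")
    if java_home:
        # Windows installs ship java.exe; UNIX-like systems ship java.
        for name in ("java.exe", "java"):
            candidate = os.path.join(java_home, "bin", name)
            if os.path.isfile(candidate):
                return candidate
    # Fall back to searching the PATH (the process PATH if env has none).
    return shutil.which("java", path=env.get("PATH"))


def installer_command(jar_path: str, java: str) -> List[str]:
    """Build the argument list equivalent to `java -jar <jar_path>`."""
    return [java, "-jar", jar_path]
```

If `find_java()` returns `None`, the JDK is either not installed or not visible to the shell, and the steps in Appendix 1 apply. Otherwise the list returned by `installer_command` can be passed to `subprocess.run` to start the installer.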


A default installation may simply be obtained by accepting the license agreement and clicking 'next' until the final screen is reached.

On Windows 7 systems, we have found that installation into the 'Program Files' directory is problematic. Unless you have particular requirements, we recommend accepting the default installation location.

The wizard will ask you to confirm the installation location before you continue.

The next page will offer options for the installation of source code and datasets. We recommend that you choose the default DSS dataset, which provides all of the basic documents required to curate content for the Data Support Service Directory, and share and compare metadata with other units.

Then the installer will ask you to select and confirm the location of the database data directory on your disk. Again, we recommend selecting the default option of C:\omr\registry\webapp\WEB-INF\data


Next the installer will prompt you for an administrator password. The default is oxfordmetadata and should be fine for all normal, unsecured installations within a firewall. If you are planning on a wider or more secure installation then this should be changed. Remember to keep a copy of the password in a secure location.

The installer will take a little time to copy the files to the specified location, generate an SSL certificate and load the default content set into the database.

Finally, on a Windows-based system, it will offer you options to create shortcuts in various locations, and then the installation will complete with some instructions on getting the database started. If installation fails, please contact the Oxford Metadata Registry Support team.


Starting the metadata registry

Windows

On a Windows system, the metadata registry can be started by selecting the option in the start menu, by double-clicking on a start menu shortcut, or by navigating to C:\omr\registry\bin and starting startup.bat

A console will open and, if startup is successful, the following text will appear:

----------------------------------------------------------------

eXist-db has started on port 8080 8443. Configured contexts:

http://localhost:8443/exist

http://localhost:8443/

----------------------------------------------------------------

OS X and Linux

On a Macintosh, open a terminal session, navigate to /Applications/omr/registry/bin and type the command

./startup.sh


Once the registry has started, you can reach the homepage by following the URL

http://localhost:8080/exist/mdr/web/homepage.xql
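To confirm from a script that the registry has finished starting, one option is to request the homepage and check for a successful response. The sketch below assumes the default host, port and homepage path given above; the helper names `homepage_url` and `registry_is_up` are illustrative and not part of the registry.

```python
import http.client
import urllib.request


def homepage_url(host: str = "localhost", port: int = 8080) -> str:
    """Build the homepage URL given in the text for a host and port."""
    return f"http://{host}:{port}/exist/mdr/web/homepage.xql"


def registry_is_up(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with HTTP status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error status or malformed reply.
        return False
```

Because the database can take some time to load, a caller would typically retry `registry_is_up(homepage_url())` a few times with a short pause before concluding that startup has failed.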

Stopping the metadata registry

The metadata registry can be stopped by selecting the command/terminal window and typing <ctrl><c>, then typing <y> in response to the confirmation message.

Setting up eXist as a service

On a Windows machine, the OMR can be set to start as a service when the server boots simply by selecting the 'install eXist as a service' shortcut. This will require administrator privileges, so you may need to right-click on the shortcut and select 'run as administrator'. For Linux and OS X you will need to link exist.sh to an entry in the init.d directory:

ln -s $EXIST_HOME/tools/wrapper/bin/exist.sh /etc/init.d/exist

Please refer to your distribution's documentation: if you are not comfortable working with the command line, then you might want to talk to your systems administrator. Further details about starting, stopping and configuring eXist can be obtained from

http://exist.sourceforge.net/documentation.html

An overview of MDR functionality

The metadata registry is a database for recording, organising and administering metadata. In the context of population studies, it provides data dictionary capabilities for one or more closely related studies to facilitate data management and data discovery. To facilitate the exchange and use of content created within it, the metadata registry implements a number of international standards from both the International Organization for Standardization (ISO) and the World Wide Web Consortium (W3C). Support for the emerging version 3 of the Data Documentation Initiative (DDI) is under development. Most items in the metadata registry are 'administered' and share common facilities for naming, definition and change management; these facilities implement ISO11179-3 edition 2.

Types of content

The metadata registry currently recognises three types of primary content: study definitions, dataset definitions and variable definitions.

Study Definitions provide a largely flat, fixed record of aspects of the study, including names and acronyms, the organisations responsible for participation and organisation, an overview of the funding provided, some description of the cohort and the people who provide the primary point of contact. At the time of writing there was no accepted standard for metadata about a population study, so the record structure is based upon several existing relevant study registries and the overall intent of standards such as CONSORT, which aim to provide a specification for a report of cohort studies. The Study metadata items are listed in Appendix 2.


Dataset Definitions allow the recording of data dictionaries for existing datasets. The metadata registry ships with support for the automatic registration of SPSS and SAS metadata records; extensions to describe relational data sources or XML files of other formats can be developed on request. Dataset definitions accord with the general principles of the ISO19763 family of standards.


Variable Definitions document variables contained within datasets, or support the declaration of standard, reusable definitions that are to be conserved across the duration of an experiment. Variable definitions implement ISO11179-3.


Searching for content

Content can be located in the metadata registry by type and phrase. The search functionality is accessed directly from the main menu. The phrase may be any lexical string using standard wildcards and logical operators in capital letters. Example strings include: child; child AND carer; child OR carer; child*; child* AND carer.

A query is composed of terms and operators. Terms may be single terms or phrases – a phrase is a number of terms surrounded by quotation marks. A document only matches a phrase if the exact text within the quotation marks is present. Terms and phrases may be concatenated with the Boolean operators AND, OR and NOT. Single and multiple character wildcards may be included in terms: ‘?’ matches any single character; ‘*’ matches zero or more characters. Both may be used within or at the end of a term: te?t; test*; and te*t are all valid. However, wildcards cannot be used at the beginning of a term or within a quoted phrase. Thus neither ?est nor “smok* habit” are supported. Special characters can be escaped with the backslash ‘\’ character: today\? will search for the word ‘today’ followed by a question mark, rather than ‘today’ followed by any character.
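The wildcard rules above can be checked mechanically before a query is submitted. The following Python sketch reports the two unsupported cases described in the text: a wildcard at the start of a term, and a wildcard inside a quoted phrase. It is not the registry's own parser; the function names are hypothetical, and it assumes phrases are typed with straight double quotes.

```python
import re
from typing import List

# Boolean operators are recognised only in capital letters.
OPERATORS = {"AND", "OR", "NOT"}

# A token is a double-quoted phrase, or a run of non-space characters in
# which a backslash keeps the following character as part of the token.
TOKEN = re.compile(r'"[^"]*"|(?:\\.|[^\s"\\])+')


def has_unescaped_wildcard(text: str) -> bool:
    """True if text contains ? or * that is not escaped with a backslash."""
    without_escapes = re.sub(r"\\.", "", text)
    return re.search(r"[?*]", without_escapes) is not None


def query_problems(query: str) -> List[str]:
    """List the unsupported wildcard uses found in a search query."""
    problems = []
    for token in TOKEN.findall(query):
        if token in OPERATORS:
            continue
        if token.startswith('"'):
            if has_unescaped_wildcard(token):
                problems.append(f"wildcard inside quoted phrase: {token}")
        elif token[0] in "?*":
            problems.append(f"wildcard at start of term: {token}")
    return problems
```

With this sketch, `query_problems('child* AND carer')` returns an empty list, while `?est` and `"smok* habit"` each produce one reported problem. An escaped character such as the one in `today\?` is not treated as a wildcard.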

Where more than five matches are returned for any search, the results are paged: navigation through the pages can be achieved through the ‘start’, ‘previous’, ‘next’ and ‘last’ links at the bottom of the page.


Browsing content alphabetically

For smaller metadata sets, and for ones where dataset and variable names are controlled and structured, content can be located alphabetically. A strip of buttons facilitates alphabetic access: click on a button to filter the content by that initial letter. Where there are more than five items sharing the same initial letter, paging links will become active at the bottom of the page as with the search web pages: navigation through the set of items can be achieved by clicking on the appropriate link.


Browsing content by classification scheme

A powerful way of accessing content is through classification schemes. A classification scheme is a taxonomy or hierarchy of concepts or terms that may be associated with variables and datasets. The hierarchy can be navigated and used to filter content according to the associations. Supported classification schemes are a subset of ISO11179 conforming to the W3C Simple Knowledge Organization System (SKOS). However, the user interface makes the common assumption that broader and narrower relationships are transitive so that a taxonomy may be displayed; this may result in incompatibilities with some complex third-party concept schemes.

To navigate through a hierarchy, select a scheme from the ‘schemes’ drop-down list and then click on the first selected term to bring it into focus. The list of variables on the left-hand side is then filtered according to the selected term, so that all variables associated with that term, and any variables associated with terms that are narrower in meaning than the selected term, are displayed. Thus selecting the top concept in a scheme will show all of the variables classified within that scheme.

You can bring any other visible term listed as broader, narrower or related into focus simply by clicking on it. Items are returned in alphabetic order and where multiple pages of items are found, navigation through the pages can be accomplished using the links at the bottom of the page as with the search and alphabetic list web pages. Further restriction on the result set may be achieved by entering search terms or phrases into the ‘search within classification’ text box followed by the return key. Clicking on the ‘reset form’ link will reset the whole form and return to the default state with no classification scheme selected and no filter or paging restriction applied.
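The filtering behaviour described in this section, in which a selected term also matches everything classified under its narrower terms, amounts to a transitive closure over the narrower relationships. The Python sketch below illustrates the idea with plain dictionaries. It is not the registry's XQuery implementation, and the function names and data shapes are assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Set


def narrower_closure(term: str, narrower: Dict[str, Iterable[str]]) -> Set[str]:
    """Return the term plus every term reachable through narrower links."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guards against cycles in a malformed scheme
        seen.add(current)
        stack.extend(narrower.get(current, ()))
    return seen


def variables_for_term(
    term: str,
    narrower: Dict[str, Iterable[str]],
    classified_as: Dict[str, Iterable[str]],
) -> List[str]:
    """Variables associated with the term or any narrower term, A to Z."""
    terms = narrower_closure(term, narrower)
    return sorted(
        variable
        for variable, variable_terms in classified_as.items()
        if terms.intersection(variable_terms)
    )
```

For a hypothetical scheme in which "health" has narrower terms "smoking" and "diet", selecting "health" returns every variable classified under any of the three terms, which mirrors the behaviour of selecting a top concept. Treating narrower links as transitive is the same simplifying assumption the user interface makes.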

Browsing content within Excel


Getting started with the OMR

Login

Creating a registration authority

Creating an organisation

Creating a context

Creating a study record

Registering a dataset through the web interface

Registering an excel dataset

Creating a variable

Administration of the OMR

Managing users

Backup and Restore

Indexing

Securing content


Appendix 1: Prerequisite Software Installation

JDK installation on Windows

The OMR requires Sun’s Java Developer Kit (JDK) 6. You can download the latest version at

http://www.oracle.com/technetwork/java/javase/downloads/index.html

Once you have downloaded the installer, double-click on it to install. You should make a note of the JDK installation path, which for Windows will typically be something like

C:\Program Files\Java\jdk1.6.0_20

for update 20 of version 1.6.0. Note the inconsistent version numbering: JDK 6 is the same release as version 1.6.0.

Set the JAVA_HOME Environment Variable

Having installed the JDK, you should set the JAVA_HOME system variable to point to the JDK directory you made a note of during installation. If you navigate to the directory through Windows Explorer, you can copy the path from the address bar.

To set the environment variable:

1. Right-click on My Computer and select Properties from the shortcut menu.
2. In the System Properties dialog box, click the Advanced tab.
3. On the Advanced tab, click the Environment Variables button.
4. In the System Variables list at the bottom of the Environment Variables dialog box, look for a JAVA_HOME environment variable in the list.
5. If a JAVA_HOME variable exists, check to see if it matches the JDK installation directory you noted above:
   a. If it does not match, click Edit and type/paste the JDK directory path into the Variable value field and click OK twice to accept the changes.
   b. If it matches, click Cancel twice to dismiss the dialog boxes.
6. If a JAVA_HOME variable does not appear in the list, do the following:
   a. Click New.
   b. In the Variable name field, enter JAVA_HOME.
   c. In the Variable value field, type or paste the full path to the JDK directory.
   d. Click OK.
   e. Click OK to close the Environment Variables dialog box.
   f. Click OK to exit the System Properties dialog box.

The JAVA_HOME variable is now set and the OMR installation will be able to find the JDK during installation.

.NET Programmability Support installation

.NET Programmability Support is required to install the Excel add-in. To enable .NET Programmability:

1. Open the Windows Control Panel and select Add/Remove Programs (XP) or Programs and Features (Win 7).
2. Find and click on Microsoft Office Professional 2007.
3. From the list of programs, select Microsoft Office Professional/Enterprise/Ultimate 2007.
4. Click Change.
5. Select Add or Remove Features and click Continue.
6. Click the plus sign (+) to the left of Microsoft Office Excel to show the available options.
7. Make sure that .NET Programmability Support appears in the list and is available. This is indicated by the box being shown in white instead of gray or with a red X through the disk icon within it.
8. If it is not enabled, click the drop-down arrow in the box located to the left of the .NET Programmability Support option and select Run from My Computer. It may prompt you for your MS Office CD to complete this task.

Once .NET Programmability Support is verified or enabled, you will be able to install the Excel Query Service Addin.


Appendix 2: Study metadata record

The structure of study information can be described in an object model; the accompanying figure shows an overview of the latest implementation. A Study object is a kind of Administered Item (one whose edits are tracked and recorded over time) and has basic information such as a name, alternative and previous names, and a set of typed identifiers. A Study has a collection of Links, Identifiers and Support details, and a set of Resources which themselves are Administered Items.

A Study has a number of role-based relationships with people, or Contacts – those designated as investigators, researchers, administrators, and initial points for communication, for example – represented by the SC_Role association class. Similar role-based relationships exist between studies and organisations (SO_Role), and between contacts and organisations (CO_Role).

Table 1 lists the primary attributes of the Study class: all have ‘public’ visibility, and would be displayed in the full view of a study record. The type of each field is given; those denoted as being of type String may be subject to further evolution, in particular the introduction of controlled vocabularies once a candidate set has been proposed. The cardinality of each attribute is also given, using the standard UML syntax, where “1” denotes a mandatory field, “0..1” denotes an optional field, “0..*” represents a possibly empty set of values, and “1..*” represents a non-empty set of values.

| Field Name | Type | Cardinality | Description |
|---|---|---|---|
| Name | String | 1 | The preferred name of the study, chosen by the study owner |
| Other names | String | 0..* | A set of alternative and previous names, including acronyms |
| Organisations | Organisation | 1..* | The main organisations involved: those coordinating, participating, and those to contact initially for more details |
| Accountable People | Contact | 0..* | Including Principal Investigators, Directors and Leads of projects |
| Support | Support | 0..* | The credited sources of support for the study, with links to further details |
| Approvals | Approval | 0..* | A list of existing external approvals, other than local ethics |
| Research Areas | String | 1 | The main clinical or social areas studied |
| Description | String | 1 | A brief textual description of the study purpose, proposal or activity |
| Population | String | 1 | A description of study population, including details of gender, ethnicity, age range, etc. where appropriate |
| Data Collected | String | 1 | A broad categorisation of the data collected |
| Data Sources | Resource | 0..* | Copies of, or links to, questionnaires, interviews, existing data resources. Individual resources may have restricted access. |
| Sample Size | String | 1 | An indication of the initial recruitment or cohort size |
| Status | String | 1 | Whether the study is in preparation, is collecting data, or has completed |
| Recruitment | String | 1 | Whether the study is currently recruiting participants, and details |
| Geography | String | 1 | A broad description of the geographical areas involved in the study |
| Start Date | Date | 1 | The date on which the main study formally began, or is due to begin |
| Completion Date | Date | 0..1 | The actual, or planned, completion date for cohort management |
| Links | Link | 0..* | Clickable URLs to additional information about the study |
| Data Access | String | 1 | A description of the data sharing policy for the study |
| Additional Information | String | 1 | Any important extra information which doesn’t fit in the existing fields |
| Contacts | Contact | 0..* | Methods of contact, where applicable |
| Identifiers | Identifier | 0..* | The MRC study identifier, along with any other unique identifiers for the study |
| Last Updated | Date | 1 | The date of the last update of this Study record |

Table 1: Primary attributes of the Study class
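The UML cardinality syntax used in Table 1 lends itself to a simple completeness check on a draft study record. The Python sketch below parses the four cardinality forms and reports fields whose number of values falls outside the permitted range. Representing a record as a dictionary, and the function names used here, are assumptions made for illustration; the registry itself stores records as XML.

```python
from typing import Any, Dict, List, Tuple


def parse_cardinality(spec: str) -> Tuple[int, float]:
    """Turn '1', '0..1', '0..*' or '1..*' into (minimum, maximum)."""
    if ".." in spec:
        low, high = spec.split("..", 1)
    else:
        low = high = spec
    maximum = float("inf") if high == "*" else int(high)
    return int(low), maximum


def count_values(value: Any) -> int:
    """Count supplied values: None is 0, a list/tuple/set is its length, else 1."""
    if value is None:
        return 0
    if isinstance(value, (list, tuple, set)):
        return len(value)
    return 1


def cardinality_errors(record: Dict[str, Any], schema: Dict[str, str]) -> List[str]:
    """Report each field whose value count violates its cardinality."""
    errors = []
    for field, spec in schema.items():
        minimum, maximum = parse_cardinality(spec)
        count = count_values(record.get(field))
        if not minimum <= count <= maximum:
            errors.append(f"{field}: expected {spec}, found {count} value(s)")
    return errors
```

Given a schema excerpt such as `{"Name": "1", "Organisations": "1..*", "Completion Date": "0..1"}`, a record with no name and an empty list of organisations yields two errors. Note that this sketch counts an empty string as one value, so a stricter check would also need to reject blank text.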

Further information on studies

The fields shown in Table 2 will be maintained at a lower level of access and some fields may not be publicly available.

| Field Name | Type | Cardinality | Description |
|---|---|---|---|
| Related Parties | String | 1 | Categories of data recorded about related parties |
| Sampling Method | String | 1 | The sampling method used in data collection |
| Participation Type | String | 1 | An indication of whether participants opted in or out |
| Other Investigators | Contact | 0..* | A list of other investigators, and those who have added value to the study |
| Abstract | String | 1 | Further information about the study, its background, and its evolution |
| Other Data Sources | Resource | 0..* | Additional, more specific resources; individual resources may have restricted access. |
| Inclusion Criteria | String | 1 | Detailed criteria for inclusion |
| Exclusion Criteria | String | 1 | Detailed criteria for exclusion |
| Follow Up | String | 1 | The frequency and mechanisms employed for follow-up |
| Current Size | String | 1 | A brief narrative of the current cohort size |
| Research Purposes | String | 1 | The approved research purposes for data from the study |
| Approvals Required | String | 1 | External approvals that may be required to use the data |
| Funding Required | String | 1 | Details of financial support required for data access/sharing |
| Keywords | Keyword | 0..* | A set of keywords to assist in locating the study within the directory |

Table 2: Secondary attributes of the Study class