Sharing Clinical Research Data: An IOM Workshop

Briefing book for the IOM workshop on Sharing Clinical Research Data, October 4-5, 2012, National Academy of Sciences, 2101 Constitution Avenue, N.W., Washington, D.C. 20418.

Page 1: Sharing Clinical Research Data:  an IOM Workshop

Forum on Drug Discovery, Development, and Translation

Forum on Neuroscience and Nervous System Disorders

Roundtable on Translating Genomic-Based Research for Health

National Cancer Policy Forum

Present

Sharing Clinical

Research Data:

An IOM Workshop

October 4-5, 2012

National Academy of Sciences 2101 Constitution Avenue, N.W.

Washington, D.C. 20418

Page 2: Sharing Clinical Research Data:  an IOM Workshop

Sharing Clinical Research Data:

An Institute of Medicine Workshop

TABLE OF CONTENTS

October 4 - 5, 2012

Washington, DC

ITEMS                                                              TAB/PAGE

AGENDA
• Sharing Clinical Research Data 1

COORDINATING ROUNDTABLE AND FORUMS
• Forum on Drug Discovery, Development, and Translation 1
• Forum on Neuroscience and Nervous System Disorders 3
• Roundtable on Translating Genomic-Based Research for Health 5
• National Cancer Policy Forum 7

WORKSHOP, OCTOBER 4-5
• Speaker Biographies 1

BACKGROUND ARTICLES
Background Reading for Workshop¹

Benefits
• Akil, H., M. E. Martone, et al. Challenges and opportunities in mining neuroscience data 1
• Brown, J. S., J. H. Holmes, et al. A practical and preferred approach to multi-institutional evaluations of comparative effectiveness, safety, and quality of care 6
• Krumholz, H. M. Open science and data sharing in clinical research: Basing informed decisions on the totality of the evidence 13

Models
• Califf, R. M., D. A. Zarin, et al. Characteristics of clinical trials registered in ClinicalTrials.gov, 2007-2010 15
• Doshi, P., T. Jefferson, et al. The imperative to share clinical study reports: Recommendations from the Tamiflu experience 25
• Piwowar, H. A. Who shares? Who doesn’t? Factors associated with openly archiving raw research data 31
• Turner, E. H., A. M. Matthews, et al. Selective publication of antidepressant trials and its influence on apparent efficacy 44
• Wagner, J. A., M. Prince, et al. The Biomarkers Consortium: Practice and pitfalls of open-source precompetitive collaboration 53
• List of Clinical Research Data Sharing Projects 57

¹ Background materials have been gathered throughout the workshop planning committee process to inform the development of the meeting and understanding of key issues. The articles contained in this briefing book are a subset of background materials collected, and inclusion of particular articles does not denote IOM endorsement.

Page 3: Sharing Clinical Research Data:  an IOM Workshop

Standardization and Governance
• CDISC press release. CDISC, C-Path and FDA collaborate to develop data standards to streamline path to new therapies 75
• CDISC press release. Public release of Alzheimer’s clinical trial data by pharmaceutical researchers 77
• Davies, K. Get smart: Knowledge management 80
• Davies, K. Running tranSMART for the drug development marathon 82
• The Economist editorial. Genomic research: Consent 2.0 85
• Ghersi, D., et al. Reporting the findings of clinical trials: A discussion paper 86
• Hayden, E. C. Informed consent: A broken contract 88
• Kolata, G. Genes now tell doctors secrets they can’t utter 91
• Savage, C. J. and A. J. Vickers. Empirical study of data sharing by authors publishing in PLoS journals 96

Culture and Policy
• Eichler, H.-G., E. Abadie, et al. Open clinical trial data for all? A view from regulators 99
• Laine, C., S. N. Goodman, et al. Reproducible research: Moving toward research the public can really trust 101
• Law, M. R., Y. Kawasumi, et al. Despite law, fewer than one in eight completed studies of drugs and biologics are reported on time on ClinicalTrials.gov 106
• Mansi, B. A., J. Clark, et al. Ten recommendations for closing the credibility gap in reporting industry-sponsored clinical research: A joint journal and pharmaceutical industry perspective 114
• NIH. Final NIH statement on sharing research data
• Vickers, A. J. Whose data set is it anyway? Sharing raw data from randomized trials 120
• Zarin, D. A. and T. Tse. Moving toward transparency of clinical trials 126

TRAVEL INFORMATION
• Logistics Memo 1
• Map of National Academies Building 5

Page 4: Sharing Clinical Research Data:  an IOM Workshop

AGENDA

Page 5: Sharing Clinical Research Data:  an IOM Workshop

Forum on Drug Discovery, Development, and Translation Forum on Neuroscience and Nervous System Disorders

National Cancer Policy Forum Roundtable on Translating Genomic-Based Research for Health


Sharing Clinical Research Data: An Institute of Medicine Workshop

DRAFT AGENDA

October 4 & 5, 2012

National Academy of Sciences Building, Room 125

2101 Constitution Avenue, N.W.

Washington, DC

Background:

Pharmaceutical companies, academic institutions, advocacy organizations, and government agencies such as FDA and NIH have large quantities of clinical research data. Increased data sharing could facilitate scientific and public health advances, among other potential benefits to patients and society. Much of this information, however, is never published or shared. More specifically, study results are not always published, and when results are published, they typically include only summary-level data; participant-level data is privately held and rarely shared or revealed publicly.

The workshop will explore the benefits of and barriers to the sharing of clinical research data and will help identify strategies for enhancing sharing both within and across sectors. To make the workshop scope manageable, the workshop will focus on data resulting from preplanned interventional studies of human subjects. While recognizing the importance of other data sources, such as observational studies and electronic health records, this focus was selected to encourage concrete problem-solving discussions over the course of a day-and-a-half meeting. Models and projects that involve sharing other types of data will be considered during the workshop to the extent that they provide lessons and best practices applicable to sharing preplanned interventional clinical research data.

The workshop is being jointly organized by the Institute of Medicine’s Forum on Drug Discovery, Development, and Translation; Forum on Neuroscience and Nervous System Disorders; National Cancer Policy Forum; and Roundtable on Translating Genomic-Based Research for Health. Participants will be invited from industry, academia, government agencies such as FDA and NIH, disease advocacy groups, and other stakeholder groups.

Meeting Objectives:

• Examine the benefits of sharing of clinical research data, and specifically clinical trial data, from all sectors and among these sectors, including, for example:
  o Benefits to the research and development enterprise
  o Benefits to the analysis of safety and efficacy
• Identify barriers and challenges to sharing clinical research data
• Explore strategies to address these barriers and challenges, including identifying priority actions and “low-hanging fruit” opportunities
• Discuss strategies for using these potentially large data sets to facilitate scientific and public health advances.

Page 6: Sharing Clinical Research Data:  an IOM Workshop


Day One

8:30 a.m. Opening Remarks

SHARON TERRY, Workshop Chair

President and Chief Executive Officer

Genetic Alliance

SESSION I: BENEFITS OF SHARING CLINICAL RESEARCH DATA

Session Objectives:

• Provide an overview of the benefits of sharing clinical research data, specifically clinical trial data, and discuss advantages and disadvantages of sharing participant- vs. summary-level data from individual trials as well as pooling data across multiple studies.
• Outline examples of scientific success stories that illustrate what can be accomplished when clinical trial data is shared.

8:40 a.m. Background and Session Objectives

WILLIAM POTTER, Session Co-Chair

Co-Chair Emeritus

Neuroscience Steering Committee

FNIH Biomarkers Consortium

DEBORAH ZARIN, Session Co-Chair

Director, ClinicalTrials.gov

National Library of Medicine

National Institutes of Health

8:50 a.m. Fundamentals and Benefits of Sharing Participant-Level Clinical Trial Data

ELIZABETH LODER

Clinical Epidemiology Editor

British Medical Journal

9:10 a.m. Pooling Data from Multiple Clinical Trials to Answer Big Questions

ROBERT CALIFF

Director, Duke Translational Medicine Institute

Professor of Medicine

Vice Chancellor for Clinical and Translational Research

Duke University Medical Center

Page 7: Sharing Clinical Research Data:  an IOM Workshop


9:30 a.m. Panel Discussion: Perspectives on the Benefits of Sharing Clinical Trial Data

• Data sharing: what does it mean from your perspective?
• Considering the benefits and risks of sharing clinical research data, how extensively should it be shared to maximize new knowledge and ultimately patient benefit?

Panelists:

HARLAN KRUMHOLZ

Harold H. Hines, Jr. Professor of Medicine and Epidemiology and Public Health

Yale University School of Medicine

MYLES AXTON

Editor

Nature Genetics

JESSE BERLIN

Vice President of Epidemiology

Janssen Research and Development

10:30 a.m. BREAK

SESSION II: DATA SHARING MODELS: DESIGN, BEST PRACTICES, AND LESSONS LEARNED

Session Objectives:

• Present examples, best practices, and lessons learned from projects across the continuum of data sharing opportunities (e.g., rapid publication of participant-level data, increased access to participant-level data for qualified researchers, or maximizing the use of clinical research data that is currently held in centralized locations by requiring sharing or access to subsets of data).
• Distill best practices and lessons learned that can be applied broadly to new projects to maximize the use of data from individual trials and/or data pooling initiatives.

10:45 a.m. Background and Session Objectives

JEFFREY NYE, Session Chair

Vice President and Head

Neuroscience External Innovation

Johnson and Johnson Pharmaceutical R&D, LLC

10:55 a.m. The Limits of Summary Data Reporting: Lessons from ClinicalTrials.gov

Page 8: Sharing Clinical Research Data:  an IOM Workshop


DEBORAH ZARIN

Director, ClinicalTrials.gov

National Institutes of Health

11:10 a.m. Models that Increase Access and Use of Data from Individual Clinical Trials

The DataSphere Project

CHARLES HUGH-JONES

Vice President, Medical Affairs North America

Sanofi Oncology

Yale/Medtronic Experience

RICHARD KUNTZ

Senior Vice President

Chief Scientific, Clinical and Regulatory Officer

Medtronic, Inc.

11:40 a.m. Models that Foster Pooling and Analysis of Data

FNIH Biomarkers Consortium Adiponectin Project

JOHN WAGNER

Vice President, Clinical Pharmacology

Merck & Co., Inc.

Novel Methods Leading to New Medications in Depression and Schizophrenia (NEWMEDS) Consortium

JONATHAN RABINOWITZ

Academic Lead, NEWMEDS

Bar Ilan University

12:10 p.m. Series of Brief Presentations on Overcoming Challenges Facing Clinical Trial Data Sharing

Challenge #1: Permissions

JENNIFER GEETTER

Partner

McDermott Will & Emery

Challenge #2: Techniques and Methodologies

Page 9: Sharing Clinical Research Data:  an IOM Workshop


JOHN IOANNIDIS [via video conference]

C.F. Rehnborg Chair in Disease Prevention

Stanford University

Challenge #3: Culture

KELLY EDWARDS

Acting Associate Dean, The Graduate School

Associate Professor, Bioethics and Humanities

University of Washington

12:40 p.m. Discussion among speakers, panelists, and audience

Discussant:

Sally Okun, Health Data Integrity & Patient Safety, PatientsLikeMe

1:00 p.m. LUNCH

KEYNOTE CASE STUDY: DISTRIBUTED SYSTEMS FOR CLINICAL RESEARCH INFORMATION SHARING

1:30 p.m. RICHARD PLATT

Professor and Chair

Department of Population Medicine

Harvard Pilgrim Health Care Institute and Harvard Medical School

1:50 p.m. Discussion with speaker and audience

Moderator:

Sharon Terry, Genetic Alliance

SESSION III: STANDARDIZATION AND GOVERNANCE

Session Objectives:

• Receive an update on recent legislative and regulatory language regarding standardization of clinical research data and discuss how stakeholders are designing and implementing data standardization plans in response.
• Discuss the relative cost-benefit of data conversion of existing trial data versus building an infrastructure to improve data collection and sharing moving forward.
• Present case studies from data sharing projects using different data standardization and governance models and consider lessons learned or best practices for the future.

2:00 p.m. Background and Session Objectives

Page 10: Sharing Clinical Research Data:  an IOM Workshop


FRANK ROCKHOLD, Session Co-Chair

Senior Vice President, Global Clinical Safety and Pharmacovigilance

GlaxoSmithKline Pharmaceuticals Research and Development

LYNN HUDSON, Session Co-Chair

Chief Science Officer and Executive Director

Coalition Against Major Diseases

Critical Path Institute

2:10 p.m. PDUFA Update on Data Standards

MARY ANN SLACK

Office of Planning and Informatics

Center for Drug Evaluation and Research

U.S. Food and Drug Administration

2:25 p.m. Standardization to Facilitate Data Sharing: Opportunities and Limitations

CDISC Efforts to Support Clinical Research Data

REBECCA KUSH

President and Chief Executive Officer

Clinical Data Interchange Standards Consortium

HL7 Efforts to Support Clinical Care Data

CHARLES JAFFE

Chief Executive Officer

Health Level 7 International

Health Information Technology Perspective on Clinical Research Data Standards

SACHIN JAIN

Chief Medical Information and Innovation Officer

Merck & Co., Inc.

3:10 p.m. Discussion with speakers and audience

3:30 p.m. BREAK

3:45 p.m. Cost-Benefit Analysis of Retrospective vs. Prospective Data Standardization

Page 11: Sharing Clinical Research Data:  an IOM Workshop


VICKI SEYFERT-MARGOLIS

Senior Advisor, Science Innovation and Policy

Office of the Chief Scientist

U.S. Food and Drug Administration

4:00 p.m. Case Studies: Standardization and Governance Models in Data Sharing

Critical Path Institute and Coalition Against Major Diseases (CAMD) Alzheimer’s Clinical Trial Database

CAROLYN COMPTON

President and CEO

Critical Path Institute

Translational Medicine Mart (tranSMART)

ERIC PERAKSLIS

Chief Information Officer and Chief Scientist, Informatics

U.S. Food and Drug Administration

4:30 p.m. Panel Discussion

• Catalog new data sharing challenges not yet discussed and provide suggestions for overcoming these challenges.
• Given the data standardization and governance models discussed, suggest a framework to guide the development of new data sharing projects based on their purpose (e.g., regulatory approval with FDA, detecting safety signals, testing secondary hypotheses, etc.).

Panelists:

LAURA LYMAN RODRIGUEZ

Director

Office of Policy, Communications and Education

National Human Genome Research Institute

MEREDITH NAHM

Associate Director for Clinical Research Informatics

Duke Translational Medicine Institute

NEIL DE CRESCENZO

Senior Vice President and General Manager

Oracle Health Sciences

Page 12: Sharing Clinical Research Data:  an IOM Workshop


MICHAEL CANTOR

Senior Director

Clinical Informatics and Innovation

Pfizer, Inc.

5:30 p.m. Adjourn day 1

Page 13: Sharing Clinical Research Data:  an IOM Workshop


Day Two

8:00 a.m. Opening Remarks

SHARON TERRY, Workshop Chair

President and Chief Executive Officer

Genetic Alliance

SESSION IV: INCENTIVIZING POLICY AND CULTURAL SHIFTS TO ENHANCE DATA SHARING

Session Objectives:

• Receive an update on clinical trial data transparency decisions in Europe.
• Explore current incentives for and against (i.e., benefits and risks of) data sharing within and across sectors and suggest mechanisms to encourage stakeholders to engage in a culture of data sharing.
• Identify existing and potential strategies, including technology-based approaches, for protecting patient privacy and confidentiality while facilitating data sharing.

8:10 a.m. Background and Session Objectives

ROBERT HARRINGTON, Session Chair

Arthur L. Bloomfield Professor of Medicine

Chair, Department of Medicine

Stanford University

8:20 a.m. Clinical Trial Data Transparency: European Medicines Agency Perspective

HANS-GEORG EICHLER

Senior Medical Officer

European Medicines Agency

8:40 a.m. Clinical Research Data Sharing Practices and Attitudes

ANDREW VICKERS

Attending Research Methodologist

Department of Epidemiology and Biostatistics

Memorial Sloan-Kettering Cancer Center

Page 14: Sharing Clinical Research Data:  an IOM Workshop


8:55 a.m. Overview of Data Sharing Policies: Research Funders and Publishers

STEVEN GOODMAN

Associate Dean for Clinical and Translational Research

Professor of Medicine & Health Research and Policy

Stanford University School of Medicine

9:10 a.m. Series of Presentations: Incentives for Data Sharing Within and Across Sectors

Academic Perspectives

PETER DOSHI

Post-Doctoral Fellow

Johns Hopkins University School of Medicine

BETH KOZEL

Instructor of Pediatrics

Division of Genetics and Genomic Medicine

St. Louis Children’s Hospital and Washington University

Pharmaceutical Company Perspective

[Speaker TBA]

Federal Research Funder Perspective

JOSEPHINE BRIGGS

Director, National Center for Complementary and Alternative Medicine

Director, National Center for Advancing Translational Sciences, Division of

Clinical Innovation

National Institutes of Health

10:10 a.m. Discussion with speakers and audience

10:30 a.m. BREAK

10:45 a.m. Facilitating Patient Ownership of Clinical Trial Data: Technical Challenges and Opportunities

JOHN WILBANKS

Director

Sage Bionetworks

Page 15: Sharing Clinical Research Data:  an IOM Workshop


DEVEN MCGRAW

Director, Health Privacy Project

Center for Democracy and Technology

11:15 a.m. Discussion with speakers and audience

SESSION V: NEXT STEPS AND FUTURE DIRECTIONS

Session Objectives:

• Discuss key themes from the workshop.
• Based on workshop presentations and discussions, identify potential next steps and priority actions for data sharing stakeholders.
• Highlight potential opportunities and challenges that are currently on the horizon but may become more salient as technology evolves and/or data sharing becomes more pervasive.

11:30 a.m. Background and Session Objectives

SHARON TERRY, Workshop Chair

President and Chief Executive Officer

Genetic Alliance

11:40 a.m. Session Chair Reports [5 minutes per Session]

WILLIAM POTTER, Session I Co-Chair

Co-Chair Emeritus

Neuroscience Steering Committee

FNIH Biomarkers Consortium

DEBORAH ZARIN, Session I Co-Chair

Director, ClinicalTrials.gov

National Library of Medicine

National Institutes of Health

JEFFREY NYE, Session II Chair

Vice President and Head

Neuroscience External Innovation

Johnson and Johnson Pharmaceutical R&D, LLC

FRANK ROCKHOLD, Session III Co-Chair

Senior Vice President

Global Clinical Safety and Pharmacovigilance

GlaxoSmithKline Pharmaceuticals Research and Development

Page 16: Sharing Clinical Research Data:  an IOM Workshop


LYNN HUDSON, Session III Co-Chair

Chief Science Officer and Executive Director

Coalition Against Major Diseases

Critical Path Institute

ROBERT HARRINGTON, Session IV Chair

Arthur L. Bloomfield Professor of Medicine

Chair, Department of Medicine

Stanford University

12:00 p.m. Closing Discussion with Session Chairs, Panelists, and Audience Led by Workshop Chair

JOSEPHINE BRIGGS

Director, National Center for Complementary and Alternative Medicine

Director, National Center for Advancing Translational Sciences, Division of

Clinical Innovation

National Institutes of Health

MICHAEL ROSENBLATT

Executive Vice President and Chief Medical Officer

Merck & Co., Inc.

JAY “MARTY” TENENBAUM

Founder and Chairman

Cancer Commons

JANET WOODCOCK

Director, Center for Drug Evaluation and Research

U.S. Food and Drug Administration

12:45 p.m. Adjourn

Page 17: Sharing Clinical Research Data:  an IOM Workshop


The IOM was chartered in 1970 as a component of the National Academy of Sciences to enlist distinguished members of the appropriate professions in the examination of policy matters pertaining to the health of the public. In this, the Institute acts under both the Academy’s 1863 congressional charter responsibility to be an adviser to the federal government and its own initiative in identifying issues of medical care, research, and education. For additional information about this workshop, contact Rebecca English at [email protected].

SHARING CLINICAL RESEARCH DATA: A WORKSHOP

October 4-5, 2012 – Washington, D.C.

Background Information

Pharmaceutical companies, academics, and government agencies such as the Food and Drug Administration and the National Institutes of Health have large quantities of clinical research data. Data sharing within each sector and across sectors could facilitate scientific and public health advances and could enhance analysis of safety and efficacy. Much of this information, however, is never published. This workshop will explore barriers to sharing of clinical research data, specifically clinical trial data, and strategies for enhancing sharing within sectors and among sectors to facilitate research and development of effective, safe, and needed products.

A number of efforts are currently underway to enhance public/private partnerships and foster collaboration within the pharmaceutical sector to advance research and enhance the discovery and development of drugs, devices, and diagnostic tools. Examples include the Analgesic Clinical Trials Innovation, Opportunities, and Networks (ACTION) Initiative; the Foundation for the National Institutes of Health’s Biomarkers Consortium; the Innovative Medicines Initiative (Europe); Critical Path Institute; Sage Bionetworks’ Clinical Trial Comparator Arm Partnership; and the Life Sciences Consortium’s MetaPharm project. Planned within the context of these various efforts, this workshop will focus specifically on data sharing, which is one component of larger efforts to build partnerships and enhance collaboration within and among sectors on research, development, and assessment of pharmaceutical products.

This project will be a coordinated effort of the IOM’s Forum on Drug Discovery, Development, and Translation; Forum on Neuroscience and Nervous System Disorders; National Cancer Policy Forum; and Roundtable on Translating Genomic-Based Research for Health. All four forums/roundtables have done previous work on this topic, and this issue cuts across the focus of each activity.

Statement of Task

The Institute of Medicine will conduct a public workshop that will focus on strategies to facilitate sharing of clinical research data. Participants will be invited from industry, academia, government agencies such as FDA and NIH, disease advocacy groups, and other stakeholder groups. The workshop will feature invited presentations and discussions that will:

• Examine the benefits of sharing of clinical research data from all sectors and among these sectors, including, for example:
  o Benefits to the research and development enterprise
  o Benefits to the analysis of safety and efficacy
• Identify barriers and challenges to sharing clinical research data
• Explore strategies to address these barriers and challenges, including identifying priority actions and “low-hanging fruit” opportunities
• Discuss strategies for using these potentially large data sets to facilitate scientific and public health advances.

Planning Committee

Sharon Terry, chair, Genetic Alliance

Josephine P. Briggs, National Institutes of Health

Timothy Coetzee, National Multiple Sclerosis Society

Steven Goodman, Stanford University

Robert A. Harrington, Stanford University

Lynn Hudson, Critical Path Institute

Charles Hugh-Jones, Sanofi Oncology

Jan Johannessen, Food and Drug Administration

Jeffrey S. Nye, Johnson and Johnson

Richard Platt, Harvard Medical School

William Z. Potter, FNIH Biomarkers Consortium

Frank W. Rockhold, GlaxoSmithKline

Michael Rosenblatt, Merck

Vicki Seyfert-Margolis, Food and Drug Administration

Deborah A. Zarin, National Institutes of Health

Page 18: Sharing Clinical Research Data:  an IOM Workshop

COORDINATING ROUNDTABLES AND FORUMS

Page 19: Sharing Clinical Research Data:  an IOM Workshop

Board on Health Sciences Policy Forum on Drug Discovery, Development, and Translation

FORUM ON DRUG DISCOVERY, DEVELOPMENT, AND TRANSLATION

The Institute of Medicine’s Forum on Drug Discovery, Development, and Translation was created in 2005 by the IOM’s Board on Health Sciences Policy to provide a unique platform for dialogue and collaboration among thought leaders and stakeholders in government, academia, industry, foundations, and patient advocacy. The Forum brings together leaders from private-sector sponsors of biomedical and clinical research, federal agencies sponsoring and regulating biomedical and clinical research, the academic community, and consumers, and in doing so serves to educate the policy community about issues where science and policy intersect.

The Forum convenes several times each year to identify and discuss key problems and strategies in the discovery, development, and translation of drugs. To supplement the perspectives and expertise of its members, the Drug Forum also holds public workshops to engage a wide range of experts, members of the public, and the policy community in discussing areas of concern in the science and policy of drug development. The Forum’s public meetings focus substantial public attention on critical areas of drug development, organized around the major themes outlined below.

The Approach to Drug Development

Despite exciting scientific advances, the pathway from basic science to new therapeutics faces challenges on many fronts. New paradigms for discovering and developing drugs are being sought to bridge the ever-widening gap between scientific discoveries and translation of those discoveries into life-changing medications. The Forum has explored these issues from many perspectives: emerging technology platforms, regulatory efficiency, intellectual property concerns, the potential for precompetitive collaboration, and innovative business models that address the “valley of death.”

Strengthening the Scientific Basis of Drug Regulation

Over the past several years, the Forum has focused its attention on the scientific basis for the regulation of drugs. In February 2010, the Forum held a workshop that examined the state of the science of drug regulation and considered approaches for enhancing the scientific basis of regulatory decision making.

Transforming Research and Fostering Collaborative Research

The Forum has established an initiative to examine the state of clinical trials in the U.S., identify areas of strength and weakness in our current clinical trial enterprise, and consider transformative strategies for enhancing the ways in which clinical research is organized and conducted. Workshops and meetings held in 2009 and 2010 considered case studies in four disease areas and included discussions of managing conflict of interest and of addressing regulatory and administrative impediments to the conduct of clinical trials. Meetings in 2011 will address how to move toward greater public engagement in and understanding of the clinical trial enterprise, and how to establish a framework for a transformed national clinical trial enterprise.

Developing Drugs for Rare and Neglected Diseases and Addressing Urgent Global Health Problems

The Forum is sponsoring a series of workshops on the global problem of multidrug-resistant tuberculosis (MDR TB). The Forum held a foundational workshop in Washington, DC, in 2008, for which it commissioned a paper from Partners In Health. Additional workshops are being held in the four countries with the highest MDR TB burden: South Africa and Russia (held 2010), and India and China (anticipated 2011). Also in 2011, the Forum will convene a focused initiative to consider the global drug supply chain for quality-assured second-line drugs for tuberculosis.

Promoting Public Understanding of Drug Development

Successful introduction of new therapeutic entities requires testing in an informed and motivated public. The Forum has made a concerted effort to understand what limits public participation and how to foster more widespread acceptance of the importance of advancing therapeutic development through public participation in the drug development process. Forum meetings held in the spring and fall of 2010 addressed these issues. The Forum plans to continue to work with multiple stakeholders to improve public understanding of and participation in the drug development process.

500 Fifth Street, NW, Washington, DC 20001
E-mail: [email protected]
Phone: 202-334-2715
Fax: 202-334-1329
www.iom.edu/drug


The IOM was chartered in 1970 as a component of the National Academy of Sciences to enlist distinguished members of the appropriate professions in the examination of policy matters pertaining to the health of the public. In this, the Institute acts under both the Academy’s 1863 congressional charter responsibility to be an adviser to the federal government and its own initiative in identifying issues of medical care, research, and education.

FORUM ON DRUG DISCOVERY, DEVELOPMENT, AND TRANSLATION

JEFFREY DRAZEN (Co-Chair), New England Journal of Medicine
STEVEN GALSON (Co-Chair), Amgen Inc.
MARGARET ANDERSON, FasterCures
HUGH AUCHINCLOSS, National Institute of Allergy and Infectious Diseases
LESLIE BENET, University of California, San Francisco
ANN BONHAM, Association of American Medical Colleges
LINDA BRADY, National Institute of Mental Health
ROBERT CALIFF, Duke University Medical Center
THOMAS CASKEY, Baylor College of Medicine
GAIL CASSELL, Harvard Medical School
PETER CORR, Celtic Therapeutics, LLLP
ANDREW DAHLEM, Eli Lilly & Co.
TAMARA (MARA) DARSOW, American Diabetes Association
JIM DOROSHOW, National Cancer Institute
GARY FILERMAN, Atlas Health Foundation
GARRET FITZGERALD, University of Pennsylvania School of Medicine
MARK GOLDBERGER, Abbott Pharmaceuticals
HARRY GREENBERG, Stanford University School of Medicine
STEPHEN GROFT, Office of Rare Diseases Research, NIH NCATS
LYNN HUDSON, The Critical Path Institute
TOM INSEL, National Center for Advancing Translational Sciences
MICHAEL KATZ, March of Dimes Foundation
PETRA KAUFFMAN, National Institute of Neurological Disorders and Stroke
JACK KEENE, Duke University Medical Center
RONALD KRALL, University of Pennsylvania
FREDA LEWIS-HALL, Pfizer Inc.
MARK MCCLELLAN, Brookings Institution
CAROL MIMURA, University of California, Berkeley
ELIZABETH (BETSY) MYERS, Doris Duke Charitable Foundation
JOHN ORLOFF, Novartis Pharmaceuticals Corporation
AMY PATTERSON, NIH Office of the Director
MICHAEL ROSENBLATT, Merck & Co., Inc.
JANET SHOEMAKER, American Society for Microbiology
ELLEN SIGAL, Friends of Cancer Research
ELLIOTT SIGAL, Bristol-Myers Squibb
ELLEN STRAHLMAN, GlaxoSmithKline
NANCY SUNG, Burroughs Wellcome Fund
JANET TOBIAS, Ikana Media
JOANNE WALDSTREICHER, Janssen Research & Development
JANET WOODCOCK, FDA Center for Drug Evaluation and Research

PROJECT STAFF
Anne Claiborne, J.D., M.P.H., Forum Director
Rebecca English, M.P.H., Associate Program Officer
Rita Guenther, Ph.D., Program Officer
Robin Guyse, Senior Program Assistant
Elizabeth Tyson, Research Associate


500 Fifth Street, NW, Washington, DC 20001
E-mail: [email protected]
Phone: 202-334-3984
Fax: 202-334-1329
www.iom.edu/neuroforum

Board on Health Sciences Policy

FORUM ON NEUROSCIENCE AND NERVOUS SYSTEM DISORDERS

The Institute of Medicine’s Forum on Neuroscience and Nervous System Disorders focuses on building partnerships to further the understanding of the brain and nervous system, disorders of their structure and function, and effective clinical prevention and treatment strategies. The Forum focuses on the six themes outlined below and serves to educate the public, press, and policy makers regarding these issues.

The Forum brings together leaders from private-sector sponsors of biomedical and clinical research, federal agencies sponsoring and regulating biomedical and clinical research, the academic community, and consumers.

The Forum will sponsor workshops for members and the public to discuss approaches to resolving key challenges identified by Forum members. It strives to enhance understanding of research and clinical issues associated with the nervous system among the scientific community and the general public, and to provide a mechanism for fostering partnerships among stakeholders.

Nervous System Disorders

The Forum promotes partnerships that will lead to a more complete understanding of the agents that damage cells and trigger cell death, as well as the mechanisms that carry out the process. For example, protein aggregation plays an important role in neurodegeneration and is emerging as a common mechanism in several nervous system disorders (e.g., Parkinson’s and Alzheimer’s diseases).

Mental Illness and Addiction

Improved understanding of the nervous system and its role in mental illness and addictive behavior is critical to developing new treatments for these conditions. Especially important is a better understanding of the neuronal mechanisms that give rise to mental illness, increased risk behavior, and resilience. The Forum discusses ways to take advantage of the evolving nature of science and the opportunities it brings to investigating the brain and behavior, using cross-disciplinary approaches that expand the understanding of how complex mental illnesses and addictions arise from individual components.

Genetics of Nervous System Disorders

Understanding the roles of the genes and gene products that control electrical activity, trophic factors, transporters, and receptors is critical to understanding the brain and nervous system. The role of genetics in disease states needs to be more completely understood in order to develop appropriate treatments and interventions. The Forum discusses approaches to better understanding the environmental and multigenic factors that influence susceptibility, progression, and severity of disease.

Cognition and Behavior

Higher brain functions that give rise to language, cognition, memory, and emotion depend on a complex and interdependent set of neuronal pathways. Understanding higher brain functions and their impairment is critical to developing effective treatments for disease.

Modeling and Imaging

Tools that support a better understanding of the basic and higher-order functions of the brain aid in research on, and understanding of, disorders of the brain. This area focuses on animal and computer models and neuroimaging approaches.

Ethical and Social Issues

The Forum addresses ethical issues relating to the stigma associated with nervous system disorders, mental illness, and addiction in society, and the engagement of these patient populations in research efforts. It also examines clinical research that elucidates environmental, genetic, social, and cultural risk and protective factors, which can guide the development of prevention and early-intervention strategies.


INSTITUTE OF MEDICINE

Forum on Neuroscience and Nervous System Disorders

STEVE HYMAN (Chair), The Broad Institute
SUSAN AMARA, Society for Neuroscience
MARC BARLOW, GE Healthcare, Inc.
MARK BEAR, Massachusetts Institute of Technology
KATJA BROSE, Neuron
DANIEL BURCH, CeNeRx Biopharma
C. THOMAS CASKEY, Baylor College of Medicine
TIMOTHY COETZEE, Fast Forward, LLC
EMMELINE EDWARDS, NIH Neuroscience Blueprint
MARTHA FARAH, University of Pennsylvania
RICHARD FRANK, GE Healthcare, Inc.
DANIEL GESCHWIND, University of California, Los Angeles
HANK GREELY, Stanford University
MYRON GUTMANN, National Science Foundation
RICHARD HODES, National Institute on Aging
THOMAS INSEL, National Institute of Mental Health
PHILLIP IREDALE, Pfizer Global Research and Development
DANIEL JAVITT, Nathan S. Kline Institute for Psychiatric Research
FRANCES JENSEN, University of Pennsylvania School of Medicine
STORY LANDIS, National Institute of Neurological Disorders and Stroke
ALAN LESHNER, American Association for the Advancement of Science
HUSSEINI MANJI, Johnson and Johnson Pharmaceutical Research and Development, Inc.
DAVID MICHELSON, Merck Research Laboratories
RICHARD MOHS, Lilly Research Laboratories
JONATHAN MORENO, University of Pennsylvania School of Medicine
ALEXANDER OMMAYA, U.S. Department of Veterans Affairs
ATUL PANDE, GlaxoSmithKline, Inc.
STEVEN PAUL, Weill Cornell Medical College
TODD SHERER, Michael J. Fox Foundation for Parkinson’s Research
PAUL SIEVING, National Eye Institute
JUDY SIUCIAK, Foundation for the National Institutes of Health
MARC TESSIER-LAVIGNE, The Rockefeller University
WILLIAM THIES, Alzheimer’s Association
NORA VOLKOW, National Institute on Drug Abuse
KENNETH WARREN, National Institute on Alcohol Abuse and Alcoholism
JOHN WILLIAMS, Wellcome Trust
STEVIN H. ZORN, Lundbeck USA
CHARLES ZORUMSKI, Washington University School of Medicine

PROJECT STAFF
Bruce Altevogt, Ph.D., Forum Director
Diana Pankevich, Ph.D., Associate Program Officer
Elizabeth Thomas, Senior Project Assistant


For additional information on the Forum on Neuroscience and Nervous System Disorders, visit the project’s homepage at www.iom.edu/neuroforum or call Bruce Altevogt at (202) 334-3984.


500 Fifth Street, NW, Washington, DC 20001-2721
Phone: 202-334-3756
Fax: 202-334-1329
E-mail: [email protected]
www.iom.edu/genomicroundtable

ROUNDTABLE ON TRANSLATING GENOMIC-BASED RESEARCH FOR HEALTH

The sequencing of the human genome is rapidly opening new doors to research and progress in biology, medicine, and health care. At the same time, these developments have produced a diversity of new issues to be addressed.

The Institute of Medicine has convened a Roundtable on Translating Genomic-Based Research for Health that brings together leaders from academia, industry, government, foundations and associations, and representatives of patient and consumer interests who have a mutual concern and interest in addressing the issues surrounding the translation of genome-based research for use in maintaining and improving health. The mission of the Roundtable is to advance the field of genomics and improve the translation of research findings to health care, education, and policy. The Roundtable will discuss the translation process, identify challenges at various points in the process, and discuss approaches to address those challenges.

The field of genomics and its translation involves many disciplines and takes place within different economic, social, and cultural contexts, necessitating increased communication and understanding across these fields. As a convening mechanism for interested parties from diverse perspectives to meet and discuss complex issues of mutual concern in a neutral setting, the Roundtable fosters dialogue across sectors and institutions; illuminates issues, but does not necessarily resolve them; and fosters collaboration among stakeholders. To achieve its objectives, the Roundtable conducts structured discussions, workshops, and symposia. Workshop summaries will be published, and collaborative efforts among members (e.g., journal articles) are encouraged. Specific issues and agenda topics are determined by the Roundtable membership and span a broad range of issues relevant to the translation process.

Issues may include the integration and coordination of genomic information into health care and public health, encompassing standards for genetic screening and testing, improving information technology for use in clinical decision making, ensuring access while protecting privacy, and using genomic information to reduce health disparities. The patient and family perspective on the use of genomic information for translation includes social and behavioral issues for target populations. There are also evolving requirements for the health professional community, which needs to be able to understand and responsibly apply genomics to medicine and public health.

Of increasing importance is the need to identify the economic implications of using genome-based research for health. Such issues include incentives, cost-effectiveness, and sustainability. Issues related to the developing science base are also important in the translation process; these could include studies of gene-environment interactions, as well as the implications of genomics for complex disorders such as addiction, mental illness, and chronic diseases.

Roundtable sponsors include federal agencies, pharmaceutical companies, medical and scientific associations, foundations, and patient/public representatives. For more information about the Roundtable on Translating Genomic-Based Research for Health, please visit our website at www.iom.edu/genomicroundtable or contact Adam Berger at 202-334-3756 or by e-mail at [email protected].


The Institute of Medicine serves as adviser to the nation to improve health. Established in 1970 under the charter of the National Academy of Sciences, the Institute of Medicine provides independent, objective, evidence-based advice to policy makers, health professionals, the private sector, and the public. The mission of the Institute of Medicine embraces the health of people everywhere.

Institute of Medicine

Roundtable on Translating Genomic-Based Research for Health

Membership

Wylie Burke, M.D., Ph.D. (Co-Chair), University of Washington
Sharon Terry, M.A. (Co-Chair), Genetic Alliance
Naomi Aronson, Ph.D., BlueCross/BlueShield Association
Euan Ashley, M.R.C.P., D.Phil., FACC, FAHA, American Heart Association
Paul R. Billings, M.D., Ph.D., Life Technologies Corporation
Bruce Blumberg, M.D., Kaiser Permanente
Denise E. Bonds, M.D., M.P.H., National Heart, Lung, and Blood Institute
Philip J. Brooks, Ph.D., Office of Rare Diseases Research
C. Thomas Caskey, M.D., FACP, Baylor College of Medicine
Sara Copeland, M.D., Health Resources and Services Administration
Victor Dzau, M.D., Duke University Health System
W. Gregory Feero, M.D., Ph.D., The Journal of the American Medical Association
Andrew N. Freedman, Ph.D., National Cancer Institute
Geoffrey Ginsburg, M.D., Ph.D., Duke University
Jennifer Hobin, Ph.D., American Society of Human Genetics
Richard Hodes, M.D., National Institute on Aging
Sharon Kardia, Ph.D., University of Michigan School of Public Health
Mohamed Khan, M.D., Ph.D., American Medical Association
Muin Khoury, M.D., Ph.D., Centers for Disease Control and Prevention
Thomas Lehner, Ph.D., M.P.H., National Institute of Mental Health
Debra Leonard, M.D., Ph.D., College of American Pathologists
Elizabeth Mansfield, Ph.D., Food & Drug Administration
Garry Neil, M.D., Johnson & Johnson
Robert L. Nussbaum, M.D., University of California, San Francisco, School of Medicine
Michelle Ann Penny, Ph.D., Eli Lilly and Company
Aidan Power, M.B., B.Ch., M.Sc., M.R.C.Psych., Pfizer Inc.
Victoria M. Pratt, Ph.D., FACMG, Quest Diagnostics Nichols Institute
Ronald Przygodzki, M.D., Department of Veterans Affairs
Allen D. Roses, M.D., Duke University
Kevin A. Schulman, M.D., Duke University
Joan A. Scott, M.S., C.G.C., National Coalition for Health Professional Education in Genetics
David Veenstra, Pharm.D., Ph.D., University of Washington
Michael S. Watson, Ph.D., American College of Medical Genetics and Genomics
Daniel Wattendorf, M.D. (Lt. Col.), Department of the Air Force
Catherine A. Wicklund, M.S., C.G.C., National Society of Genetic Counselors

Project Staff
Adam C. Berger, Ph.D., Roundtable Director
Sean P. David, M.D., D.Phil., Puffer/ABFM Fellow
Claire Giammaria, M.P.H., Research Associate
Tonia Dickerson, Senior Program Assistant


500 Fifth Street, NW, Washington, DC 20001
Phone: 202-334-1233
Fax: 202-334-2862
www.iom.edu/Activities/Disease/NCPF

The National Cancer Policy Forum

The National Cancer Policy Forum (NCPF) provides a continuous focus on cancer policy at the Institute of Medicine (IOM). IOM forums are designed to allow government, industry, academic, and other representatives to meet, confer, and plan on subject areas of mutual interest. The objectives of the NCPF are to identify emerging high-priority policy issues in the nation’s effort to combat cancer and to examine those issues through convening activities that promote discussion about potential opportunities for action. These activities inform stakeholders about critical policy issues through published reports, and often provide input for planning formal IOM consensus committee studies.

The IOM established the NCPF on May 1, 2005, to succeed the National Cancer Policy Board (1997-2005, the Board). The Board brought together leaders from the cancer community to identify and conduct studies and other activities contributing to cancer research, prevention, treatment, and public awareness. The combination of multidisciplinary expertise (basic, clinical, and public health scientists, consumers, and advocates) and resources (grants from the National Cancer Institute (NCI) and Centers for Disease Control and Prevention (CDC), as well as smaller contributions from private-sector organizations) allowed the Board to produce a remarkably original and diverse body of work contributing to improvements in knowledge and public policy.

The NCPF continues to provide a focus within the National Academies for the consideration of issues in science, clinical medicine, public health, and public policy relevant to the goals of preventing, palliating, and curing cancer. The NCPF builds upon the work of the Board and enjoys a closer working relationship with its federal and non-federal sponsors. As a forum rather than a board, sponsors are full members alongside the academic, consumer, and policy community members. They bring ideas and requests to the deliberations and have the advantage of playing an active part in the discussions. Governmental sponsors represented on the NCPF include the NCI and CDC; non-governmental sponsors include the American Association for Cancer Research, the American Cancer Society, the American Society of Clinical Oncology, the Association of American Cancer Institutes, Bristol-Myers Squibb, C-Change, the CEO Roundtable on Cancer, GlaxoSmithKline Oncology, Novartis, the Oncology Nursing Society, and Sanofi Oncology. Additional distinguished experts from the cancer community are also appointed as members to serve three-year terms. The NCPF operates under the aegis of the IOM Board on Health Care Services.

The NCPF enables all members to be full participants in identifying and debating critical policy issues in cancer care and research, and in examining potential opportunities for action. These convening activities result in published reports that are available to the public and may provide input to the planning of formal IOM consensus committee studies. Ideas for committee studies that emerge from the NCPF’s deliberations are handed off to an appropriate ad hoc committee appointed by the IOM. The studies are conducted by NCPF staff, and the committees often include one or more members of the NCPF. Forum sponsors often actively pursue activities to facilitate the implementation of recommendations made in these consensus reports, as well as suggestions put forth in NCPF workshops.


National Cancer Policy Forum Reports: Developing Biomarker-based Tools for Cancer Screening, Diagnosis and Therapy (2006); Effect of the HIPAA Privacy Rule on Health Research (2006); Implementing Cancer Survivorship Care Planning (2007); Cancer in Elderly People (2007); Cancer-Related Genetic Counseling and Testing (2007); Improving the Quality of Cancer Clinical Trials (2008); Implementing Colorectal Cancer Screening (2008); Multi-center Phase III Clinical Trials and the NCI Cooperative Group Program (2009); Ensuring Quality Cancer Care through the Oncology Workforce (2009); Assessing and Improving Value in Cancer Care (2009); Policy Issues in the Development of Personalized Medicine in Oncology (2010); A Foundation for Evidence-Driven Practice: A Rapid Learning System for Cancer Care (2010); Extending the Spectrum of Precompetitive Collaboration in Oncology Research (2010); Direct to Consumer Genetic Testing (with the NRC, 2010); Policy Issues in Nanotechnology and Oncology (2011); National Cancer Policy Summit: Opportunities and Challenges in Cancer Research and Care (2011); Patient-Centered Cancer Treatment Planning: Improving the Quality of Oncology Care (2011); Implementing a National Cancer Clinical Trials System for the 21st Century (2011); Facilitating Collaborations to Develop Combination Investigational Cancer Therapies (2011); The Role of Obesity in Cancer Survival and Recurrence (2012); Informatics Needs and Challenges in Cancer Research (2012); Reducing Tobacco-Related Cancer Incidence and Mortality (in preparation).

Spin-off IOM consensus committee reports: Cancer Biomarkers: The Promises and Challenges of Improving Detection and Treatment (2007); Beyond the HIPAA Privacy Rule: Enhancing Privacy, Improving Health through Research (2009); Evaluation of Biomarkers and Surrogate Endpoints in Chronic Disease (2010); A National Cancer Clinical Trials System for the 21st Century (2010); Evolution of Translational Omics (2012); Ensuring the Quality of Cancer Care in an Aging Population (in preparation).

Membership of the Forum:
Chair: John Mendelsohn, MD, President, MD Anderson Cancer Center
Vice Chair: Patricia Ganz, MD, Professor of Medicine, Jonsson Comprehensive Cancer Center
Amy Abernethy, MD, Director, Duke University School of Medicine Cancer Care Research Program
Rafael Amado, MD, Senior Vice President and Head of GlaxoSmithKline Oncology R&D
Fred Appelbaum, MD, Director, Clinical Research, Fred Hutchinson Cancer Research Center
Peter Bach, MD, MAPP, Member, Memorial Sloan-Kettering Cancer Center
Edward Benz, MD, President, Dana-Farber Cancer Institute
Monica Bertagnolli, MD, Professor of Surgery, Harvard University Medical School
Otis Brawley, MD, Chief Medical Officer and Executive Vice President, American Cancer Society
Michael Caligiuri, MD, Director, Ohio State University Cancer Center; past President, AACI
Renzo Canetta, MD, Vice President, Oncology Global Clinical Research, Bristol-Myers Squibb
Michaele Chamblee Christian, MD, Retired
William Dalton, PhD, MD, CEO, M2Gen, H. Lee Moffitt Cancer Center
Wendy Demark-Wahnefried, PhD, RD, Professor and Chair, University of Alabama, Birmingham
Robert Erwin, MS, President, Marti Nelson Cancer Foundation
Roy Herbst, MD, PhD, Chief of Medical Oncology, Yale Cancer Center
Thomas Kean, MPH, President and CEO, C-Change
Douglas Lowy, MD, Deputy Director, Division of Basic Science, NCI
Daniel R. Masys, MD, Affiliate Professor, Biomedical Informatics, University of Washington
Martin Murphy, PhD, DMedSc, Chief Executive Officer, CEO Roundtable on Cancer
Brenda Nevidjon, RN, MSN, Professor, Duke University School of Nursing; past President, ONS
Steven Piantadosi, MD, PhD, Director, Samuel Oschin Comprehensive Cancer Institute
Lisa Richardson, MD, MPH, Associate Director, Division of Cancer Prevention and Control, CDC
Debasish Roychowdhury, MD, Senior Vice President, Global Oncology, Sanofi
Ya-Chen Tina Shih, PhD, Director, Program in the Economics of Cancer, University of Chicago
Ellen Sigal, PhD, Chairperson and Founder, Friends of Cancer Research
Steven Stein, MD, Senior Vice President, US Clinical Development & Medical Affairs, Novartis
John Wagner, MD, PhD, Vice President, Clinical Pharmacology, Merck Research Laboratories
Ralph Weichselbaum, MD, Chairman, Radiation and Cellular Oncology, University of Chicago
Janet Woodcock, MD, Director, Center for Drug Evaluation and Research, FDA

Forum Director: Sharyl Nass, PhD


WORKSHOP


Forum on Drug Discovery, Development, and Translation
Forum on Neuroscience and Nervous System Disorders
Roundtable on Translating Genomic-Based Research for Health
National Cancer Policy Forum

Speaker Biographies

Sharing Clinical Research Data: An IOM Workshop
October 4-5, 2012

Myles Axton, Ph.D.
Myles Axton is the editor of Nature Genetics. He was a university lecturer in molecular and cellular biology at the University of Oxford and a Fellow of Balliol College from 1995 to 2003. He obtained his degree in genetics at Cambridge in 1985 and his doctorate at Imperial College in 1990, and between 1990 and 1995 he did postdoctoral research at Dundee and at MIT’s Whitehead Institute. Myles’s research made use of the advanced genetics of Drosophila to study genome stability by examining the roles of cell cycle regulators in life cycle transitions. His interests broadened into human genetics, genomics, and systems biology through lecturing and through tutoring biochemists, zoologists, and medical students from primary research papers. Helping to establish Oxford’s innovative research M.Sc. in Integrative Biosciences led Myles to realize the importance of an integrative overview of biomedical research. As a full-time professional editor, he is now in a position to use this perspective to help coordinate research in genetics.

Jesse A. Berlin, Sc.D.
After spending 15 years as a faculty member at the University of Pennsylvania, in the Center for Clinical Epidemiology and Biostatistics under the direction of Dr. Brian Strom, Dr. Berlin left the University of Pennsylvania to join Janssen Research & Development, where he is currently Vice President of Epidemiology. He has authored or coauthored over 230 publications in a wide variety of clinical and methodological areas, including papers on the study of meta-analytic methods as applied to both randomized trials and epidemiology. He served on an Institute of Medicine committee that developed recently released recommendations for the use of systematic reviews in clinical effectiveness research, and he currently serves on the Scientific Advisory Committee to the Observational Medical Outcomes Partnership, a public-private partnership aimed at understanding methodology for assessing drug safety in large administrative databases. He is also a fellow of the American Statistical Association.

Josephine P. Briggs, M.D.
Dr. Briggs is the Director of the National Center for Complementary and Alternative Medicine and Acting Director of the Division of Clinical Innovation, National Center for Advancing Translational Sciences, NIH. An accomplished researcher and physician, Dr. Briggs received her A.B. in biology from Harvard-Radcliffe College and her M.D. from Harvard Medical School. She completed her residency training in internal medicine and nephrology at the Mount Sinai School of Medicine, followed by a fellowship at Yale and work as a research scientist at the Physiology Institute at the University of Munich. In 1985, Dr. Briggs moved to the University of Michigan, where she held several academic positions, including associate chair for research in the Department of Internal Medicine and professorships in the Division of Nephrology, Department of Internal Medicine, and the Department of Physiology. She joined the National Institutes of Health (NIH) in 1997 as director of the Division of Kidney, Urologic, and Hematologic Diseases at the National Institute of Diabetes and Digestive and Kidney Diseases. In 2006, Dr. Briggs accepted a position as senior scientific officer at the Howard Hughes Medical Institute. In January 2008, she returned to NIH as the Director of the National Center for Complementary and Alternative Medicine. Dr. Briggs has published more than 175 research articles, book chapters, and scholarly publications; has served on the editorial boards of several journals; and was deputy editor of the Journal of Clinical Investigation. Dr. Briggs is an elected member of the American Association of Physicians and the American Society for Clinical Investigation and a fellow of the American Association for the Advancement of Science. She is a recipient of many awards and prizes, including the Volhard Prize of the German Nephrological Society, the Alexander von Humboldt Scientific Exchange Award, and NIH Director’s Awards for her role in the development of the Trans-NIH Type 1 Diabetes Strategic Plan and her leadership of the Trans-NIH Zebrafish Committee. Dr. Briggs is also a member of the NIH Steering Committee, the seniormost governing board at the NIH.

Robert M. Califf, M.D.
Dr. Califf is the Vice Chancellor for Clinical and Translational Research, Director of the Duke Translational Medicine Institute (DTMI), and Professor of Medicine in the Division of Cardiology at Duke University Medical Center in Durham, NC. He leads a multifaceted organization that seeks to transform how scientific discoveries are translated into improved health outcomes. Prior to leading the DTMI, he was the founding director of the Duke Clinical Research Institute (DCRI), one of the nation's premier academic research organizations. He is editor-in-chief of the American Heart Journal, the oldest cardiovascular specialty journal, and a practicing cardiologist at Duke University Medical Center. Born in 1951 in Anderson, SC, Dr. Califf attended high school in Columbia, where he was a member of the 1969 AAAA SC championship basketball team. After high school, Dr. Califf attended Duke University, graduating summa cum laude and Phi Beta Kappa in 1973. He remained at Duke for medical school, where he was selected for the Alpha Omega Alpha medical honor society. After graduating from Duke University School of Medicine in 1978, he completed a residency in internal medicine at the University of California, San Francisco, returning to Duke for a cardiology fellowship. Dr. Califf is board certified in internal medicine (1984) and cardiology (1986), and he was named a Master of the American College of Cardiology in 2006. An international leader in the fields of cardiovascular medicine, healthcare outcomes, quality of care, and medical economics, he has authored or coauthored more than 1,000 peer-reviewed articles and is among the most frequently cited authors in medicine. He is also a contributing editor for TheHeart.org, an online information resource for healthcare professionals working in the field of cardiovascular medicine. As founder and, for a decade, director of the DCRI, Dr. Califf led many landmark clinical trials and health services research projects, and he remains actively involved in designing, leading, and conducting multinational clinical trials. Under his guidance, the DCRI grew into an organization with more than 1,000 employees and an annual budget of over $100 million; its umbrella organization, the DTMI, now has an annual budget of more than $300 million. Supported in part by a Clinical and Translational Science Award (CTSA) from the National Institutes of Health, the DTMI works with government agencies, academic partners, research foundations, and the medical products industry to conduct innovative research spanning multiple therapeutic arenas and scientific disciplines.


Dr. Califf serves as co-chair of the first Principal Investigators Steering Committee of the CTSA and has served on the FDA's Cardiorenal Advisory Panel, and on the Institute of Medicine's (IOM) Pharmaceutical Roundtable, Committee on Identifying and Preventing Medication Errors, and Committee on Nutritional Biomarkers. In 2008, he was part of the subcommittee of the FDA's Science Board that recommended sweeping reform of the agency's science base. He was also a member of the IOM committees that recommended Medicare coverage of clinical trials and the removal of ephedra from the market. Dr. Califf is currently a member of the IOM Forum on Drug Discovery, Development, and Translation and a member of the National Advisory Council on Aging. Reflecting his interests in healthcare quality, Dr. Califf was the founding director of the coordinating center of the Centers for Education & Research on Therapeutics, a public-private partnership that seeks to improve the use of medical products through research and education. He is currently co-chair of the Clinical Trials Transformation Initiative, a public-private partnership focused on improving the clinical trials system. He is also chair of the Clinical Research Forum, an organization of academic health and science system leaders devoted to improving the clinical research enterprise. Dr. Califf is married to Lydia Carpenter, and the couple has three children (Sharon Califf, a graduate of Elon College; Sam, a graduate student at the University of Colorado-Boulder; and Tom, a graduate of Duke) and one grandchild. Dr. Califf enjoys time with his family, works at his golf game, listens to music, and remains an ardent supporter of the Duke men's and women's basketball programs. Michael N. Cantor, M.D., M.A., FACP Michael Cantor is currently Senior Director, Information Strategy and Analytics, in Pfizer’s Clinical Informatics and Innovation group. 
His work focuses on leveraging data reuse and integration to support future horizons of scientific decision support for precision medicine. He is currently co-leading several initiatives around the secondary use of clinical data, including Pfizer’s ePlacebo/eControls database, as well as its comprehensive Clinical Lab Data Catalog. He created and co-leads the MEDIC (Multisite Electronic Data Infectious Disease Consortium) project, which aims to partner with academic medical centers to perform observational studies using data from electronic medical record (EMR) systems. He has served as an advisor to programs across each of Pfizer’s Business Units, as well as the Worldwide Research and Development organization, on the role of healthcare IT in advancing their strategic priorities. Dr. Cantor previously led Pfizer Business Technology’s “Data Without Borders” strategy, with the aim of advancing data sharing and reuse, both internally and externally, to advance Precision Medicine. Outside of Pfizer, he has been a member of AMIA’s public policy committee for six years, and led the committee’s initiative to update its positions around data stewardship and reuse. Prior to joining Pfizer, Michael was the Chief Medical Information Officer for the South Manhattan Healthcare Network of the New York City Health and Hospitals Corporation, based at Bellevue Hospital in Manhattan. His work there focused on developing the network’s EMR system to improve patient safety and on using the network’s clinical data warehouse for research. He continues to see patients one day a week at Bellevue, and is a Clinical Assistant Professor of Medicine at NYU School of Medicine.


Michael completed his residency in internal medicine and informatics training at Columbia, has an M.D. from Emory University, and an A.B. from Princeton. Carolyn Compton, M.D., Ph.D. Dr. Carolyn Compton is the President and CEO of Critical Path Institute. She was most recently the Director of the Office of Biorepositories and Biospecimen Research (OBBR) and the Executive Director of the Cancer Human Biobank (caHUB) project at the National Cancer Institute. In these capacities, she had leadership responsibility for strategic initiatives that included the Innovative Molecular Analysis Technologies for Cancer program, the Biospecimen Research Network program, and the NCI Community Cancer Centers project. She is an adjunct Professor of Pathology at the Johns Hopkins School of Medicine. She received her M.D. and Ph.D. degrees from Harvard Medical School and the Harvard Graduate School of Arts and Sciences. She trained in Pathology at Harvard's Brigham and Women's Hospital and is boarded in both Anatomic Pathology and Clinical Pathology. She came to the NCI from McGill University where she had been the Strathcona Professor and Chair of Pathology and the Pathologist-in-Chief of McGill University Health Center from 2000 to 2005. Prior to this, she had been a Professor of Pathology at Harvard Medical School, the Director of Gastrointestinal Pathology at the Massachusetts General Hospital, and the Pathologist-in-Chief of the Shriners' Hospital For Crippled Children, Boston Burns Unit for 15 years. During this time she served as the Chair of the Pathology Committee of the Cancer and Leukemia Group B for 12 years. Her research interests are in colon and pancreatic cancer as well as epithelial biology and wound healing. Dr. Compton has held many national and international leadership positions in pathology and cancer-related professional organizations. She is a Fellow of the College of American Pathologists and a Fellow of the Royal Society of Medicine. 
Currently, she is the Chair of the American Joint Committee on Cancer (AJCC), serves on the Executive Committee of the Commission on Cancer of the American College of Surgeons, and serves as the Pathology Section Editor for Cancer. She is a past Chair of the Cancer Committee of the College of American Pathologists and was Editor of the first edition of the CAP Cancer Protocols (Reporting on Cancer Specimens) used as standards for CoC accreditation. Among her awards are the ISBER Award for Outstanding Achievement in Biobanking, the NIH Director's Award, the NIH Award of Merit, and the CAP Frank W. Hartman Award. She has published more than 500 original scientific papers, reports, review articles, books and abstracts. Neil de Crescenzo, M.B.A. Mr. de Crescenzo is Senior Vice President and General Manager for Health Sciences at Oracle. He is responsible for managing Oracle's solution groups, strategic planning, product development, and sales, service, and support for the industry solutions sold into the healthcare and life sciences markets worldwide. Mr. de Crescenzo brings more than 20 years of operational and IT leadership across healthcare and life sciences to his work with customers and partners worldwide. Prior to joining Oracle, Mr. de Crescenzo held a number of leadership positions during his decade at IBM Corporation, working with healthcare and life sciences clients worldwide. Prior to entering the information technology industry, he held leadership positions in healthcare operations at multiple medical centers and a major health insurer. Mr. de Crescenzo began his career in investment banking, working with U.S. and European clients in the areas of corporate finance and mergers and acquisitions.


Mr. de Crescenzo has been a keynote speaker at numerous industry conferences worldwide and is quoted frequently on industry issues. In 2005, he was named one of the "Top 25 Most Influential Consultants" by Consulting Magazine. Mr. de Crescenzo has a BA in political science from Yale University and an MBA in high technology from Northeastern University. Peter Doshi, Ph.D. Dr. Doshi is a postdoctoral fellow in comparative effectiveness research at the Johns Hopkins University School of Medicine. His overarching research interests are in improving the bases for credible evidence synthesis to support and improve the quality of evidence-based medical and health policy related decision making. In 2009, he joined a Cochrane systematic review team evaluating neuraminidase inhibitors for the treatment and prevention of influenza. Rather than focusing on publications, their review evaluates regulatory information including clinical study reports. He received his A.B. in anthropology from Brown University, A.M. in East Asian studies from Harvard University, and Ph.D. in History, Anthropology, and Science, Technology and Society from the Massachusetts Institute of Technology. Kelly Edwards, Ph.D. An associate professor in the School of Medicine's Department of Bioethics and Humanities, Dr. Edwards also is a core faculty member for the UW Institute for Public Health Genetics. She received both her Master of Arts degree in Medical Ethics and her Ph.D. in Philosophy of Education from the UW. Dr. Edwards' work incorporates communication and public engagement as an ethical obligation for clinicians and researchers. She is the director of the Ethics and Outreach Core for the UW Center for Ecogenetics and Environmental Health, which is funded by the National Institute of Environmental Health Sciences. 
She also is a co-director of the Regulatory Support and Bioethics Core for the Institute for Translational Health Sciences (ITHS), a partnership of the UW, Fred Hutchinson Cancer Research Center, Seattle Children's and other regional institutions and community and tribal groups. Funded by the National Institutes of Health (NIH), the ITHS assists researchers with translating their scientific discoveries into practice. In addition, Dr. Edwards is a lead investigator with UW Center for Genomics and Healthcare Equality, funded by the NIH's National Human Genome Research Institute. Since 2004, she has been the faculty advisor for the Forum on Science, Ethics and Policy, groups of graduate and professional students and postdoctoral fellows at the UW and University of Colorado that promote dialogue on issues concerning science and society. To further engage people in conversations about ethical dimensions of science and medicine, Dr. Edwards has facilitated Community Conversations and the Public Health Café, a series of events hosted in Seattle by the Northwest Association for Biomedical Research. Nationally, Dr. Edwards contributes to issues of ethical research practices with the Genetic Alliance, a health advocacy organization; Sage Bionetworks, a local non-profit; and the Institute of Medicine. Her courses include "Inquiry-Based Science Communication," "Applied Research Ethics," "Community-Based Participatory Research: A Model for Genetics Research with Native


American Communities?" and "Public Commentary on Ethical Issues in Public Health Genetics." She is associate editor of BMC Medical Research Methodology and a reviewer for several journals. Dr. Edwards serves on the School of Medicine's Continuous Professional Improvement Committee and is a former member of Medicine's Standing Committee on Issues of Women Faculty, the Student Progress Committee and the Committee on Research and Graduate Education. She is a current member of the UW Graduate School Committee on Interdisciplinary Education. Hans-Georg Eichler, M.D., M.Sc. Dr. Eichler is the Senior Medical Officer at the European Medicines Agency in London, United Kingdom, where he is responsible for coordinating activities between the Agency’s scientific committees and giving advice on scientific and public health issues. From January until December 2011, Dr. Eichler was the Robert E. Wilhelm fellow at the Massachusetts Institute of Technology’s Center for International Studies, participating in a joint research project under MIT’s NEWDIGS initiative. He divided his time between MIT and the EMA in London. Prior to joining the European Medicines Agency, Dr. Eichler was at the Medical University of Vienna in Austria for 15 years. He had been vice-rector for Research and International Relations since 2003, and professor and chair of the Department of Clinical Pharmacology since 1992. His other previous positions include president of the Vienna School of Clinical Research and co-chair of the Committee on Reimbursement of Drugs of the Austrian Social Security Association. His industry experience includes time spent at Ciba-Geigy Research Labs, U.K., and Outcomes Research at Merck & Co., in New Jersey. Dr. Eichler graduated with an M.D. from Vienna University Medical School and a Master of Science degree in Toxicology from the University of Surrey in Guildford, U.K. 
He trained in internal medicine and clinical pharmacology at the Vienna University Hospital as well as at Stanford University. Jennifer S. Geetter, J.D. Ms. Geetter is a partner in the law firm of McDermott Will & Emery LLP and is based in the Firm's Washington, D.C., office. She focuses her practice on emerging biotechnology and safety issues, advising hospital, industry, insurance and provider clients on matters relating to research, drug and device development, off-label use, personalized medicine, formulary compliance, privacy and security, electronic health records and data strategy initiatives, patient safety, conflicts of interest, scientific review and research misconduct, internal hospital disciplinary proceedings, and emerging issues in secondary research concerning biological samples and data warehousing. She also assists health care clients in implementing research strategies, structuring research operational and compliance infrastructure, and in developing guidelines for the appropriate relationships between providers and industry. She is a frequent speaker on these topics. Ms. Geetter is a member of the Firm’s Life Sciences Affinity Group and Personalized Medicine Team. She is also a co-chair of the Pro Bono and Community Service Committee for the Washington, D.C., office. She sits on McDermott’s National Pro Bono Committee and the American Health Lawyers Association Life Sciences Section Steering Committee. Ms. Geetter is listed as a leading individual in health care in Washington, D.C., in Chambers USA 2008: America’s Leading Lawyers for Business.


She was recognized as a STAR Mentor in the McDermott University Mentoring Program for 2006-2007. She is also a member of McDermott’s Gender Diversity Committee. In May 2003, Ms. Geetter was awarded the ACE Founders Award for her pro bono efforts as part of a team of McDermott lawyers on behalf of a group of low-income residents of a Boston neighborhood. She received the Firm's Outstanding Achievement Award for Commitment to Pro Bono and Service to the Community in 2004 and serves on the Firm's national coordinating committee for pro bono activities. Ms. Geetter is admitted to practice in Massachusetts, New York and Washington, D.C. Steven N. Goodman, M.D., M.H.S., Ph.D. Dr. Goodman is Professor of Medicine and Health Policy & Research and Associate Dean for Clinical and Translational Research at Stanford University. Before joining Stanford in 2011, Dr. Goodman spent two decades on the Johns Hopkins medical faculty as Professor of Oncology in the division of biostatistics, with appointments in the departments of Pediatrics, Biostatistics, and Epidemiology in the Johns Hopkins Schools of Medicine and Public Health. He has been editor of Clinical Trials: Journal of the Society for Clinical Trials since 2004 and senior statistical editor for the Annals of Internal Medicine since 1987. He has served on a wide range of Institute of Medicine committees, including Agent Orange and Veterans, Immunization Safety, the Committee on Alternatives to the Daubert Standards, Treatment of PTSD in Veterans, and most recently co-chaired the Committee on Ethical and Scientific Issues in Studying the Safety of Approved Drugs, whose report was released in May 2012. Dr. Goodman was appointed by the GAO to serve on the Methodology Committee of the Patient Centered Outcomes Research Institute (PCORI) and is a scientific advisor to the Medical Advisory Panel of the National Blue Cross/Blue Shield Technology Evaluation Center. 
He served on the Surgeon General’s committee to write the 2004 report on the Health Consequences of Smoking. Dr. Goodman received a BA from Harvard, an MD from New York University, trained in Pediatrics at Washington University (a specialty in which he was board certified), and received an MHS in Biostatistics and a PhD in Epidemiology from Johns Hopkins University. He writes and teaches on evidence evaluation and inferential, methodologic, and ethical issues in epidemiology and clinical research. Robert A. Harrington, M.D. Dr. Robert Harrington is Arthur L. Bloomfield Professor of Medicine and Chairman of the Department of Medicine at Stanford University. He received his undergraduate degree in English from the College of the Holy Cross in Worcester, Mass. He attended Dartmouth Medical School and received his medical degree from Tufts University School of Medicine in 1986. He was an intern, resident, and the chief medical resident in internal medicine at the University of Massachusetts Medical Center. He was a fellow in cardiology at Duke University Medical Center, where he received training in interventional cardiology and research training in the Duke Databank for Cardiovascular Diseases. Dr. Harrington was previously the director of the Duke Clinical Research Institute (DCRI). His research interests include evaluating antithrombotic therapies to treat acute ischemic heart disease and to minimize the acute complications of percutaneous coronary procedures, studying the mechanism of disease of the acute coronary syndromes, understanding the issue of risk stratification in the care of patients with acute ischemic coronary syndromes, and trying to better understand and improve upon the methodology of clinical trials. He is the recipient of an NIH Roadmap contract to investigate “best practices” among clinical trial networks.


He has authored more than 400 peer-reviewed manuscripts, reviews, book chapters, and editorials. He is an associate editor of the American Heart Journal and an editorial board member for the JACC. He is a senior editor of the 13th edition of Hurst’s The Heart. He is a Fellow of the ACC, the AHA, the ESC, the Society of Cardiovascular Angiography and Intervention, and the American College of Chest Physicians. He recently served as a member and the chair of the Food and Drug Administration Cardiovascular and Renal Drugs Advisory Committee.

Lynn D. Hudson, Ph.D. Dr. Hudson serves as the Chief Science Officer for the Critical Path Institute and Executive Director of the Multiple Sclerosis Consortium. She received a BS in Biochemistry at the University of Wisconsin, a Ph.D. in Genetics and Cell Biology at the University of Minnesota, and post-doctoral training at Harvard Medical School and Brown University. For most of her career, Dr. Hudson has been a bench neuroscientist at the National Institutes of Health (NIH), where she directed the Office of Science Policy Analysis from 2006 to 2011. As a major source for policy analysis within NIH's Office of the Director, her office covered a wide spectrum of sensitive and emerging issues and oversaw a number of programs, including the AAAS/NIH Science Policy Fellowship program, NIH’s contract with the National Academy of Sciences, and the Public-Private Partnership Program. Her policy team's awards cite contributions to NIH's Congressional Justification, Biennial Report, implementation of the NIH Reform Act, Stem Cell Guidelines, and Comparative Effectiveness Research. Dr. Hudson has conducted research in the National Institute of Neurological Disorders and Stroke (NINDS) intramural program, where she was Chief of the Section of Developmental Genetics for 16 years. She received the NIH Merit Award for her discovery of the causative mutations in the neurologic disorder Pelizaeus-Merzbacher disease (PMD), and an NINDS Award for educational outreach efforts. Her research focused on defining the network of genes involved in the development of glial cells, with the goal of designing strategies to overcome glial dysfunction in inherited or acquired neurological diseases. She served as an officer for the American Society for Neurochemistry, an officer on the scientific advisory board of the PMD Foundation, and as an advisor for a number of granting agencies and disease foundations, including the National Multiple Sclerosis Society. Charles Hugh-Jones, M.D. Dr. 
Hugh-Jones is Vice President and Head, Medical Affairs North America, Oncology, Hematology, and Solid Organ Transplant at Sanofi. He has previously served as Vice President and Head, Oncology Medical Affairs, North America for sanofi-aventis; Vice President of Brand Management and Vice President of Medical Affairs at Enzon Pharmaceuticals; Executive Director, Medical Affairs at Schering AG/Berlex/Bayer-Schering; Director of Medical Affairs for Schering AG, Global Business Unit; Senior Medical Advisor for Schering AG, UK; and Specialist Registrar at Hammersmith Hospital, UK. Dr. Hugh-Jones received his medical degree from the University of London. He is Board Certified in Internal Medicine and has Additional Higher Specialist Training (UK) in Diagnostic and Interventional Radiology.


John P.A. Ioannidis, D.Sc., M.D. Dr. Ioannidis currently holds the C.F. Rehnborg Chair in Disease Prevention at Stanford University and is Professor of Medicine, Professor of Health Research and Policy, and Director of the Stanford Prevention Research Center at Stanford University School of Medicine, Professor of Statistics (by courtesy) at Stanford University School of Humanities and Sciences, member of the Stanford Cancer Center and of the Stanford Cardiovascular Institute, and affiliated faculty of the Woods Institute for the Environment. From 1999 until 2010 he chaired the Department of Hygiene and Epidemiology at the University of Ioannina School of Medicine in Greece, as a tenured professor since 2003. Dr. Ioannidis was born in New York, NY in 1965 and grew up in Athens, Greece. He was Valedictorian of his class (1984) at Athens College and won a number of early awards, including the National Award of the Greek Mathematical Society, in 1984. He graduated in the top rank of his class from the School of Medicine, University of Athens, in 1990 and earned a doctorate in biopathology. He trained at Harvard and Tufts, specializing in Internal Medicine and Infectious Diseases, and then held positions at NIH, Johns Hopkins University School of Medicine and Tufts University School of Medicine before returning to Greece in 1999. He has been adjunct faculty for the Tufts University School of Medicine since 1996, with the rank of professor since 2002. Since 2008 he has led the Genetics/Genomics component of the Tufts Clinical and Translational Science Institute (CTSI) and the Center for Genetic Epidemiology and Modeling (CGEM) of the Tufts Institute for Clinical Research and Health Policy Studies at Tufts Medical Center. He is also adjunct professor of epidemiology at the Harvard School of Public Health and visiting professor of epidemiology and biostatistics at Imperial College London. Dr. 
Ioannidis is a member of the executive board of the Human Genome Epidemiology Network, and has served as President of the Society for Research Synthesis Methodology, as a member of the editorial board of 27 leading international journals (including PLoS Medicine, Lancet, Annals of Internal Medicine, Journal of the National Cancer Institute, Science Translational Medicine, Molecular and Cellular Proteomics, AIDS, International Journal of Epidemiology, Journal of Clinical Epidemiology, Clinical Trials, Cancer Treatment Reviews, Open Medicine, and PLoS ONE, among others) and as Editor-in-Chief of the European Journal of Clinical Investigation for the period 2010-2014. He has given more than 250 invited and honorary lectures. He is one of the most-cited scientists of his generation worldwide, with indices Hirsch h=97 (h(F/L)=80 when limited to first- and last(senior)-author papers), Hirsch m=5.7, and Schreiber hm=60 per GoogleScholar as of 8/2012 (h=80 per ISI). He has received several awards, including the European Award for Excellence in Clinical Science for 2007, and has been inducted into the Association of American Physicians in 2009 and into the European Academy of Cancer Sciences in 2010. The PLoS Medicine paper “Why Most Published Research Findings Are False” has been the most-accessed article in the history of the Public Library of Science. The Atlantic selected Ioannidis as the Brave Thinker scientist for 2010, claiming that he “may be one of the most influential scientists alive.” Charles Jaffe, M.D. Dr. Jaffe is the Chief Executive Officer of HL7 where he serves as the organization's global ambassador, fostering relationships with key industry stakeholders. A 37-year veteran of the healthcare IT industry, Dr. Jaffe was previously the Senior Global Strategist for the Digital Health Group at Intel Corporation, Vice President of Life Sciences at SAIC, and the Director of Medical Informatics at AstraZeneca Pharmaceuticals. 
He completed his medical training at Johns Hopkins and Duke Universities, and was a post-doctoral fellow at the National Institutes


of Health and at Georgetown University. Formerly, he was President of InforMed, a research informatics consultancy. Over the course of his career, he has been the principal investigator for more than 200 clinical trials, and has served in various leadership roles in the American Medical Informatics Association. He has been a board member of leading organizations for information technology standards, and served as the chair of a national institutional review board. Most recently, he held an appointment in the Department of Engineering at Penn State University. Dr. Jaffe has been a contributing editor for several journals and has published on a range of subjects, including clinical management, informatics deployment, and healthcare policy. Sachin H. Jain, M.D., M.B.A.

Dr. Jain is Chief Medical Information and Innovation Officer (CMIO) at Merck and Lecturer in Health Care Policy at Harvard Medical School. He also serves as an attending hospitalist physician at the Boston VA-Boston Medical Center. In his capacity as Merck's CMIO, Jain is working to establish and manage global alliances to use health data to improve medication safety, efficacy, use, and adherence. In addition, Jain is leading several efforts around new ventures and business development. Previously, he was Senior Advisor to the Administrator of the Centers for Medicare and Medicaid Services (CMS). At CMS, he helped lead the launch of the Center for Medicare and Medicaid Innovation, briefly serving as its Acting Deputy Director for Policy and Programs. At CMS, Jain had advocated for speedier translation of health care delivery research into practice and an expanded use of clinical registries. Dr. Jain also served as Special Assistant to the National Coordinator for Health Information Technology at the Office of the National Coordinator for Health Information Technology (ONC). At the ONC, Jain worked with David Blumenthal to implement the HITECH Provisions of the Recovery Act that provide incentives for physicians and hospitals to become meaningful users of health information technology. He received his undergraduate degree magna cum laude in government from Harvard College; his medical degree from Harvard Medical School; and his master's degree in business administration from the Harvard Business School. He was a recipient of the Paul and Daisy Soros Fellowship and the Dean's Award at Harvard Business School. He completed his residency in internal medicine at Brigham and Women's Hospital, where he was honored with the Resident Mentor Award. Dr. Jain is a founder of several non-profit health care ventures including the Homeless Health Clinic at the Harvard Square Homeless Shelter; the Harvard Bone Marrow Initiative; and ImproveHealthCare.org. 
He worked with DaVita-Bridge of Life to bring charity dialysis care to rural Rajasthan, India, and with Medical Missions for Children to bring cleft lip and palate surgery to that region. He previously maintained a faculty appointment at Harvard Business School's Institute for Strategy and Competitiveness and worked with strategy professor Michael Porter on a new case literature on health care delivery innovation. He presently serves as an honorary Senior Institute Associate at Harvard Business School. Dr. Jain has worked previously at WellPoint, McKinsey & Co., and the Institute for Healthcare Improvement. He has also served as an expert consultant to the World Health Organization.


He has authored over 50 publications on health care delivery innovation and health care reform in journals such as the New England Journal of Medicine, JAMA, and Health Affairs. His work has been cited in the New York Times, CNN, the Wall Street Journal, and other media outlets. He serves on the editorial boards of the American Journal of Managed Care; Harvard Medicine; and Wing of Zock. The book he co-edited with Susan Pories and Gordon Harper, "The Soul of a Doctor," has been translated into Chinese. Beth Kozel, M.D., Ph.D. Dr. Kozel is an Instructor of Pediatrics in the Division of Genetics and Genomic Medicine at St. Louis Children’s Hospital and Washington University School of Medicine. Clinically, she cares for children with genetic conditions including chromosomal changes, inherited disorders and multiple medical problems of unknown etiology. In her research, she has studied the biology of elastic fiber formation and works with patients with elastic fiber diseases such as Williams-Beuren syndrome, supravalvular aortic stenosis and cutis laxa. Her current research is aimed at identifying modifiers of vascular disease severity among affected individuals. Her work is supported by a K08 award from the NIH and she is a scholar of the Children’s Discovery Institute at Washington University. She is a member of the American Society of Matrix Biology and the American Society of Human Genetics. She is a registered researcher with the Williams Syndrome Patient and Clinical Research (WSPCR) Registry. Harlan Krumholz, M.D. Dr. Krumholz is the Harold H. Hines, Jr. Professor of Medicine and Director of the Robert Wood Johnson Clinical Scholars Program at Yale University School of Medicine and Director of the Yale-New Haven Hospital Center for Outcomes Research and Evaluation (CORE). His research is focused on determining optimal clinical strategies and identifying opportunities for improving the prevention, treatment and outcome of cardiovascular disease. 
Using applied clinical research methods, his work seeks to provide critical information to improve the quality of health care, monitor changes over time, guide decisions about the allocation of scarce resources, and inform decisions made by patients and their clinicians. Studies from his research group have directly led to improvements in the use of guideline-based medications, the timeliness of care for acute myocardial infarction, the information available to patients who are making decisions about clinical options, the public reporting of outcomes measures, and the current national focus on reducing the risk of readmission. He has also advanced the application of quality improvement to the elimination of disparities. His work influenced several provisions in the health reform bill. Dr. Krumholz serves on the Board of Trustees of the American College of Cardiology, the Board of Directors of the American Board of Internal Medicine, and the Board of Governors of the Patient-Centered Outcomes Research Institute. He is an elected member of the Association of American Physicians, the American Society for Clinical Investigation, and the Institute of Medicine. He is a Distinguished Scientist of the American Heart Association and has also received the Distinguished Service Award from that organization. He received a B.S. from Yale, an M.D. from Harvard Medical School, and a Master’s in Health Policy and Management from the Harvard University School of Public Health.

Richard E. Kuntz, M.D., M.Sc. Dr. Kuntz is Senior Vice President and Chief Scientific, Clinical and Regulatory Officer of Medtronic, Inc. In this role, which he assumed in August 2009, Kuntz oversees the company’s global regulatory affairs, health policy and reimbursement, clinical research, ventures and new therapies and innovation functions. Dr. Kuntz joined Medtronic in October 2005, as Senior Vice President and President of Medtronic Neuromodulation, which encompasses the company’s products and therapies used in the treatment of chronic pain, movement disorders, spasticity, overactive bladder and urinary retention, benign prostatic hyperplasia, and gastroparesis. In this role he was responsible for the research, development, operations and product sales and marketing for each of these therapeutic areas worldwide. Dr. Kuntz brings to Medtronic a broad background and expertise in many different areas of healthcare. Prior to Medtronic he was the Founder and Chief Scientific Officer of the Harvard Clinical Research Institute (HCRI), a university-based contract research organization which coordinates National Institutes of Health (NIH) and industry clinical trials with the United States Food and Drug Administration (FDA). He has directed over 100 multicenter clinical trials and has authored more than 200 original publications. His major interests are traditional and alternative clinical trial design and biostatistics. He also served as Associate Professor of Medicine at Harvard Medical School, Chief of the Division of Clinical Biometrics, and an interventional cardiologist in the division of cardiovascular diseases at the Brigham and Women’s Hospital in Boston, MA. Dr. Kuntz graduated from Miami University, and received his medical degree from Case Western Reserve University School of Medicine. 
He completed his residency in internal medicine at the University of Texas Southwestern Medical School, and then completed fellowships in cardiovascular diseases and interventional cardiology at the Beth Israel Hospital and Harvard Medical School, Boston. Dr. Kuntz received his Master of Science in biostatistics from the Harvard School of Public Health.

Rebecca D. Kush, Ph.D. Dr. Kush is Founder, President, and CEO of the Clinical Data Interchange Standards Consortium (CDISC), a non-profit standards developing organization (SDO) with a mission to develop and support global, platform-independent standards that enable information system interoperability to improve medical research and related areas of healthcare, and a vision of “Informing patient care and safety through higher quality medical research.” She has over 25 years of experience in the area of clinical research, including positions with the U.S. National Institutes of Health, academia, a global CRO, and biopharmaceutical companies in the U.S. and Japan. She earned her doctorate in Physiology and Pharmacology from the University of California San Diego (UCSD) School of Medicine. She is lead author of the book eClinical Trials: Planning and Implementation and has authored numerous publications for journals including the New England Journal of Medicine and Science Translational Medicine. She has developed a Prescription Education Program for elementary and middle schools and was named by PharmaVoice in 2008 as one of the 100 most inspiring individuals in the life-sciences industry. Dr. Kush has served on the Board of Directors for the U.S. Health Information Technology Standards Panel (HITSP), the Drug Information Association (DIA), and currently Health Level 7
(HL7), and she was a member of the advisory committee for the WHO International Clinical Trials Registry Platform. Dr. Kush served on the appointed Planning Committee for the HHS/ONC-sponsored Workshop Series on the “Digital Infrastructure for the Learning Health System” for the National Academy of Sciences Institute of Medicine (IOM). She is a member of the National Cancer Advisory Board IT Workgroup and was invited to represent research as an appointed member of the U.S. Health Information Technology (HIT) Standards Committee. Dr. Kush has developed a course, “A Global Approach to Accelerating Medical Research,” and has been a keynote speaker at numerous conferences in this arena in Europe, the U.S., Brazil, Japan, China, Korea, and Australia.

Elizabeth Loder, M.D., M.P.H. Dr. Loder is a senior research editor at BMJ. She is also Chief of the Division of Headache and Pain in the Department of Neurology at Brigham and Women’s Hospital, Boston, and an Associate Professor of Neurology at Harvard Medical School. She is board-certified in Internal Medicine and Headache Medicine. Dr. Loder is the current President of the American Headache Society, has held leadership positions in national and international pain organizations, and served as a member of the recent Institute of Medicine Committee on Advancing Pain Research, Care, and Education. Dr. Loder has worked as a clinician and researcher in the headache field for many years and has extensive experience in the design and conduct of clinical trials. As a result of these experiences, Dr. Loder is deeply interested in problems associated with clinical trial reporting and interpretation, especially the consequences of missing clinical trial information for guidelines and treatment decisions. She has devoted much of her recent career to the development, implementation, and dissemination of standards for research reporting. She was responsible for a recent BMJ theme issue on unpublished clinical trial data.
Research published in that issue of the journal has led several US lawmakers to question whether the NIH and FDA are doing enough to enforce research reporting requirements.

Deven McGraw, J.D. Ms. McGraw is the Director of the Health Privacy Project at the Center for Democracy & Technology (CDT). The Project is focused on developing and promoting workable privacy and security protections for electronic personal health information. Ms. McGraw is active in efforts to advance the adoption and implementation of health information technology and electronic health information exchange to improve health care. She was one of three persons appointed by Kathleen Sebelius, the Secretary of the U.S. Department of Health & Human Services (HHS), to serve on the Health Information Technology (HIT) Policy Committee, a federal advisory committee established in the American Recovery and Reinvestment Act of 2009. She co-chairs the Committee’s Privacy and Security “Tiger Team” and serves as a member of its Meaningful Use, Information Exchange, and Strategic Plan Workgroups. She also served on two key workgroups of the American Health Information Community (AHIC), the federal advisory body established by HHS in the Bush Administration to develop recommendations on how to facilitate use of health information technology to improve health. Specifically, she co-chaired the Confidentiality, Privacy and Security Workgroup and was a member of the Personalized Health Care Workgroup. She also served on the Policy Steering Committee of the eHealth Initiative and now serves on its Leadership Council. She is also on the Steering Group of the Markle Foundation’s Connecting for Health multi-stakeholder initiative. Ms. McGraw has a strong background in health care policy. Prior to joining CDT, Ms. McGraw was the Chief Operating Officer of the National Partnership for Women & Families, providing strategic direction and oversight for all of the organization’s core program areas, including the
promotion of initiatives to improve health care quality. Ms. McGraw also was an associate in the public policy group at Patton Boggs, LLP and in the health care group at Ropes & Gray. She also served as Deputy Legal Counsel to the Governor of Massachusetts and taught in the Federal Legislation Clinic at the Georgetown University Law Center. Ms. McGraw graduated magna cum laude from the University of Maryland. She earned her J.D., magna cum laude, and her LL.M. from Georgetown University Law Center and was Executive Editor of the Georgetown Law Journal. She also has a Master of Public Health from the Johns Hopkins School of Hygiene and Public Health.

Meredith Nahm, Ph.D. Dr. Nahm provides oversight and coordination of clinical research informatics projects undertaken by the Core, including DTMI's provision of infrastructure for clinical research data collection and management. She also oversees informatics for the MURDOCK community registry and the Duke Clinical Research Unit, as well as involvement in national efforts to develop and implement data standards. Additionally, Dr. Nahm supports efforts to advance the application of informatics capabilities to enhance medical practice. Through her work and associated collaborations, Dr. Nahm helps to shape Duke's Clinical Research Informatics strategy and direction.

Prior to her appointment as the Associate Director for Clinical Research Informatics, Dr. Nahm served as the Director of Clinical Data Integration at the Duke Clinical Research Institute beginning in 2001 and has been with the DCRI since 1998. She has over fifteen years of experience in clinical research informatics and quality control.

Dr. Nahm has held numerous industry leadership roles, including serving as a member of the Board of Trustees of the Society for Clinical Data Management (SCDM), Chair of the SCDM Good Clinical Data Management Practices (GCDMP) Committee, Chair of the Clinical Data Interchange Standards Consortium (CDISC) Industry Advisory Board, and the data management content expert for the NIH-sponsored National Leadership Forum for Clinical Research Networks. Dr. Nahm authored the GCDMP sections on Measuring and Assuring Data Quality and has played a leadership role in data standards development and implementation efforts. She is currently working with the data standards development efforts in Cardiology and Tuberculosis on two NIH Roadmap projects, and with the Data and Statistical Center for the National Institute on Drug Abuse Clinical Trials Network. She is a frequent presenter at industry meetings and publishes in the clinical trial operational literature. Her research interests include quantitative evaluation of common clinical data management practices and data quality. She received her Master’s in Nuclear Engineering from North Carolina State University and her Ph.D. from the University of Texas at Houston School of Health Information Sciences.

Jeffrey S. Nye, M.D., Ph.D. Dr. Nye, in his role as Vice President and Global Head, Neuroscience External Innovation, Janssen Pharmaceutical Companies of Johnson & Johnson, is responsible for the strategy and implementation of external R&D in the neurosciences, aiming to expand the pipeline through novel business relationships; venture, academic, and public-private partnerships; and risk-shared outsourcing. Previously, Dr. Nye was Chief Medical Officer and co-head of the East Coast Research and Early Development unit at J&J pharma. Prior roles include VP of Experimental Medicine and VP of compound development (CDTL) for Galantamine (Razadyne/Reminyl), a
therapy for Alzheimer’s disease, and for Topamax, an epilepsy and migraine medicine. He previously served as a Director of CNS Discovery Genomics and Biotechnology at Pharmacia. Prior to joining the pharmaceutical industry, Dr. Nye was a tenured associate professor of Molecular Pharmacology, Biological Chemistry and Pediatrics (Neurology) at Northwestern University and Children’s Memorial Hospital. He received his bachelor’s degree in biochemistry and master’s degree in pharmacology from Harvard, and his M.D. and Ph.D. (with Solomon Snyder) from the Johns Hopkins University School of Medicine. He served as a pediatrics resident, postdoctoral fellow (with Richard Axel), and assistant professor of Pediatrics in Neurology at the College of Physicians and Surgeons of Columbia University. Dr. Nye has published over 60 peer-reviewed papers on topics spanning basic and clinical research, and has led pharma programs from discovery through clinical development. He has served on numerous study sections and advisory panels, currently serves as a councilor of the British Association of Psychopharmacology, and is the co-chair of the New York Academy Alzheimer’s Leadership council.

Sally Okun, R.N., M.M.H.S. Ms. Okun is a member of the Research, Data and Analytics Team at PatientsLikeMe, Inc. As the Head of Health Data Integrity and Patient Safety, she is responsible for the site’s medical ontology and the integrity of patient-reported health data. She also developed and oversees the PatientsLikeMe Drug Safety and Pharmacovigilance Platform. Prior to joining PatientsLikeMe, Ms. Okun practiced as a palliative care specialist. In addition to working with patients and families, she was an independent consultant supporting multi-year clinical, research, and education projects focused on palliative and end-of-life care for numerous clients, including Brown University, Harvard Medical School, the MA Department of Mental Health, and the Robert Wood Johnson Foundation. Ms.
Okun is a member of the Institute of Medicine’s (IOM) Clinical Effectiveness Research Innovation Collaborative, the Evidence Communication Innovation Collaborative, and the Best Practices Innovation Collaborative. She is a contributing author on two IOM discussion papers: Principles and Values for Team-based Healthcare and Communicating Evidence in Health Care: Engaging Patients for Improved Health Care Decisions. Ms. Okun received her Master’s degree from The Heller School for Social Policy & Management at Brandeis University; completed study of Palliative Care and Ethics at Memorial Sloan-Kettering Cancer Center; and was a National Library of Medicine Fellow in Biomedical Informatics.

Eric D. Perakslis, Ph.D. Dr. Perakslis is currently Chief Information Officer and Chief Scientist (Informatics) at the U.S. Food and Drug Administration. In this new role, Eric is responsible for modernizing and enhancing the IT capabilities as well as the in silico scientific capabilities at FDA. Prior to FDA, Dr. Perakslis was Senior Vice President of R&D Information Technology at Johnson & Johnson Pharmaceuticals R&D and was a member of the Corporate Office of Science and Technology. During his thirteen years at J&J, Eric also held the posts of Vice President of R&D Informatics, Vice President and Chief Information Officer, and Director of Research Information Technology, as well as Assistant Director and Director of Drug Discovery Research
prior to his current role. Before joining J&J, he was the Group Leader of Scientific Computing at ArQule Inc., and he began his professional career with the Army Corps of Engineers. Dr. Perakslis has a Ph.D. in chemical and biochemical engineering from Drexel University and also holds B.S. and M.S. degrees in chemical engineering. His current research interests are enterprise knowledge management, patient stratification, healthcare IT, and translational informatics, with a specific focus on precompetitive data sharing and open-source systems globalization. Dr. Perakslis is a late-stage kidney cancer survivor and an avid patient advocate. He has served as the Chairman of the Survivor Advisory Board at the Cancer Institute of New Jersey and as the Chief Information Officer of the King Hussein Institute for Biotechnology and Cancer in Amman, Jordan. Eric has also worked extensively with the Lance Armstrong Foundation, the Kidney Cancer Association, the Scientist ↔ Survivor program of the American Association for Cancer Research, OneMind4Research, and several other top non-profit disease-based organizations to further their domestic and international agendas.

Richard Platt, M.D., M.Sc. Dr. Platt is a Professor and Chair of the Department of Population Medicine and Executive Director of the Harvard Pilgrim Health Care Institute. He is an internist trained in infectious diseases and epidemiology. His research focuses on developing multi-institution automated record linkage systems for use in pharmacoepidemiology, and for population-based surveillance, reporting, and control of both hospital- and community-acquired infections. He is principal investigator of the FDA’s Mini-Sentinel program, of a CDC Center of Excellence in Public Health Informatics, of the AHRQ HMO Research Network DEcIDE center, and of a CDC Prevention Epicenter.
He is a member of the Institute of Medicine Roundtable on Value & Science-Driven Health Care and of the Association of American Medical Colleges’ Advisory Panel on Research. Dr. Platt formerly chaired the FDA Drug Safety and Risk Management Advisory Committee, an NIH study section (Epidemiology and Disease Control 2), and the CDC Office of Health Care Partnerships steering committee, and co-chaired the Board of Scientific Counselors of the CDC National Center for Infectious Diseases.

William Z. Potter, M.D., Ph.D. Dr. Potter earned his B.A., M.S., M.D., and Ph.D. at Indiana University, after which he served in positions of increasing responsibility and seniority over the next twenty-five years at the National Institutes of Health (NIH), focused on translational neuroscience. While at the NIH, Dr. Potter was widely published and appointed to many societies, committees, and boards, roles that enabled him to develop a wide reputation as an expert in the psychopharmacological sciences and a champion of the development of novel treatments for CNS disorders. Dr. Potter left the NIH in 1996 to accept a position as Executive Director and Research Fellow at Lilly Research Labs, specializing in the Neuroscience Therapeutic Area, and in 2004 joined Merck Research Labs (MRL) as VP of Clinical Neuroscience, then took the newly created position of VP of Translational Neuroscience in 2006, a position from which he retired in January of this year. His experience at Lilly and MRL in identifying, expanding, and developing methods of evaluating CNS effects of compounds in the human brain covers state-of-the-art approaches across multiple modalities. These include brain imaging and cerebrospinal fluid proteomics (plus metabolomics) as well as development of more sensitive clinical, psychophysiological, and performance
measures, allowing a range of novel targets to be tested in a manner that actually addresses the underlying hypotheses. Bill has become a widely recognized champion of the position that more disciplined hypothesis testing of targets in humans is the best near-term approach to moving CNS drug development forward for important neurologic and psychiatric illnesses.

Jonathan Rabinowitz, Ph.D. Prof. Rabinowitz is The Elie Wiesel Professor and a former Department Chairman at Bar Ilan University. He also holds a visiting professorship in the Department of Psychiatry at Mount Sinai School of Medicine in New York. His current research focuses on understanding and minimizing placebo response in antipsychotic and antidepressant trials and developing statistical and methodological ways of improving the efficiency of CNS drug trials. His research has been supported by the NIH, BMBF, the Innovative Medicines Initiative, the European Commission (European Union), NARSAD, and major pharmaceutical companies. He serves as a member of the editorial board of European Neuropsychopharmacology and is a member of the International Biometric Society, the International Society for Schizophrenia Research, and the European College of Neuropsychopharmacology. He serves on the Clinical Expert Review Committee for the FDA (Food and Drug Administration) data standards development initiative that will standardize clinical data elements for regulatory submission. He is a member of the DIA (Drug Information Association) scientific working group on missing data in clinical trials. He has published over 150 scientific papers.

Frank W. Rockhold, Ph.D. Dr. Rockhold is currently Senior Vice President, Global Clinical Safety and Pharmacovigilance at GlaxoSmithKline Pharmaceuticals Research and Development. His responsibilities include case management, signal detection, safety evaluation, and risk management for all development and marketed products. He is also co-chair of the GSK Global Safety Board. In his 20 years at GSK he has also held management positions within the Statistics & Epidemiology Department and Clinical Operations, both in R&D and in the U.S. Pharmaceutical Business, and most recently served as interim head of the Cardiovascular and Metabolic Medicines Development Center.
Frank previously held the positions of Research Statistician at Lilly Research Laboratories (1979-1987) and Executive Director of Biostatistics, Data Management, and Health Economics at Merck Research Laboratories (1994-1997).

He has a B.A. in Statistics from the University of Connecticut, an Sc.M. in Biostatistics from Johns Hopkins University, and a Ph.D. in Biostatistics from the Medical College of Virginia.

He has held several academic appointments during his career: at Butler University and Indiana University, as Adjunct Professor of Health Evaluation Sciences at Penn State University, and as Adjunct Scholar in the Department of Epidemiology and Biostatistics at the University of Pennsylvania. Frank is currently Affiliate Professor of Biostatistics at the Virginia Commonwealth University Medical Center.

Frank is currently Past Chairman of the Board of Directors of the Clinical Data Interchange Standards Consortium (CDISC) and a member of the National Library of Medicine advisory group for ClinicalTrials.gov. He is a past president of the Society for Clinical Trials, past chair of the PhRMA Biostatistics Steering Committee, and a member of the ICH E-9 and E-10 Expert Working Groups, and he previously served as Associate Editor for Controlled Clinical Trials.

He is a Fellow of the American Statistical Association and a Fellow of the Society for Clinical Trials. Frank is also a recipient of the PhRMA Career Achievement award. He has over 150 publications and abstracts.

Laura Lyman Rodriguez, Ph.D. Dr. Rodriguez is the Director of the Office of Policy, Communications, and Education at the National Human Genome Research Institute (NHGRI), National Institutes of Health (NIH). She works to develop and implement policy for research initiatives at the NHGRI, design communication and outreach strategies to engage the public in genomic science, and prepare health care professionals for the integration of genomic medicine into clinical care. Laura is particularly interested in the policy and ethics questions related to the inclusion of human research participants in genomics and genetics research and the sharing of human genomic data through broadly used research resources (e.g., databases). Among other activities, Dr. Rodriguez has provided leadership for many of the policy development activities pertaining to genomic data sharing and the creation of the database of Genotypes and Phenotypes (dbGaP) at the NIH. Dr. Rodriguez received her bachelor of science with honors in biology from Washington and Lee University in Virginia and earned a doctorate in cell biology from Baylor College of Medicine in Texas.

Michael Rosenblatt, M.D. Dr. Rosenblatt is Executive Vice President and Chief Medical Officer of Merck & Co., Inc. He is the first person to serve in this role for Merck. Previously he served as Dean of Tufts University School of Medicine. Prior to that, he held the appointment of George R. Minot Professor of Medicine at Harvard Medical School and Chief of the Division of Bone and Mineral Metabolism Research at Beth Israel Deaconess Medical Center (BIDMC). He served as the President of BIDMC from 1999-2001. Previously, he was the Harvard Faculty Dean and Senior Vice President for Academic Programs at CareGroup and BIDMC and a founder of the Carl J. Shapiro Institute for Education and Research at Harvard Medical School and BIDMC, a joint venture whose mission is to manage the academic enterprise and promote academic innovation.
Prior to that, he served as Director of the Harvard-MIT Division of Health Sciences and Technology, during which time he led a medical education organization for M.D., Ph.D., and M.D.-Ph.D. training jointly sponsored by Harvard and MIT. Earlier, he was Senior Vice President for Research at Merck Sharp & Dohme Research Laboratories, where he co-led the worldwide development team for alendronate (FOSAMAX), Merck's bisphosphonate for osteoporosis and bone disorders. In addition, he directed drug discovery efforts in molecular biology, bone biology and calcium metabolism, virology, cancer research, lipid metabolism, and cardiovascular research in the United States, Japan, and Italy. In leading most of Merck's international research efforts, he established two major basic research institutes, one in Tsukuba, Japan, and one near Rome, Italy. He also headed Merck Research's worldwide University and Industry Relations Department.

He is the recipient of the Fuller Albright Award for his work on parathyroid hormone, the Vincent du Vigneaud Award in peptide chemistry and biology, and the Chairman’s Award from Merck. His research is in the field of hormonal regulation of calcium metabolism, osteoporosis, and cancer metastasis to bone. His major research projects are in the design of peptide hormone antagonists for parathyroid hormone and the tumor-secreted parathyroid hormone-like protein, the isolation and characterization of receptors and mapping of hormone-receptor interactions, elucidating the mechanisms by which breast cancer “homes” to bone, and osteoporosis and bone biology. He has been an active participant in the biotechnology industry, serving on the board of directors and scientific advisory boards of several biotech companies. He was a scientific founder of ProScript, the company that discovered bortezomib (Velcade), now Millennium
Pharmaceuticals’ drug for multiple myeloma and other malignancies. He was a member of the Board of Scientific Counselors of the National Institute of Diabetes and Digestive and Kidney Diseases of the NIH. He has been elected to the American Society for Clinical Investigation and the Association of American Physicians, to Fellowship in the American Association for the Advancement of Science and the American College of Physicians, and to the presidency of the American Society for Bone and Mineral Research. He testified before a Senate hearing on U.S. biomedical research priorities in 1997, and in 2011 served as a consultant to the U.S. President's Council of Advisors on Science and Technology. From 1981 to 1984, he served as Chief of the Endocrine Unit, Massachusetts General Hospital. He received his undergraduate degree summa cum laude from Columbia and his M.D. magna cum laude from Harvard. His internship, residency, and endocrinology training were all at the Massachusetts General Hospital.

Vicki L. Seyfert-Margolis, Ph.D. Dr. Seyfert-Margolis is Senior Advisor for Science Innovation and Policy in the FDA Commissioner’s Office. Dr. Seyfert-Margolis focuses on initiatives in innovation, regulatory science, personalized medicine, scientific computing, and health informatics. Previously, she served as Chief Scientific Officer at the Immune Tolerance Network (ITN), a non-profit consortium of researchers seeking new treatments for diseases of the immune system. At ITN, she oversaw the development of more than 20 centralized laboratory facilities, and the design and execution of biomarker discovery studies for over 25 Phase II clinical trials. As part of the biomarker efforts, she established construction of a primer library of 1,000 genes that may be involved in establishing and maintaining immunologic tolerance, and co-discovered genes that may mark kidney transplant tolerance. Dr.
Seyfert-Margolis was also an Adjunct Associate Professor with the Department of Medicine at the University of California San Francisco. Prior to academia, she served as Director of the Office of Innovative Scientific Research Technologies at the National Institute of Allergy and Infectious Diseases at NIH, where she worked to integrate emerging technologies into existing immunology and infectious disease programs. Dr. Seyfert-Margolis completed her Ph.D. in immunology at the University of Pennsylvania’s School of Medicine.

Mary Ann Slack. Ms. Slack has over 25 years of experience in technology and informatics in both the public and private sectors. Since joining FDA’s Center for Drug Evaluation and Research (CDER) in 2003, Ms. Slack has led a number of efforts in implementing informatics solutions to business problems. She serves as an FDA expert on the International Conference on Harmonization’s (ICH) multidisciplinary working groups charged with establishing data exchange standards and structures in support of drug and biologic regulatory review, and is actively engaged in the development of data standards to meet regulatory needs. Ms. Slack currently leads CDER’s data standards development program.

Jay M. Tenenbaum, Ph.D. Dr. Tenenbaum is the Founder and Chairman of Cancer Commons, a non-profit, open-science community that compiles and continually refines information about cancer subtypes and treatments, based on the literature and actual patient outcomes. Dr. Tenenbaum’s background brings the unique perspective of a world-renowned Internet commerce pioneer and visionary. He was founder and CEO of Enterprise Integration Technologies, the first company to conduct a commercial Internet transaction (1992), secure


Web transaction (1993), and Internet auction (1993). In 1994, he founded CommerceNet to accelerate business use of the Internet. In 1997, he co-founded Veo Systems, the company that pioneered the use of XML for automating business-to-business transactions. Dr. Tenenbaum joined Commerce One in January 1999, when it acquired Veo Systems. As Chief Scientist, he was instrumental in shaping the company’s business and technology strategies for the Global Trading Web. After Commerce One, Dr. Tenenbaum was an officer and director of Webify Solutions, which was sold to IBM in 2006, and of Medstory, which was sold to Microsoft in 2007. He was also the Founder and Chairman of CollabRx, a provider of web-based applications and services that help cancer patients and their physicians select optimal treatments and trials, which was acquired by Tegal in 2012. Earlier in his career, Dr. Tenenbaum was a prominent AI researcher who led AI research groups at SRI International and Schlumberger Ltd. He is a fellow and former board member of the American Association for Artificial Intelligence and a former consulting professor of Computer Science at Stanford. He currently serves as a director of Efficient Finance, PatientsLikeMe, and the Public Library of Science, and is a consulting professor of Information Technology at Carnegie Mellon’s West Coast campus. Dr. Tenenbaum holds B.S. and M.S. degrees in Electrical Engineering from MIT and a Ph.D. from Stanford.

Sharon F. Terry, M.A.

Ms. Terry is President and CEO of Genetic Alliance, a network of more than 10,000 organizations, 1,200 of which are disease advocacy organizations. Genetic Alliance improves health through the authentic engagement of communities and individuals, developing innovative solutions through novel partnerships and connecting consumers to smart services. She is the founding CEO of PXE International, a research advocacy organization for the genetic condition pseudoxanthoma elasticum (PXE). As co-discoverer of the gene associated with PXE, she holds the patent for ABCC6 and has assigned her rights to the foundation. She developed a diagnostic test and is conducting clinical trials. Ms. Terry is also a co-founder of the Genetic Alliance Registry and Biobank, and the author of more than 90 peer-reviewed articles. At the forefront of consumer participation in genetics research, services, and policy, she serves in leadership roles in many major international and national organizations, including the Institute of Medicine Science and Policy Board, the National Coalition for Health Professional Education in Genetics board, and the International Rare Disease Research Consortium Interim Executive Committee, and is a member of the IOM Roundtable on Translating Genomic-Based Research for Health. She is on the editorial boards of several journals. She was instrumental in the passage of the Genetic Information Nondiscrimination Act. In 2005, she received an honorary doctorate from Iona College for her work in community engagement; in 2007, the first Patient Service Award from the UNC Institute for Pharmacogenomics and Individualized Therapy; in 2009, the Research!America Distinguished Organization Advocacy Award; and in 2011, the Clinical Research Forum and Foundation’s Annual Award for Leadership in Public Advocacy. She is an Ashoka Fellow.

Andrew Vickers, Ph.D.

Dr. Vickers is Attending Research Methodologist in the Department of Epidemiology and Biostatistics at Memorial Sloan-Kettering Cancer Center. He obtained a First in Natural Sciences from the University of Cambridge and a doctorate in clinical medicine from the University of Oxford.



Dr. Vickers’ clinical research falls into three broad areas: randomized trials, surgical outcomes research, and molecular marker studies. A particular focus of his work is the detection and initial treatment of prostate cancer. Dr. Vickers has analyzed the “learning curve” for radical prostatectomy, is working on a series of studies demonstrating that a single measure of prostate-specific antigen (PSA) taken in middle age can predict prostate cancer up to 25 years later, and has developed a statistical model to predict this risk. His work on randomized trials focuses on methods for integrating randomized trials into routine surgical practice so as to compare different approaches to surgery. As part of this work, he has pioneered the use of web interfaces for obtaining quality-of-life data from patients recovering from radical prostatectomy. Dr. Vickers’ methodological research centers primarily on novel methods for assessing the clinical value of predictive tools. In particular, he has developed decision-analytic tools that can be directly applied to a data set, without the need for data gathering on patient preferences or utilities. Dr. Vickers has a strong interest in teaching statistics: he is course leader for the MSKCC biostatistics course, teaches on the undergraduate curriculum at Weill Medical College of Cornell University, and writes the statistics column for Medscape. He is the author of the introductory statistics textbook “What is a p-value anyway? 34 stories to help you actually understand statistics.”

John A. Wagner, M.D., Ph.D.

Dr. Wagner received his M.D. from Stanford University School of Medicine and his Ph.D. from the Johns Hopkins University School of Medicine. His postgraduate training included an internal medicine internship and residency, as well as postdoctoral fellowships in molecular and clinical pharmacology. Currently, Dr. Wagner is Vice President and Head, Early Development Pipeline and Projects, and Head, Global Project Management, at Merck & Co., Inc. Previously, he was Vice President and Head, Clinical Pharmacology, at Merck & Co., Inc., and Acting Modeling and Simulation Integrator, Strategically Integrated Modeling and Simulation. Dr. Wagner is also an Adjunct Assistant Professor in the Division of Clinical Pharmacology, Department of Medicine, Jefferson Medical College, Thomas Jefferson University, and a Visiting Clinical Scientist at the Harvard-Massachusetts Institute of Technology Division of Health Sciences and Technology, Center for Experimental Pharmacology and Therapeutics. He is past chair of the PhRMA Clinical Pharmacology Technical Group, past chair of the adiponectin work group for the Biomarkers Consortium, a past member of the Institute of Medicine Committee on Qualification of Biomarkers and Surrogate Endpoints in Chronic Disease, a current member of the Board of Directors and chair of the 2012 Scientific Program Committee of the American Society for Clinical Pharmacology and Therapeutics, and a full member of the Institute of Medicine National Cancer Policy Forum. His more than 150 peer-reviewed publications include work on biomarkers and surrogate endpoints, experimental medicine, modeling and simulation, pharmacokinetics, pharmacodynamics, precompetitive collaboration, and drug-drug interactions across a variety of therapeutic areas.

John Wilbanks, B.A.

Mr. Wilbanks is a Senior Fellow in Entrepreneurship with the Ewing Marion Kauffman Foundation. He works on open content, open data, and open innovation systems, and also serves as a Research Fellow at Lybba. He has worked at Harvard Law School, MIT’s Computer Science and Artificial Intelligence Laboratory, the World Wide Web Consortium, the US House of Representatives, and Creative Commons, and started a bioinformatics company. He


sits on the Board of Directors of Sage Bionetworks and iCommons, and on the advisory board of Boundless Learning. Mr. Wilbanks holds a degree in philosophy from Tulane University and also studied modern letters at the University of Paris (La Sorbonne).

Janet Woodcock, M.D.

Dr. Woodcock is Director of the Center for Drug Evaluation and Research, Food and Drug Administration (FDA). She has held various leadership positions within the FDA Office of the Commissioner, including Deputy Commissioner and Chief Medical Officer; Deputy Commissioner for Operations and Chief Operating Officer; and Director, Critical Path Programs. Previously, Dr. Woodcock served as Director of the Center for Drug Evaluation and Research from 1994 to 2005. She has also held other positions at FDA, including Director, Office of Therapeutics Research and Review, and Acting Deputy Director, Center for Biologics Evaluation and Research. A prominent FDA scientist and executive, Dr. Woodcock has received numerous awards, including a Presidential Rank Meritorious Executive Award, the American Medical Association’s Nathan Davis Award, and Special Citations from FDA Commissioners. Dr. Woodcock received her M.D. from Northwestern Medical School, and completed further training and held teaching appointments at the Pennsylvania State University and the University of California, San Francisco. She joined FDA in 1986.

Deborah A. Zarin, M.D.

Dr. Zarin is Director of ClinicalTrials.gov and Assistant Director for Clinical Research Projects at the Lister Hill National Center for Biomedical Communications in the National Library of Medicine. In this capacity, Dr. Zarin oversees the development and operation of an international registry of clinical trials. Previous positions held by Dr. Zarin include Director of the Technology Assessment Program at the Agency for Healthcare Research and Quality, and Director of the Practice Guidelines program at the American Psychiatric Association. In these positions, Dr. Zarin conducted systematic reviews and related analyses in order to develop clinical and policy recommendations. Dr. Zarin’s academic interests are in the area of evidence-based clinical and policy decision making, and she is the author of over 70 peer-reviewed articles. Dr. Zarin graduated from Stanford University and received her doctorate in medicine from Harvard Medical School. She completed a clinical decision-making fellowship and is board certified in general psychiatry, as well as in child and adolescent psychiatry.


BACKGROUND ARTICLES


visualizations. Research scientists need to work more closely with their computing colleagues to make sure that these needs are met and that the development of new analytic methods tied to scientific, as opposed to business, analysis is pursued.

Finally, we must work together to explore new ways of scaling easy-to-generate visualizations to the data-intensive needs of current scientific pursuits. (Figure 2 shows the use of standard projection techniques as a low-cost alternative to expensive high-end technologies.) Although there are scientific problems that do call for specialized visualizers, many do not. By bringing visualization into everyday use in our laboratories, we can better understand the requirements for the design of new tool kits, and we can learn to share and maintain visualization workflows and products the way we share other scientific systems. A side effect may well be the lowering of barriers (such as costs and accessibility) to more sophisticated visualization of increasingly larger data sets, a crucial functionality for today’s data-intensive scientist.


10.1126/science.1197654

PERSPECTIVE

Challenges and Opportunities in Mining Neuroscience Data

Huda Akil,1* Maryann E. Martone,2 David C. Van Essen3

Understanding the brain requires a broad range of approaches and methods from the domains of biology, psychology, chemistry, physics, and mathematics. The fundamental challenge is to decipher the “neural choreography” associated with complex behaviors and functions, including thoughts, memories, actions, and emotions. This demands the acquisition and integration of vast amounts of data of many types, at multiple scales in time and in space. Here we discuss the need for neuroinformatics approaches to accelerate progress, using several illustrative examples. The nascent field of “connectomics” aims to comprehensively describe neuronal connectivity at either a macroscopic level (in long-distance pathways for the entire brain) or a microscopic level (among axons, dendrites, and synapses in a small brain region). The Neuroscience Information Framework (NIF) encompasses all of neuroscience and facilitates the integration of existing knowledge and databases of many types. These examples illustrate the opportunities and challenges of data mining across multiple tiers of neuroscience information and underscore the need for cultural and infrastructure changes if neuroinformatics is to fulfill its potential to advance our understanding of the brain.

Deciphering the workings of the brain is the domain of neuroscience, one of the most dynamic fields of modern biology. Over the past few decades, our knowledge about the nervous system has advanced at a remarkable pace. These advances are critical for understanding the mechanisms underlying the broad range of brain functions, from controlling breathing to forming complex thoughts. They are also essential for uncovering the causes of the vast array of brain disorders, whose impact on humanity is staggering (1). To accelerate progress, it is vital to develop more powerful methods for capitalizing on the amount and diversity of experimental data generated in association with these discoveries.

The human brain contains ~80 billion neurons that communicate with each other via specialized connections or synapses (2). A typical adult brain has ~150 trillion synapses (3). The point of all this communication is to orchestrate brain activity. Each neuron is a piece of cellular machinery that relies on neurochemical and electrophysiological mechanisms to integrate complicated inputs and communicate information to other neurons. But no matter how accomplished, a single neuron can never perceive beauty, feel sadness, or solve a mathematical problem. These capabilities emerge only when networks of neurons work together. Ensembles of brain cells, often quite far-flung, form integrated neural circuits, and the activity of the network as a whole supports specific brain functions such as perception, cognition, or emotions. Moreover, these circuits are not static. Environmental events trigger molecular mechanisms of neuroplasticity that alter the morphology and connectivity of brain cells. The strengths and pattern of synaptic connectivity encode the “software” of brain function. Experience, by inducing changes in that connectivity, can substantially alter the function of specific circuits during development and throughout the life span.

A grand challenge in neuroscience is to elucidate brain function in relation to its multiple layers of organization that operate at different spatial and temporal scales. Central to this effort is tackling “neural choreography”: the integrated functioning of neurons into brain circuits, including their spatial organization, local and long-distance connections; their temporal orchestration; and their dynamic features, including interactions with their glial cell partners. Neural choreography cannot be understood via a purely reductionist approach. Rather, it entails the convergent use of analytical and synthetic tools to gather, analyze, and mine information from each level of analysis and capture the emergence of new layers of function (or dysfunction) as we move from studying genes and proteins, to cells, circuits, thought, and behavior.

1The Molecular and Behavioral Neuroscience Institute, University of Michigan, Ann Arbor, MI, USA. 2National Center for Microscopy and Imaging Research, Center for Research in Biological Systems, University of California, San Diego, La Jolla, CA, USA. 3Department of Anatomy and Neurobiology, Washington University School of Medicine, St. Louis, MO 63110, USA.
*To whom correspondence should be addressed. E-mail: [email protected]

11 FEBRUARY 2011 VOL 331 SCIENCE www.sciencemag.org 708

The Need for Neuroinformatics

The profoundly complex nature of the brain requires that neuroscientists use the full spectrum of tools available in modern biology: genetic,


cellular, anatomical, electrophysiological, behavioral, evolutionary, and computational. The experimental methods involve many spatial scales, from electron microscopy (EM) to whole-brain human neuroimaging, and time scales ranging from microseconds for ion channel gating to years for longitudinal studies of human development and aging. An increasing number of insights emerge from integration and synthesis across these spatial and temporal domains. However, such efforts face impediments related to the diversity of scientific subcultures and differing approaches to data acquisition, storage, description, and analysis, and even to the language in which they are described. It is often unclear how best to integrate the linear information of genetic sequences, the highly visual data of neuroanatomy, the time-dependent data of electrophysiology, and the more global level of analyzing behavior and clinical syndromes.

The great majority of neuroscientists carry out highly focused, hypothesis-driven research that can be powerfully framed in the context of known circuits and functions. Such efforts are complemented by a growing number of projects that provide large data sets aimed not at testing a specific hypothesis but instead enabling data-intensive discovery approaches by the community at large. Notable successes include gene expression atlases from the Allen Institute for Brain Sciences (4) and the Gene Expression Nervous System Atlas (GENSAT) Project (5), and disease-specific human neuroimaging repositories (6). However, the neuroscience community is not yet fully engaged in exploiting the rich array of data currently available, nor is it adequately poised to capitalize on the forthcoming data explosion.

Below we highlight several major endeavors that provide complementary perspectives on the challenges and opportunities in neuroscience data mining. One is a set of “connectome” projects that aim to comprehensively describe neural circuits at either the macroscopic or the microscopic level. Another, the Neuroscience Information Framework (NIF), encompasses all of neuroscience and provides access to existing knowledge and databases of many types. These and other efforts provide fresh approaches to the challenge of elucidating neural choreography.

Connectomes: Macroscopic and Microscopic

Brain anatomy provides a fundamental three-dimensional framework around which many types of neuroscience data can be organized and mined. Decades of effort have revealed immense amounts of information about local and long-distance connections in animal brains. A wide range of tools (such as immunohistochemistry and in situ hybridization) have characterized the biochemical nature of these circuits that are studied electrophysiologically, pharmacologically, and behaviorally (7). Several ongoing efforts aim to integrate anatomical information into searchable resources that provide a backbone for understanding circuit biology and function (8–10). The challenge of integrating such data will dramatically increase with the advent of high-throughput anatomical methods, including those emerging from the nascent field of connectomics.

A connectome is a comprehensive description of neural connectivity for a specified brain region at a specified spatial scale (11, 12). Connectomics currently includes distinct subdomains for studying the macroconnectome (long-distance pathways linking patches of gray matter) and the microconnectome (complete connectivity within a single gray-matter patch).

The Human Connectome Project. Until recently, methods for charting neural circuits in the human brain were sorely lacking (13). This situation has changed dramatically with the advent of noninvasive neuroimaging methods. Two complementary modalities of magnetic resonance imaging (MRI) provide the most useful information about long-distance connections. One modality uses diffusion imaging to determine the orientation of axonal fiber bundles in white matter, based on the preferential diffusion of water molecules parallel to these fiber bundles. Tractography is an analysis strategy that uses this information to estimate long-distance pathways linking different gray-matter regions (14, 15). A second modality, resting-state functional MRI (R-fMRI), is based on slow fluctuations in the standard fMRI BOLD signal that occur even when people are at rest. The time courses of these fluctuations are correlated across gray-matter locations, and the spatial patterns of the resultant functional connectivity correlation maps are closely related but not identical to the known pattern of direct anatomical connectivity (16, 17). Diffusion imaging and R-fMRI each have important limitations, but together they offer powerful and complementary windows on human brain connectivity.
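At its core, the R-fMRI functional connectivity map described above is a matrix of pairwise correlations between BOLD time courses. A minimal sketch with synthetic data (the region count, time-series length, and coupling below are invented for illustration and bear no relation to actual HCP acquisitions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic BOLD time courses: 4 gray-matter regions x 200 time points.
# Regions 0 and 1 share a common slow fluctuation, mimicking a
# functionally coupled pair; regions 2 and 3 are independent noise.
shared = rng.standard_normal(200)
bold = rng.standard_normal((4, 200))
bold[0] += 2 * shared
bold[1] += 2 * shared

# Functional connectivity map: Pearson correlation of each pair of
# regional time courses (rows are variables in np.corrcoef).
fc = np.corrcoef(bold)

print(fc.shape)          # (4, 4)
print(fc[0, 1] > 0.5)    # the coupled pair shows strong correlation
print(abs(fc[2, 3]) < 0.3)
```

Real resting-state pipelines precede this reduction with extensive preprocessing (motion correction, filtering, confound regression); the sketch shows only the final correlation step.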

To address these opportunities, the National Institutes of Health (NIH) recently launched the Human Connectome Project (HCP) and awarded grants to two consortia (18). The consortium led by Washington University in St. Louis and the University of Minnesota (19) aims to characterize whole-brain circuitry and its variability across individuals in 1200 healthy adults (300 twin pairs and their nontwin siblings). Besides diffusion imaging and R-fMRI, task-based fMRI data will be acquired in all study participants, along with extensive behavioral testing; 100 participants will also be studied with magnetoencephalography (MEG) and electroencephalography (EEG). Acquired blood samples will enable genotyping or full-genome sequencing of all participants near the end of the 5-year project. Currently, data acquisition and analysis methods are being extensively refined using pilot data sets. Data acquisition from the main cohort will commence in mid-2012.

Neuroimaging and behavioral data from the HCP will be made freely available to the neuroscience community via a database (20) and a platform for visualization and user-friendly data mining. This informatics effort involves major challenges owing to the large amounts of data (expected to be ~1 petabyte), the diversity of data types, and the many possible types of data mining. Some investigators will drill deeply by analyzing high-resolution connectivity maps between all gray-matter locations. Others will explore a more compact “parcellated connectome” among all identified cortical and subcortical parcels. Data mining options will reveal connectivity differences between subpopulations that are selected by behavioral phenotype (such as high versus low IQ) and various other characteristics (Fig. 1). The utility of HCP-generated data will be enhanced by close links to other resources containing complementary types of spatially organized data, such as the Allen Human Brain Atlas (21), which contains neural gene expression maps.
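The subpopulation comparisons envisioned in Fig. 1 amount, in the simplest case, to averaging parcellated connectomes over subjects selected by a behavioral criterion and differencing the group means. A toy sketch, in which the parcel count, subject records, and IQ threshold are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n_parcels = 5  # illustrative; real parcellations are far larger

def random_connectome():
    """Invented symmetric parcel-by-parcel connectivity matrix."""
    m = rng.random((n_parcels, n_parcels))
    m = (m + m.T) / 2
    np.fill_diagonal(m, 0.0)
    return m

# Invented subject records: a behavioral score plus a connectome.
subjects = [{"score": rng.integers(80, 140), "conn": random_connectome()}
            for _ in range(40)]

def group_mean(subjects, predicate):
    """Average connectomes over subjects matching a behavioral criterion."""
    mats = [s["conn"] for s in subjects if predicate(s["score"])]
    return np.mean(mats, axis=0)

high = group_mean(subjects, lambda iq: iq >= 110)
low = group_mean(subjects, lambda iq: iq < 110)
diff = high - low  # parcel-pair connectivity differences between groups

print(diff.shape)  # (5, 5)
```

An actual analysis would of course add statistical testing and multiple-comparison control before interpreting any entry of `diff`.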

Microconnectomes. Recent advances in serial section EM, high-resolution optical imaging methods, and sophisticated image segmentation methods enable detailed reconstructions of the

Fig. 1. Schematic illustration of online data mining capabilities envisioned for the HCP. Investigators will be able to pose a wide range of queries (such as the connectivity patterns of a particular brain region of interest averaged across a group of individuals, based on behavioral criteria) and view the search results interactively on three-dimensional brain models. Data sets of interest will be freely available for downloading and additional offline analysis.


microscopic connectome at the level of individual synapses, axons, dendrites, and glial processes (22–24). Current efforts focus on the reconstruction of local circuits, such as small patches of the cerebral cortex or retina, in laboratory animals. As such data sets begin to emerge, a fresh set of informatics challenges will arise in handling petabyte amounts of primary and analyzed data and in providing data mining platforms that enable neuroscientists to navigate complex local circuits and examine interesting statistical characteristics.

Micro- and macroconnectomes exemplify distinct data types within particular tiers of analysis that will eventually need to be linked. Effective interpretation of both macro- and microconnectomic approaches will require novel informatics and computational approaches that enable these two types of data to be analyzed in a common framework and infrastructure. Efforts such as the Blue Brain Project (25) represent an important initial thrust in this direction, but the endeavor will entail decades of effort and innovation.

Powerful and complementary approaches such as optogenetics operate at an intermediate (mesoconnectome) spatial scale by directly perturbing neural circuits in vivo or in vitro with light-activated ion channels inserted into selected neuronal types (26). Other optical methods, such as calcium imaging with two-photon laser microscopy, enable analysis of the dynamics of ensembles of neurons in microcircuits (27, 28) and can lead to new conceptualizations of brain function (29). Such approaches provide an especially attractive window on neural choreography as they assess or perturb the temporal patterns of macro- or microcircuit activity.

The NIF

Connectome-related projects illustrate ways in which neuroscience as a field is evolving at the level of neural circuitry. Other discovery efforts include genome-wide gene expression profiling [for example, (30)] or epigenetic analyses across multiple brain regions in normal and diseased brains. This wide range of efforts results in a sharp increase in the amount and diversity of data being generated, making it unlikely that neuroscience will be adequately served by only a handful of centralized databases, as is largely the case for the genomics and proteomics community (31). How, then, can we access and explore these resources more effectively to support the data-intensive discovery envisioned in The Fourth Paradigm (32)?

Tackling this question was a prime motivation behind the NIF (33). The NIF was launched in 2005 to survey the current ecosystem of neuroscience resources (databases, tools, and materials) and to establish a resource description framework and search strategy for locating, accessing, and using digital neuroscience-related resources (34).

The NIF catalog, a human-curated registry of known resources, currently includes more than 3500 such resources, and new ones are added daily. Over 2000 of these resources are databases that range in size from hundreds to millions of records. Many were created at considerable effort and expense, yet most of them remain underused by the research community.

Clearly, it is inefficient for individual researchers to sequentially visit and explore thousands of databases, and conventional online search engines are inadequate, insofar as they do not effectively index or search database content. To promote the discovery and use of online databases, the NIF created a portal through which users can search not only the NIF registry but the content of multiple databases simultaneously. The current NIF federation includes more than 65 databases accessing ~30 million records (35) in major domains of relevance to neuroscience (Fig. 2). Besides very large genomic collections, there are nearly 1 million antibody records, 23,000 brain connectivity records, and >50,000 brain activation coordinates. Many of these areas are covered by multiple databases, which the NIF knits together into a coherent view. Although impressive, this represents only the tip of the iceberg. Most individual databases are underpopulated because of insufficient community contributions. Entire domains of neuroscience (such as electrophysiology and behavior) are underrepresented as compared to genomics and neuroanatomy.

Ideally, NIF users should be able not only to locate answers that are known but to mine available data in ways that spur new hypotheses regarding what is not known. Perhaps the single biggest roadblock to this higher-order data mining is the lack of standardized frameworks for organizing neuroscience data. Individual investigators often use terminology or spatial coordinate systems customized for their own particular analysis approaches. This customization is a substantial barrier to data integration, requiring considerable human effort to access each resource, understand the context and content of the data, and determine the conditions under which they can be compared to other data sets of interest.

To address the terminology problem, the NIF has assembled an expansive lexicon and ontology covering the broad domains of neuroscience by synthesizing open-access community ontologies (36). The Neurolex and accompanying NIFSTD (NIF-standardized) ontologies provide definitions of over 50,000 concepts, using formal languages to represent brain regions, cells, subcellular structures, molecules, diseases, and functions, and the relations among them. When users search for a concept through the NIF, it automatically expands the query to include all synonymous or closely related terms. For example, a query for “striatum” will include “neostriatum, dorsal striatum, caudoputamen, caudate putamen” and other variants.

Neurolex terms are accessible through a wiki (37) that allows users to view, augment, and modify these concepts. The goal is to provide clear definitions of each concept that can be used not only by humans but by automated agents, such as the NIF, to navigate the complexities of human neuroscience knowledge. A key feature is the assignment of a unique resource identifier to make it easier for search algorithms to distinguish among concepts that share the same label.

For example, nucleus (part of cell) and nucleus (part of brain) are distinguished by unique IDs. Using these identifiers in addition to natural language to reference concepts in databases and publications, although conceptually simple, is an especially powerful means for making data maximally discoverable and useful.
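The identifier idea above can be made concrete with a minimal sketch. The ID strings and field names here are invented for illustration; they are not actual Neurolex identifiers.

```python
# Sketch of identifier-based disambiguation: two concepts sharing the
# label "nucleus" stay distinct under unique IDs (IDs invented here).
concepts = {
    "EX_ID_0001": {"label": "nucleus", "meaning": "part of cell"},
    "EX_ID_0002": {"label": "nucleus", "meaning": "part of brain"},
}

# A label search is ambiguous; an ID lookup is not.
matches = [cid for cid, c in concepts.items() if c["label"] == "nucleus"]
brain_nucleus = concepts["EX_ID_0002"]["meaning"]
```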

These efforts to develop and deploy a semantic framework for neuroscience, spearheaded by

Fig. 2. Current contents of the NIF. The NIF navigation bar displays the current contents of the NIF data federation, organized by data type and level of the nervous system. The number of records in each category is displayed in parentheses.

11 FEBRUARY 2011 VOL 331 SCIENCE www.sciencemag.org 710


the NIF and the International Neuroinformatics Coordinating Facility (38), are complemented by projects related to brain atlases and spatial frameworks (39–41), providing tools for referencing data to a standard coordinate system based on the brain anatomy of a given organism.

Neuroinformatics as a Prelude to New Discoveries

How might improved access to multiple tiers of neurobiological data help us understand the brain? Imagine that we are investigating the neurobiology of bipolar disorder, an illness in which moods are normal for long periods of time, yet are labile and sometimes switch to mania or depression without an obvious external trigger. Although highly heritable, this disease appears to be genetically very complex and possibly quite heterogeneous (42). We may discover numerous genes that impart vulnerability to the illness. Some may be ion channels, others synaptic proteins or transcription factors. How will we uncover how disparate genetic causes lead to a similar clinical phenotype? Are they all affecting the morphology of certain cells; the dynamics of specific microcircuits, for example, within the amygdala; the orchestration of information across regions, for example, between the amygdala and the prefrontal cortex? Can we create genetic mouse models of the various mutated genes and show a convergence at any of these levels? Can we capture the critical changes in neuronal and/or glial function (at any of the levels) and find ways to prevent the illness? Discovering the common thread for such a disease will surely benefit from tools that facilitate navigation across the multiple tiers of data: genetics, gene expression/epigenetics, changes in neuronal activity, and differences in dynamics at the micro and macro levels, depending on the mood state. No single focused level of analysis will suffice to achieve a satisfactory understanding of the disease. In neural choreography terms, we need to identify the dancers, define the nature of the dance, and uncover how the disease disrupts it.

Recommendations

Need for a cultural shift. To meet the grand challenge of elucidating neural choreography, we need increasingly powerful scientific tools to study brain activity in space and in time, to extract the key features associated with particular events, and to do so on a scale that reveals commonalities and differences between individual brains. This requires an informatics infrastructure that has built-in flexibility to incorporate new types of data and navigate across tiers and domains of knowledge.

The NIF currently provides a platform for integrating and systematizing existing neuroscience knowledge and has been working to define best practices for those producing new neuroscience data. Good planning and future investment are needed to broaden and harden the overall framework for housing, analyzing, and integrating future neuroscience knowledge. The International Neuroinformatics Coordinating Facility (INCF) plays an important role in coordinating and promoting this framework at a global level.

But can neuroscience evolve so that neuroinformatics becomes integral to how we study the brain? This would entail a cultural shift in the field regarding the importance of data sharing and mining. It would also require recognition that neuroscientists produce data not just for consumption by readers of the conventional literature, but for automated agents that can find, relate, and begin to interpret data from databases as well as the literature. Search technologies are advancing rapidly, but the complexity of scientific data continues to challenge. To make neuroscience data maximally interoperable within a global neuroscience information framework, we encourage the neuroscience community and the associated funding agencies to consider the following set of general and specific suggestions:

1) Neuroscientists should, as much as is feasible, share their data in a form that is machine-accessible, such as through a Web-based database or some other structured form that benefits from increasingly powerful search tools.

2) Databases spanning a growing portion of the neuroscience realm need to be created, populated, and sustained. This effort needs adequate support from federal and other funding mechanisms.

3) Because databases become more useful as they are more densely populated (43), adding to existing databases may be preferable to creating customized new ones. The NIF, INCF, and other resources provide valuable tools for finding existing databases.

4) Data consumption will increasingly involve machines first and humans second. Whether creating database content or publishing journal articles, neuroscientists should annotate content using community ontologies and identifiers. Coordinates, atlas, and registration method should be specified when referencing spatial locations.

5) Some types of published data (such as brain coordinates in neuroimaging studies) should be reported in standardized table formats that facilitate data mining.

6) Investment needs to occur in interdisciplinary research to develop computational, machine-learning, and visualization methods for synthesizing across spatial and temporal information tiers.

7) Educational strategies from undergraduate through postdoctoral levels are needed to ensure that neuroscientists of the next generation are proficient in data mining and using the data-sharing tools of the future.

8) Cultural changes are needed to promote widespread participation in this endeavor. These ideas are not just a way to be responsible and collaborative; they may serve a vital role in attaining a deeper understanding of brain function and dysfunction.

With such efforts, and some luck, the machinery that we have created, including powerful computers and associated tools, may provide us with the means to comprehend this "most unaccountable of machinery" (44), our own brain.

References and Notes
1. World Health Organization, Mental Health and Development: Targeting People With Mental Health Conditions As a Vulnerable Group (World Health Organization, Geneva, 2010).
2. F. A. Azevedo et al., J. Comp. Neurol. 513, 532 (2009).
3. B. Pakkenberg et al., Exp. Gerontol. 38, 95 (2003).
4. www.alleninstitute.org/
5. www.gensat.org/
6. http://adni.loni.ucla.edu
7. A. Bjorklund, T. Hokfelt, Eds., Handbook of Chemical Neuroanatomy Book Series, vols. 1 to 21 (Elsevier, Amsterdam, 1983–2005).
8. http://cocomac.org/home.asp
9. http://brancusi.usc.edu/bkms/
10. http://brainmaps.org/
11. http://en.wikipedia.org/wiki/Connectome
12. O. Sporns, G. Tononi, R. Kotter, PLoS Comput. Biol. 1, e42 (2005).
13. F. Crick, E. Jones, Nature 361, 109 (1993).
14. H. Johansen-Berg, T. E. J. Behrens, Diffusion MRI: From Quantitative Measurement to in-vivo Neuroanatomy (Academic Press, London, ed. 1, 2009).
15. H. Johansen-Berg, M. F. Rushworth, Annu. Rev. Neurosci. 32, 75 (2009).
16. J. L. Vincent et al., Nature 447, 83 (2007).
17. D. Zhang, A. Z. Snyder, J. S. Shimony, M. D. Fox, M. E. Raichle, Cereb. Cortex 20, 1187 (2010).
18. http://humanconnectome.org/consortia
19. http://humanconnectome.org
20. D. S. Marcus, T. R. Olsen, M. Ramaratnam, R. L. Buckner, Neuroinformatics 5, 11 (2007).
21. http://human.brain-map.org/
22. K. L. Briggman, W. Denk, Curr. Opin. Neurobiol. 16, 562 (2006).
23. J. W. Lichtman, J. Livet, J. R. Sanes, Nat. Rev. Neurosci. 9, 417 (2008).
24. S. J. Smith, Curr. Opin. Neurobiol. 17, 601 (2007).
25. H. Markram, Nat. Rev. Neurosci. 7, 153 (2006).
26. K. Deisseroth, Nat. Methods 8, 26 (2011).
27. B. F. Grewe, F. Helmchen, Curr. Opin. Neurobiol. 19, 520 (2009).
28. B. O. Watson et al., Front. Neurosci. 4, 29 (2010).
29. G. Buzsaki, Neuron 68, 362 (2010).
30. R. Bernard et al., Mol. Psychiatry, 13 April 2010 (e-pub ahead of print).
31. M. E. Martone, A. Gupta, M. H. Ellisman, Nat. Neurosci. 7, 467 (2004).
32. The Fourth Paradigm: Data-Intensive Scientific Discovery, T. Hey, S. Tansley, K. Tolle, Eds. (Microsoft Research Publishing, Redmond, WA, 2009).
33. http://neuinfo.org
34. D. Gardner et al., Neuroinformatics 6, 149 (2008).
35. A. Gupta et al., Neuroinformatics 6, 205 (2008).
36. W. J. Bug et al., Neuroinformatics 6, 175 (2008).
37. http://neurolex.org
38. http://incf.org
39. http://www.brain-map.org
40. http://wholebraincatalog.org
41. http://incf.org/core/programs/atlasing
42. H. Akil et al., Science 327, 1580 (2010).


43. G. A. Ascoli, Nat. Rev. Neurosci. 7, 318 (2006).
44. N. Nicolson, J. Trautmann, Eds., The Letters of Virginia Woolf, Volume V: 1932–1935 (Harcourt, Brace, New York, 1982).
45. Funded in part by grants from NIH (5P01-DA021633-02), the Office of Naval Research (ONR-N00014-02-1-0879), and the Pritzker Neuropsychiatric Research Consortium (to H.A.); and the HCP (1U54MH091657-01) from the 16 NIH Institutes and Centers that support the NIH Blueprint for Neuroscience Research (to D.C.V.E.). The NIF is supported by NIH Neuroscience Blueprint contract HHSN271200800035C via the National Institute on Drug Abuse to M.E.M. We thank D. Marcus, R. Poldrack, S. Curtiss, A. Bandrowski, and S. J. Watson for discussions and comments on the manuscript.

10.1126/science.1199305

PERSPECTIVE

The Disappearing Third Dimension

Timothy Rowe [1] and Lawrence R. Frank [2]

Three-dimensional computing is driving what many would call a revolution in scientific visualization. However, its power and advancement are held back by the absence of sustainable archives for raw data and derivative visualizations. Funding agencies, professional societies, and publishers each have unfulfilled roles in archive design and data management policy.

Three-dimensional (3D) image acquisition, analysis, and visualization are increasingly important in nearly every field of science, medicine, engineering, and even the fine arts. This reflects rapid growth of 3D scanning instrumentation, visualization and analysis algorithms, graphic displays, and graduate training. These capabilities are recognized as critical for future advances in science, and the U.S. National Science Foundation (NSF) is one of many funding agencies increasing support for 3D imaging and computing. For example, one new initiative aims at digitizing biological research collections, and NSF's Earth Sciences program will soon announce a new target in cyberinfrastructure, with a spotlight on 3D imaging of natural materials.

Many consider the advent of 3D imaging a "scientific revolution" (1), but in many ways the revolution is still nascent and unfulfilled. Given the increasing ease of producing these data and rapidly increased funding for imaging in nonmedical applications, a major unmet challenge to ensure maximal advancement is archiving and managing the science that will be, and even has already been, produced. To illustrate the problems, we focus here on one domain, volume elements or "voxels," the 3D equivalents of pixels, but the argument applies more broadly across 3D computing. Useful, searchable archiving requires infrastructure and policy to enable disclosure by data producers and to guarantee quality to data consumers (2). Voxel data have generally lacked a coherent archiving and dissemination policy, and thus the raw data behind thousands of published reports are not released or available for validation and reuse. A solution requires new infrastructure and new policy to manage ownership and release (3).

Technological advancements in both medical and industrial scanners have increased the resolution of 3D volumetric scanners to the point that they can now volumetrically digitize structures from the size of a cell to a blue whale, with exquisite sensitivity toward scientific targets ranging from tissue properties to the composition of meteorites. Voxel data sets are generated by rapidly diversifying instruments that include x-ray computed tomographic (CT) scanners (Fig. 1), magnetic resonance imaging (MRI) scanners (Fig. 2), confocal microscopes, synchrotron light sources, electron-positron scanners, and other innovative tools to digitize entire object volumes.

CT and MRI have evolved across the widest range of applications. CT is sensitive to density. Its greatest strength is imaging dense materials like rocks, fossils, the bones in living organisms and, to a lesser extent, soft tissue. The first clinical CT scan, made in 1971, used an 80 by 80 matrix of 3 mm by 3 mm by 13 mm voxels, each slice measuring 6.4 Kb and taking up to 20 min to acquire. Each slice image took 7 min to reconstruct on a mainframe computer (4). In 1984, the first fossil, a 14-cm-long skull of the extinct mammal Stenopsochoerus, was scanned in its entirety (5), signaling CT's impact beyond the clinical setting and in digitizing entire object volumes. The complete data set measured 3.28 Mb. With a mainframe computer, rock matrix was removed to visualize underlying bone, and surface models were reconstructed in computations taking all night. By 1992, industrial adaptations brought CT an order-of-magnitude higher resolution (high-resolution x-ray computed tomography, HRXCT) to inspect much smaller, denser objects. A fossil skull of the stem-mammal Thrinaxodon (68 mm long) was scanned in 0.2-mm-thick slices measuring 119 Kb each. Scanning the entire volume took 6 hours, and the complete raw data set occupied 18.4 Mb (6). The scans revealed all internal details reported earlier, from destructive mechanical serial sectioning of a different specimen, and pushed

[1] Jackson School of Geosciences, The University of Texas, Austin, TX 78712, USA. [2] Department of Radiology, University of California, San Diego, CA 92093, USA.

*To whom correspondence should be addressed. E-mail:[email protected] (T.R.); [email protected] (L.R.F.)

Fig. 1. (A) Photomicrograph of fossil tooth (Morganucodon sp.). (B) MicroXCT slice, 3.2-µm voxel size; arrows show ring artifact, a correctable problem with raw data but an interpretive challenge with compressed data. (C) Digital 3D reconstruction. (D) Slice through main cusp; red arrows show growth bands in dentine, and blue arrows mark the enamel-dentine boundary, both observed in Morganucodon for the first time in these scans.
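The storage figures in the scanning history above can be sanity-checked with simple arithmetic. This is our rough check, not the authors' calculation, and it assumes 1 byte per voxel for the 1971 scanner, which is consistent with the ~6.4 Kb quoted per 80 by 80 slice.

```python
# Rough check on voxel data-set sizes (our assumptions, noted inline).
def slice_bytes(nx, ny, bytes_per_voxel=1):
    """Raw storage for one nx-by-ny slice of a voxel grid."""
    return nx * ny * bytes_per_voxel

# 1971 clinical CT: 80 x 80 matrix at an assumed 1 byte per voxel
# gives 6,400 bytes, matching the ~6.4 Kb per slice quoted in the text.
first_ct_slice = slice_bytes(80, 80)

# Thrinaxodon HRXCT: a 68-mm skull cut into 0.2-mm slices implies
# roughly 68 / 0.2 = 340 slices through the specimen.
n_slices = round(68 / 0.2)
```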


COMPARATIVE EFFECTIVENESS

Distributed Health Data Networks
A Practical and Preferred Approach to Multi-Institutional Evaluations of Comparative Effectiveness, Safety, and Quality of Care

Jeffrey S. Brown, PhD,* John H. Holmes, PhD,† Kiran Shah, BA,‡ Ken Hall, MDIV,§ Ross Lazarus, MBBS, MPH,* and Richard Platt, MD, MSc*

Background: Comparative effectiveness research, medical product safety evaluation, and quality measurement will require the ability to use electronic health data held by multiple organizations. There is no consensus about whether to create regional or national combined (eg, "all payer") databases for these purposes, or distributed data networks that leave most Protected Health Information and proprietary data in the possession of the original data holders.

Objectives: Demonstrate functions of a distributed research network that supports research needs and also addresses data holders' concerns about participation. Key design functions included strong local control of data uses and a centralized web-based querying interface.

Research Design: We implemented a pilot distributed research network and evaluated the design considerations, utility for research, and the acceptability to data holders of methods for menu-driven querying. We developed and tested a central, web-based interface with supporting network software. Specific functions assessed include query formation and distribution, query execution and review, and aggregation of results.

Results: This pilot successfully evaluated temporal trends in medication use and diagnoses at 5 separate sites, demonstrating some of the possibilities of using a distributed research network. The pilot demonstrated the potential utility of the design, which addressed the major concerns of both users and data holders. No serious obstacles were identified that would prevent development of a fully functional, scalable network.

Conclusions: Distributed networks are capable of addressing nearly all anticipated uses of routinely collected electronic healthcare data. Distributed networks would obviate the need for centralized databases, thus avoiding numerous obstacles.

Key Words: distributed health data network, all payer databases, comparative effectiveness research network

(Med Care 2010;48: S45–S51)

Electronic health records and other data routinely collected during the delivery of, or payment for, health care, and disease or exposure-specific registries are important resources to address the growing need for evidence about the effectiveness, safety, and quality of medical care.1–6 Unfortunately, even very large individual healthcare databases and registries are not big or diverse enough to address many of the needs of clinicians, health care delivery systems, or the public health community. The Institute of Medicine and others have articulated the goal of using routinely collected health information for these "secondary" purposes.1,7–10 The Federal Coordinating Council for Comparative Effectiveness Research (FCCCER) noted the need for studies "with sufficient power to discern treatment effects and other impacts of interventions among patient subgroups." In their priority recommendations, the FCCCER listed comparative effectiveness research data infrastructure as a primary investment that cuts across all other comparative effectiveness needs.5 The FCCCER also listed several considerations for investing in person-level databases for comparative effectiveness research, including the ability to link to external data sources, the research readiness of the databases, and the need to maintain security and privacy of personally identifiable health information.5

To address the need to evaluate the processes and outcomes of care of large populations, some propose creation of large, centralized, multipayer claims databases.11 For example, the Department of Health and Human Services issued a contract titled "Strategic Design for an All-Payor, All-Claims Database to Support Comparative Effectiveness Research." In addition, several states have already implemented

From the *Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, MA; †Department of Biostatistics and Epidemiology, Center for Clinical Epidemiology and Biostatistics, University of Pennsylvania School of Medicine, Philadelphia, PA; ‡Lincoln Peak Partners, Westborough, MA; and §Deloitte, NCPHI/OD, Atlanta, GA.

Supported by Agency for Healthcare Research and Quality Contract No. 290-05-0033, US Department of Health and Human Services, as part of the Developing Evidence to Inform Decisions about Effectiveness (DEcIDE) program.

The authors of this report are responsible for its content. Statements in this report should not be construed as endorsement by the Agency for Healthcare Research and Quality or the US Department of Health and Human Services.

Reprints: Jeffrey S. Brown, PhD, Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, 133 Brookline Ave, 6th floor, Boston, MA 02215. E-mail: [email protected].

Copyright © 2010 by Lippincott Williams & Wilkins. ISSN: 0025-7079/10/4800-0045

Medical Care • Volume 48, Number 6 Suppl 1, June 2010 www.lww-medicalcare.com | S45


all-payer claims databases to address cost and quality concerns.12,13 This data centralization approach is alluring because, in theory, it mitigates the complications associated with conducting research across multiple data holders. In practice, a centralized approach raises several serious security, proprietary, operational, legal, and patient privacy concerns for data holders, patients, and funders.14–17 As one example, even if a centralized database omits explicit identifying information like name and address, it is effectively impossible to prevent reidentification of individual-level longitudinal data that contains enough detail to serve multiple purposes. In our experience, these limitations have severely constrained the effective, coordinated use of data held by multiple organizations.

An alternative to centralized, all-payer databases is one or more distributed research networks that permit comparative effectiveness and other evaluations across multiple databases without creation of a central data warehouse.14,15,18–24 Several such networks18,22–29 currently conduct comparative effectiveness and pharmacoepidemiologic research using a distributed data approach in which data holders maintain control over their protected data and its uses. These networks require data holders to transform their data of interest into a common data model that enforces uniform data element naming conventions, definitions, and data storage formats. The common data format allows data checking, manipulation, and analysis via identical computer programs that are shared by all data holders. Existing distributed networks typically distribute these computer programs via e-mail; data holders manually execute the programs and return the output via secure e-mail, or another secure mechanism, to a coordinating center for aggregation and, possibly, additional analysis. Many studies require no transfer of protected health information. Several single-study networks consisting of 20 to 50 million members also have been developed that adhere to a distributed research network approach.30,31
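The common-data-model transformation described above can be sketched briefly. This is not the networks' actual model: the field names, normalization rules, and example row are all invented to illustrate why one shared analysis program can then run unchanged at every site.

```python
# Sketch (illustrative only): each data holder maps its local schema
# onto a shared common data model so identical programs run everywhere.
COMMON_FIELDS = ["person_id", "sex", "birth_year", "ndc_code", "dispense_date"]

def to_common_model(local_row):
    """Rename site-specific columns to the shared naming convention."""
    return {
        "person_id": local_row["member_num"],
        "sex": local_row["gender"].upper()[:1],    # normalize to 'M'/'F'
        "birth_year": int(local_row["dob"][:4]),   # assumes ISO dates locally
        "ndc_code": local_row["drug_ndc"],
        "dispense_date": local_row["fill_dt"],
    }

row = {"member_num": "A123", "gender": "female", "dob": "1944-07-02",
       "drug_ndc": "00093-7146", "fill_dt": "2008-03-15"}
common = to_common_model(row)
```

Once every site emits rows with the same fields, a single shared query program needs no per-site customization.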

Existing and proposed distributed networks have tremendous potential utility for addressing our current postmarketing evidence knowledge gap while benefiting from an approach that is more acceptable to data holders.14,15,21 In addition, a distributed approach keeps the data close to the people who know the data best and who can best consult on proper use of the data and investigate findings or anomalies.

Obstacles to effective implementation of both centralized and distributed approaches include differences in computing environments and information systems, the need for data standardization and checking, organization-by-organization variation in contracting policies and procedures, concerns related to the ethics of human subjects research and data privacy, and cross-institution variation in the rules and guidelines related to privacy and proprietary issues.32,33 Distributed networks can be built and tested in phases that allow the network to operate while being built, and network data resources can be updated and enhanced without disrupting overall network operations. Networks have the need for responsible and consistent stewardship of clinical records, and they exchange the requirements of centralized database administrative and computing infrastructure for similar processes on the part of each data holder. Thus, the administrative operation of networks can be cumbersome.

Many of the administrative, technical, and analytic barriers to developing efficient and scalable distributed health data networks to support population-level analyses can be addressed through network features that support the needs of users and data holders. We describe here the design and pilot implementation of a distributed research network infrastructure intended to meet the broad needs of all parties for comparative effectiveness evaluation and other uses. We note the challenges and barriers identified, and provide a blueprint for development of a comprehensive distributed research network as a reusable national resource.

METHODS

Background and Needs Assessment

The design, implementation, and evaluation plan of the pilot distributed network was based on findings from previous studies,14,15,21,34 coupled with our experience in operating a distributed network, the HMO Research Network Center for Education and Research on Therapeutics, and participating in other networks,23,27,28,31 including a current study of the safety of the H1N1 vaccine.30 Our prior work investigated the needs of data holders (eg, health plans) and potential network users (eg, federal agencies) with respect to making their data available for comparative effectiveness and other secondary uses.14,15 Data holders identified several requirements for voluntary participation in a distributed network. These included: (1) complete control of, access to, and uses of, their data; (2) strong security and privacy features; (3) limited impact on internal systems; (4) minimal data transfer; (5) auditable processes; (6) standardization of administrative and regulatory agreements; (7) transparent governance; and (8) ease of participation and use.

Users' needs assessments, by contrast, did not depend on whether the underlying architecture was a distributed network or centralized database. Potential users identified other key elements of a network, including: menu-driven querying; easy access for feasibility assessments and for public health surveillance and monitoring; the ability to specify and create subsets of the complete data via menu-driven querying or complex programming code; and reuse of network tools to improve efficiency (eg, reuse of validated exposure and outcome algorithms).15 Nontechnical users wanted the ability to ask simple questions without assistance (eg, counts of people between the ages of 65 and 74 with a positron emission tomography scan in 2008). More sophisticated users wanted the ability to perform complex analyses (eg, compare risk-adjusted survival curves for breast cancer patients treated with tamoxifen as adjuvant chemotherapy, to those who were not treated). Potential users also noted that it is often difficult to get rapid responses from existing systems, and this was even noted by users who were also data holders.
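The "simple question" in the nontechnical example above amounts to a filtered count, which a menu-driven interface could translate into something like the following. The records and field names are invented for this sketch; they do not reflect any site's real schema.

```python
# Illustrative only: the count behind a menu-driven query for people
# aged 65-74 with a PET scan in 2008 (records invented for this sketch).
records = [
    {"age": 67, "procedure": "PET", "year": 2008},
    {"age": 71, "procedure": "PET", "year": 2008},
    {"age": 80, "procedure": "PET", "year": 2008},  # outside the age band
    {"age": 66, "procedure": "MRI", "year": 2008},  # different procedure
]

count = sum(1 for r in records
            if 65 <= r["age"] <= 74
            and r["procedure"] == "PET"
            and r["year"] == 2008)
```

In the pilot described below, only such aggregate counts (never the underlying rows) would leave a data holder's site.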

Implementation

The design and rapid prototyping process focused on addressing the 8 specific data holder concerns noted above. We used a phased approach to development and implementation of the pilot distributed network. The first phase included a web-based portal system for menu-driven querying of summary-level datasets held by 5 data holders (Harvard Pilgrim Health Care, Group Health Cooperative, Geisinger Health Systems, Kaiser Permanente Colorado, and HealthPartners Research Foundation). Figure 1 and Table 1 illustrate and describe the system architecture and features; Figure 2 illustrates the current menu-driven query interface.

TABLE 1. Description of System Components

Portal
• Access control manager: Manages all aspects of security for the portal, including authentication, session management, policies, group permissions, user permissions, and access rights.
• Query manager: Manages query entry, routing, and distribution.
• Results manager: Manages receipt, organization, assembly, merging, and aggregation of result sets.
• Workflow manager: Manages workflow (eg, for query approval), including request routing, alerting and notification, approval management, and tracking.
• User manager: Manages user accounts.
• Audit manager: Provides auditing functions including activity and error logging.
• Researcher user interface: User interface for research users, including menu-driven and ad hoc query entry, query management, result status, and result set management.
• Data Mart administrator user interface: User interface for Data Mart administrators, including data mart setup and configuration, access control management, and workflow management.
• System administrator user interface: User interface for system administrators, including portal setup and configuration, access control management, and user management.
• Hub API: Application Programming Interface (eg, web service API) for the DRN portal. Exposes portal functions for remote applications, including query retrieval and results submission.
• Hub database: Database for the portal.
• Message protocol: Protocol for messaging between the portal and external applications, including the Data Marts.

Data holder Data Mart
• Data Mart security manager: Manages all aspects of security for the Data Mart, including authentication, session management, policies, group permissions, user permissions, and access rights.
• Query execution manager: Manages review and execution (or rejection) of queries, including queue management, query translation, query engine interface, and results handling.
• Data Mart API: Application Programming Interface (eg, web service API) for the Data Mart. Exposes functions for remote applications, including query submission and results retrieval.
• Data Mart audit manager: Provides auditing functions including activity and error logging.
• Data source manager: Manages exchange of data between the Data Mart database and source systems.
• Data Mart database: Database for the Data Mart.

FIGURE 1. System architecture.


Development of the network software relied on a rapid prototyping approach that included multiple rounds of designing, building, testing, and revising of the interface and supporting portal, and the creation of a novel application program, the Query Execution Manager (QEM). Initial querying capability was built for drug utilization by generic name and drug class, diagnosis by 3-digit ICD-9-CM code, and procedure (Healthcare Common Procedure Coding System) queries. Query results were stratifiable by age group, sex, and year or year and quarter. These simple queries were chosen for demonstration purposes; more complex queries are possible within the network design.

From the user perspective, query creation involved logging into the web portal, selecting a query type (eg, generic drug name), selecting query parameters and the data holders to query, and, once submitted, reviewing the status of queries and the aggregated query results. Each data holder installed the QEM software and responded to multiple test queries. The system used a "pull" query distribution mechanism that notified (via e-mail) data holders of waiting queries. The data holder user then opened the QEM to review the query details (eg, submitter and reason for submitting) and decided whether to execute, reject, or hold the query for further review. If the data holder decided to run the query, the QEM downloaded the query text from the portal, executed it against a local database, and presented the results to the data holder for review. The data holder could then upload the results to the portal for aggregation with other results and review by the submitter.
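The "pull" lifecycle just described can be condensed into a minimal sketch: the portal queues a query, the data holder reviews it locally, and only aggregate results flow back. The class name, review rule, and data are all invented for illustration and are not the actual QEM software.

```python
# Minimal sketch of the pull-based query lifecycle (names invented).
class QueryExecutionManagerSketch:
    def __init__(self, local_counts):
        self.local_counts = local_counts  # stand-in for the local database

    def review(self, query):
        """Local review step: a human decides to execute, reject, or hold."""
        return "execute" if query.get("reason") else "hold"

    def execute(self, query):
        """Run the query locally; only aggregate counts leave the site."""
        return {"site_result": self.local_counts.get(query["drug"], 0)}

# One waiting query pulled from the portal, one site's local data.
portal_queue = [{"drug": "methylphenidate", "reason": "ADHD trend study"}]
qem = QueryExecutionManagerSketch({"methylphenidate": 1523})

# Only reviewed-and-approved queries run; their results are uploaded.
uploads = [qem.execute(q) for q in portal_queue if qem.review(q) == "execute"]
```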

Test scenarios were based on summary-level data and included assessment of temporal trends in the use of genetic testing, influenza-related medical and pharmacy use, attention-deficit hyperactivity disorder medication use by age, urticaria diagnoses by age and year, and rate of diabetes by age. We assessed the ability of the system to execute and perform each of the user and data holder tasks described above for each of the test queries submitted. Throughout the rapid prototyping, implementation, and testing process we also informally evaluated data holder acceptance of the system and their willingness to continue development and testing beyond the demonstration. Other network functions also were developed, tested, and evaluated: user-based access control (ie, users have different levels of permissions to submit queries), query formation restrictions (ie, limits on the number and type of query parameters available for selection), and query results viewing rules (ie, a requirement of 2 data holder responses before a user can view and export aggregated results).
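The results-viewing rule (at least 2 data holder responses before aggregated results can be viewed or exported) amounts to a simple gate at the central portal. A minimal sketch, assuming site results arrive as per-stratum count dictionaries; the function name and data shapes are invented for illustration.

```python
def aggregated_view(site_results, min_responses=2):
    """Release pooled counts only once at least `min_responses` data
    holders have uploaded results; otherwise withhold the aggregate."""
    if len(site_results) < min_responses:
        return None  # user cannot view or export yet
    pooled = {}
    for counts in site_results.values():
        for stratum, n in counts.items():
            pooled[stratum] = pooled.get(stratum, 0) + n
    return pooled
```

Withholding the aggregate until multiple sites respond prevents a single site's contribution from being read directly out of the "pooled" result.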

Separately, we partnered with the National Center for Public Health Informatics to design and pilot test an alternative approach for securely transmitting queries and receiving results. The use case for this pilot test illustrated how an authorized user could securely authenticate to a central portal and securely distribute a computer program to each data holder through its local firewall. The program was executed and the results securely returned for aggregation. Details of this work are described elsewhere.35

RESULTS

Implementation and System Functioning

The system was successfully implemented and tested at each of the 5 participating data holders. Each data holder was able to install and operate the QEM, retrieve and execute queries, and upload results. Installation took approximately 15 minutes.

Each of the sample queries was executed without error. Figures 3A and B show how data holders interact with the system, specifically the functions that allow them to review queries before executing them and review results before uploading them to the central portal. Figure 4 illustrates results from a set of sample queries regarding the use of attention-deficit hyperactivity disorder medications; this type of information can help identify trends that may warrant further evaluation or help understand the adoption and diffusion of new products. The sample queries were completed within a day of submission; this period included local review of incoming queries and of the results.

FIGURE 2. Web-based menu-driven query formation. It shows part of the query creation page that allows a user to select specific exposures, diagnoses, or procedures, time periods, and data sources.

Evaluation of Acceptability to Data Holders

All data holders found the system acceptable and were willing to continue working on development and implementation. The data holders found the approach acceptable because the design directly addressed many of the data holder concerns listed above. Specifically, the design illustrated: (1) data holder data autonomy that allowed approval of all query executions and data uploads, (2) an easy-to-use query interface, (3) centralization of the network logic in a portal, and (4) simple installation and lightweight software that did not require any special expertise to install or use. Further, the "pull" mechanism for query distribution (ie, data holders are notified of waiting queries and retrieve them) was also an important favorable factor for data holders' acceptance.

FIGURE 3. Query Execution Manager: query review and results screen. A, Shows query details that are presented to the data holder for review. In this case the query text is submitted as a structured query; alternatively, the design allows a user to send an appropriately structured analytic program. B, Shows local query results for review and approval for upload to the central portal for aggregation.

FIGURE 4. Example of findings that could be obtained using the query tool output: rate of attention-deficit hyperactivity disorder drug use by age group and year.

DISCUSSION

Overall, this demonstration validated key design features of a distributed health data network, including: use of a central portal, a "pull" query distribution mechanism that obviated the need to allow queries to pass through firewalls, and local control that allows data holders to maintain physical control of their data and all uses while simultaneously increasing research access for authorized users. These features were paramount to data holders. The use of summary-level data for this demonstration obviated many data privacy concerns, facilitating acceptance by data holders. The technical platform we developed is fully capable of supporting the use of patient- and encounter-level utilization information, including the ability to submit a SAS program as the query text.14

This design minimizes data holder information technology responsibilities, leaves protected health information under the control of the data holder, provides for a more straightforward security implementation, and focuses network management tasks at the central portal. It supports important capabilities such as secure communications and data protection, auditable processes, a simple query interface, and locally managed query authorization.

The menu-driven interface facilitates the use of the distributed network by users with limited technical expertise. Network features streamline workflow among participating sites and enable an authorized user to quickly assess the feasibility of various comparative effectiveness studies in larger populations than might normally be readily available. This demonstration project was relatively limited in scope to allow data holders to become comfortable with the technical design and the governance needed to manage and adjudicate simple query requests. More complex and complete demonstrations are currently underway that leverage the same network infrastructure to support full comparative effectiveness studies. Enhancements will allow more sophisticated authorization, security, and permission policies, a more flexible query interface, and access to additional data and query types. In addition, the infrastructure is designed to allow distributed multivariate analyses, either through the use of iterative methods36,37 or by merging site-specific analysis files that omit PHI, for instance through the use of high-dimensionality propensity scores.38
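The iterative and propensity-score methods cited above are beyond a short sketch, but the simpler flavor of distributed analysis, merging site-specific summary files that omit PHI, can be illustrated as follows. The file layout (stratum, event count, person-years) is a hypothetical example, not the network's actual format.

```python
def pooled_rates(site_files):
    """Combine per-site (stratum, events, person_years) rows into pooled
    rates per 1000 person-years. Each site ships only aggregate counts,
    so no protected health information leaves the data holder."""
    totals = {}
    for rows in site_files:
        for stratum, events, pyears in rows:
            e, p = totals.get(stratum, (0, 0.0))
            totals[stratum] = (e + events, p + pyears)
    return {s: 1000.0 * e / p for s, (e, p) in totals.items()}

# Two hypothetical site files, already stripped of patient-level data.
site_a = [("5-9", 12, 4000.0), ("10-14", 30, 5000.0)]
site_b = [("5-9", 8, 1000.0)]
rates = pooled_rates([site_a, site_b])
```

Because the pooled rate is computed from summed numerators and denominators, the result is identical to what a central warehouse would produce for the same strata, without the data ever being pooled at the patient level.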

Finally, advances in governance are as important to expanding this network model as any of the technical capabilities. Network governance requires policies and procedures to address issues such as data holder protections, conflict of interest, external communications, priority setting, by-laws, data security, accounting, network strategy, stakeholder issues, and HIPAA and human subjects protection. In addition, a coordinating center is needed to maintain network infrastructure, documentation, coordination, monitoring of data resources and contacts, documentation of lessons learned, data validity activities, and study implementation.

CONCLUSION

In theory, either a distributed or a centralized (eg, all-payer) approach can meet the need to use the growing amount of electronic health data to address important societal questions. However, a distributed network is preferred because it can perform essentially all the functions desired of a centralized database, while avoiding many disadvantages of centralized databases. In addition, distributed networks have these advantages, compared with centralized systems: (1) They allow data holders to maintain physical control over their data; without this control, in our experience, data holders are unlikely to voluntarily participate. (2) They ensure ongoing participation of individuals who are knowledgeable about the systems and practices that underlie each data holder's data. (3) They allow data holders to assess and authorize query requests, or categories of requests, on a user-by-user or case-by-case basis. (4) Distributed systems minimize the need to disclose protected health information, thus mitigating privacy concerns, many of which are regulated by the Privacy and Security Rules of the Health Insurance Portability and Accountability Act of 1996 (HIPAA). (5) Distributed systems minimize the need to disclose and lose control of proprietary data. (6) A distributed approach eliminates the need to create, secure, maintain, and manage access to a complex central data warehouse. (7) Finally, a distributed network also avoids the need to repeatedly transfer and pool data to maintain a current database, which is a costly undertaking each time updating is necessary.

A phased approach is suggested for implementing a large-scale distributed research network infrastructure for comparative effectiveness research and other purposes. Questions that leverage the most commonly used and best understood data types, target large populations, and execute standard statistical analyses using well-developed software packages will be the best candidates for demonstration studies. The Agency for Healthcare Research and Quality's recent grant awards include several examples of research projects that might benefit from the large study populations accessible through distributed data networks, including the study of outcomes resulting from depression treatments and asthma treatments in pregnancy, the study of ACE inhibitors in African-American males, and the study of various treatments for the lumbar spine.39 Similarly, the Food and Drug Administration's planned Sentinel Initiative to monitor the safety of medical products could also use a distributed data network, either identical to one developed for comparative effectiveness or very similar to it. Relatively small investments compared with the cost of developing the underlying electronic health information will reduce individual study costs and demonstrate the value of creating reusable, distributed research networks.

REFERENCES

1. Baciu A, Stratton K, Burke SP, eds. The Future of Drug Safety: Promoting and Protecting the Health of the Public. Washington, DC: Institute of Medicine of the National Academies; 2006.

2. McClellan M. Drug safety reform at the FDA—pendulum swing or systematic improvement? N Engl J Med. 2007;356:1700–1702.

3. Platt R, Wilson M, Chan KA, et al. The new Sentinel Network—improving the evidence of medical-product safety. N Engl J Med. 2009;361:645–647.

4. Strom BL. The future of pharmacoepidemiology. In: Strom BL, ed. Pharmacoepidemiology. Chichester, United Kingdom: John Wiley & Sons; 2005.

5. Federal Coordinating Council for Comparative Effectiveness Research. Report to the President and Congress. Department of Health and Human Services; 2009. Available at: www.hhs.gov/recovery/prgrams/cer/cerannualrpt.pdf.


6. Gliklich RE, Dreyer NA, eds. Registries for Evaluating Patient Outcomes: A User's Guide. (Prepared by Outcome DEcIDE Center [Outcome Sciences, Inc, dba Outcome] under Contract No. HHSA29020050035I TO1.) AHRQ Publication No. 07-EHC001-1. Rockville, MD: Agency for Healthcare Research and Quality; April 2007.

7. Institute of Medicine. Initial National Priorities for Comparative Effectiveness Research. The National Academies Press; 2009. Available at: http://www.iom.edu/Reports/2009/ComparativeEffectivenessResearchPriorities.aspx.

8. Robert Wood Johnson Foundation. National Effort to Measure and Report on Quality and Cost-Effectiveness of Health Care Unveiled. 2007. Available at: http://www.rwjf.org/pr/product.jsp?id=22371. Accessed February 2010.

9. Engelberg Center for Health Care Reform at Brookings. Using Data to Support Better Health Care: One Infrastructure with Many Uses. 2009. Available at: http://www.brookings.edu/events/2009/1202_health_care_data.aspx. Accessed February 2010.

10. Engelberg Center for Health Care Reform at Brookings. Implementing Comparative Effectiveness Research: Priorities, Methods, and Impact. June 2009. Available at: http://www.brookings.edu/~/media/Files/events/2009/0609_health_care_cer/0609_health_care_cer.pdf.

11. Mosquera M. HHS awards $1M contract for effectiveness database. Government Health IT. 2010. Available at: http://govhealthit.com/newsitem.aspx?nid=73035. Accessed January 2010.

12. Office for Oregon Health Policy and Research. Policy Brief: All-Payer, All-Claims Data Base. 2009. Available at: http://www.oregon.gov/OHPPR/HFB/docs/2009_Legislature_Presentations/Policy_Briefs/PolicyBrief_AllPayerAllClaimsDatabase_4.30.09.pdf.

13. Miller P, Schneider CD. State and National Efforts to Establish All-Payer Claims Databases. Paper presented at: All-Payer Claims Databases: A Key to Healthcare Reform. Massachusetts Health Data Consortium Fall Workshop; 2009; Boston, MA.

14. Brown JS, Holmes J, Maro J, et al. Design specifications for network prototype and cooperative to conduct population-based studies and safety surveillance. Effective Health Care Research Report No. 13. (Prepared by the DEcIDE Centers at the HMO Research Network Center for Education and Research on Therapeutics and the University of Pennsylvania under Contract No. HHSA29020050033I T05.) Agency for Healthcare Research and Quality; July 2009. Available at: www.effectivehealthcare.ahrq.gov/reports/final.cfm.

15. Maro JC, Platt R, Holmes JH, et al. Design of a national distributed health data network. Ann Intern Med. 2009;151:341–344.

16. Rosati K. Using electronic health information for pharmacovigilance: the promise and the pitfalls. J Health Life Sci Law. 2009;2:171, 173–239.

17. Rosati KB. HIPAA privacy: the compliance challenges ahead. J Health Law. 2002;35:45–82.

18. Hornbrook MC, Hart G, Ellis JL, et al. Building a virtual cancer research organization. J Natl Cancer Inst Monogr. 2005:12–25.

19. Lazarus R, Yih K, Platt R. Distributed data processing for public health surveillance. BMC Public Health. 2006;6:235.

20. McMurry AJ, Gilbert CA, Reis BY, et al. A self-scaling, distributed information architecture for public health, research, and clinical care. J Am Med Inform Assoc. 2007;14:527–533.

21. Moore KM, Duddy A, Braun MM, et al. Potential population-based electronic data sources for rapid pandemic influenza vaccine adverse event detection: a survey of health plans. Pharmacoepidemiol Drug Saf. 2008;17:1137–1141.

22. Platt R, Davis R, Finkelstein J, et al. Multicenter epidemiologic and health services research on therapeutics in the HMO Research Network Center for Education and Research on Therapeutics. Pharmacoepidemiol Drug Saf. 2001;10:373–377.

23. Wagner EH, Greene SM, Hart G, et al. Building a research consortium of large health systems: the Cancer Research Network. J Natl Cancer Inst Monogr. 2005:3–11.

24. Chen RT, Glasser JW, Rhodes PH, et al. Vaccine Safety Datalink project: a new tool for improving vaccine safety monitoring in the United States. The Vaccine Safety Datalink Team. Pediatrics. 1997;99:765–773.

25. Vaccine Safety Datalink. Available at: http://www.cdc.gov/vaccinesafety/vsd/. Accessed January 2010.

26. Chan K; HMO Research Network. The HMO Research Network. In: Strom BL, ed. Pharmacoepidemiology. Chichester, United Kingdom: John Wiley & Sons; 2005.

27. Go AS, Magid DJ, Wells B, et al. The Cardiovascular Research Network: a new paradigm for cardiovascular quality and outcomes research. Circ Cardiovasc Qual Outcomes. 2008;1:138–147.

28. Magid DJ, Gurwitz JH, Rumsfeld JS, et al. Creating a research data network for cardiovascular disease: the CVRN. Expert Rev Cardiovasc Ther. 2008;6:1043–1045.

29. Davis RL, Kolczak M, Lewis E, et al. Active surveillance of vaccine safety: a system to detect early signs of adverse events. Epidemiology. 2005;16:336–341.

30. Federal Immunization Safety Task Force. Federal Plans to Monitor Immunization Safety for the Pandemic 2009 H1N1 Influenza Vaccination Program. 2009. Available at: http://www.flu.gov/professional/federal/fed-plan-to-mon-h1n1-imm-safety.pdf. Accessed January 31, 2010.

31. Velentgas P, Bohn R, Brown JS, et al. A distributed research network model for post-marketing safety studies: the Meningococcal Vaccine Study. Pharmacoepidemiol Drug Saf. 2008;17:1226–1234.

32. Greene SM, Geiger AM. A review finds that multicenter studies face substantial challenges but strategies exist to achieve Institutional Review Board approval. J Clin Epidemiol. 2006;59:784–790.

33. Greene SM, Geiger AM, Harris EL, et al. Impact of IRB requirements on a multicenter survey of prophylactic mastectomy outcomes. Ann Epidemiol. 2006;16:275–278.

34. Brown JS, Moore KM, Braun MM, et al. Active influenza vaccine safety surveillance: potential within a healthcare claims environment. Med Care. 2009;47:1251–1257.

35. National Center for Public Health Informatics Public Health Research Grid. Public Health Grid Research and Development: DRN Demonstration. 2009. Available at: http://phgrid.blogspot.com/search/label/DRN and http://sites.google.com/site/phgrid/. Accessed February 2010.

36. Karr A, Lin X, Sanil AP, et al. Secure regression on distributed databases. J Comput Graph Stat. 2005;14:263–279.

37. Karr AF, Lin X, Reiter JP, et al. Secure regression on distributed databases. J Comput Graph Stat. 2005;14:1–18.

38. Rassen JA, Avorn J, Schneeweiss S. Multivariate-adjusted pharmacoepidemiologic analyses of confidential information pooled from multiple health care utilization databases. Pharmacoepidemiol Drug Saf. In press.

39. Agency for Healthcare Research and Quality. Comparative Effectiveness Research Grant Awards. 2010. Available at: http://effectivehealthcare.ahrq.gov/index.cfm/comparative-effectiveness-research-grant-awards/. Accessed January 2010.


Editor’s Perspective

Open Science and Data Sharing in Clinical Research: Basing Informed Decisions on the Totality of the Evidence

Harlan M. Krumholz, MD, SM

A patient tentatively enters the examination room. A decision looms, and the patient waits with some trepidation. A doctor soon appears and pulls up a chair, prepared for the discussion by having conscientiously reviewed the medical literature, examined the evidence-based systematic reviews, and highlighted the relevant information. The ensuing conversation reflects on the patient's treatment goals and the published information about the benefits and harms of the available options. The tension breaks with a decision, and plans for the next steps are put in place.

Every day, patients and their caregivers are faced with difficult decisions about treatment. They turn to physicians and other healthcare professionals to interpret the medical evidence and assist them in making individualized decisions.

Unfortunately, we are learning that what is published in the medical literature represents only a portion of the evidence that is relevant to the risks and benefits of available treatments. In a profession that seeks to rely on evidence, it is ironic that we tolerate a system that enables evidence to be outside of public view. Those who own data, usually scientists or industry, have the choice of what, where, and when to publish. As a result, our medical literature portrays only a partial picture of the evidence about clinical strategies, including drugs and devices. Experts have recently drawn attention to this issue, including contributions in this issue of our journal, but there is resistance to change.1–5

Studies document that a remarkable percentage of trials are not published within a reasonable period after they are completed. Ross and colleagues reported that fewer than half of trials were published within 2 years of completion.4 Although industry-sponsored studies tended to have the lowest publication rates, the problem spans all study sponsors. A subsequent study found that only 46% of clinical trials funded by the National Institutes of Health were published within 30 months of completion.6 Lee and colleagues found that fewer than half of the new drug trials submitted to the US Food and Drug Administration were published within 5 years of drug approval.7 Although pivotal trials were more likely to be published, 24% remained unpublished at 5 years. Negative trials were less likely to be published, yet 34% of the positive trials were also unpublished at 5 years.

We tout systematic reviews, the linchpin of evidence-based medicine, as a method to distill and interpret the medical literature. The Institute of Medicine recently produced standards for these reviews and noted that, "Healthcare decision makers, including clinicians and other healthcare providers, increasingly turn to systematic reviews for reliable, evidence-based comparisons of health interventions."8 Others have published checklists to promote high-quality methods9; however, the elegance of the methods cannot overcome the undermining effect of missing studies.

Missing studies are unlikely to be missing at random, and the effect of the missing data on inferences about an intervention is not easy to predict. Hart and colleagues investigated the effect of unpublished data on the results of meta-analyses of drug trials.10 They showed that the unpublished data influenced the summary estimates such that the drug efficacy was lower in 46% of the meta-analyses and greater in 46%. The result was essentially unchanged in only 7% of the studies. Readers of meta-analyses, without the benefit of these types of analyses, are left to wonder how many missing trials might be relevant to the clinical question and what their effect might be.
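The way an unpublished trial can move a summary estimate is easy to see in a toy fixed-effect meta-analysis. The effect sizes below are invented for illustration and are not drawn from Hart and colleagues' data.

```python
import math

def pooled_log_or(trials):
    """Inverse-variance weighted pooled log odds ratio for a list of
    (log_odds_ratio, standard_error) pairs (fixed-effect model)."""
    weights = [1.0 / se ** 2 for _, se in trials]
    num = sum(w * est for (est, _), w in zip(trials, weights))
    return num / sum(weights)

published = [(-0.40, 0.20), (-0.30, 0.25)]  # two favorable published trials
unpublished = (0.10, 0.22)                  # a null-to-unfavorable unpublished trial

with_published_only = math.exp(pooled_log_or(published))
with_all = math.exp(pooled_log_or(published + [unpublished]))
# Adding the unpublished trial moves the pooled odds ratio toward the null.
```

With these invented numbers the pooled odds ratio rises from roughly 0.70 to roughly 0.81 once the unpublished trial is included, which is the mechanism by which missing studies distort apparent efficacy.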

In some cases, the missing clinical research data may hold important information about risk. A classic example occurred with Vioxx (rofecoxib). Merck had data, mostly unpublished and obtained several years before the drug was withdrawn from the market,11 demonstrating that Vioxx likely increased the risk of an acute myocardial infarction. No systematic analysis could have detected this harm because most of the data were beyond public view. Similarly, GlaxoSmithKline had data, much of it unpublished, which indicated that Avandia (rosiglitazone) increased the risk of an acute myocardial infarction. Nissen and colleagues, accessing the data made available through litigation, revealed the concern that ultimately led to US Food and Drug Administration restrictions on the use of the drug.12,13

Selective publication similarly skews the evidence base. For example, Psaty and Kronmal, using information obtained during litigation, reported that Merck and its consultants employed selective reporting in representing mortality results in trials of Vioxx in patients with Alzheimer disease or cognitive impairment.14 The published studies failed to report that an intention-to-treat analysis showed Vioxx to be associated with a significant increase in all-cause mortality. Turner and colleagues showed that trials of antidepressant drugs highlighted positive results as if they were the primary outcomes, contrary to the original study protocols.15 They concluded that, "By altering the apparent risk-benefit ratio of drugs, selective publication can lead doctors to make inappropriate prescribing decisions that may not be in the best interest of their patients and, thus, the public health."

The opinions expressed in this article are not necessarily those of the American Heart Association.

From the Section of Cardiovascular Medicine and the Robert Wood Johnson Clinical Scholars Program, Department of Medicine, Yale University School of Medicine; Section of Health Policy and Administration, Yale School of Public Health; Center for Outcomes Research and Evaluation, Yale-New Haven Hospital, New Haven, Conn.

Correspondence to Harlan Krumholz, 1 Church St, Suite 200, New Haven, CT 06510. E-mail [email protected]

(Circ Cardiovasc Qual Outcomes. 2012;5:141-142.)
© 2012 American Heart Association, Inc.

Circ Cardiovasc Qual Outcomes is available at http://circoutcomes.ahajournals.org

DOI: 10.1161/CIRCOUTCOMES.112.965848

The myriad reasons why trials are selectively or never published include professional bias, profit, journal bias against negative studies, loss of interest, lack of funding, and competing interests. For example, when results do not confirm the beliefs of investigators, the motivation for their dissemination may weaken, as may the willingness of the investigator to devote time and resources to the publication. Companies may also lose interest when the findings are counter to embedded beliefs about a product. The end result is a scientific culture in which many experiments performed on patients are absent from the medical literature.

Another issue relevant to selective and nonpublication is that access to a research database is commonly restricted to the principal investigator or the funders. For many trials, there is no opportunity for independent replication of the analyses or evaluation of the raw data. The US Food and Drug Administration has access to these data, but other experts do not.

The sharing of trial data, as of yet uncommon except through mechanisms by some funders such as the National Heart, Lung, and Blood Institute,16 could provide an opportunity to leverage the strength of the global community of investigators. Many trials yield only a fraction of the knowledge that could be produced with more resources and creativity.

Now is the time to bring data sharing and open science into the mainstream of clinical research, particularly with respect to trials that contain information about the risks and benefits of treatments in current use. This could be accomplished through the following steps:

1. Post, in the public domain, the study protocol for each published trial. The protocol should be comprehensive and include policies and procedures relevant to actions taken in the trial.

2. Develop mechanisms for those who own trial data to share their raw data and individual patient data.

3. Encourage industry to commit to place all its clinical research data relevant to approved products in the public domain. This action would acknowledge that the privilege of selling products is accompanied by a responsibility to share all the clinical research data relevant to the products' benefits and harms.

4. Develop a culture within academics that values data sharing and open science. After a period in which the original investigators can complete their funded studies, the data should be de-identified and made available for investigators globally.

5. Identify, within all systematic reviews, trials that are not published, using sources such as clinicaltrials.gov and regulatory postings to determine what is missing.

6. Share data.

The path is not easy. We lack an infrastructure and funding, and incentives in the system run counter to sharing. Those who share may be giving advantage to "competitors." The imperative, however, derives from what is best for society and our obligation to respect the contributions made by the subjects who agreed to participate in the research studies.

Patients facing a decision deserve information that is based on all of the evidence. A new era of data sharing and open science would allow us to leverage existing investments to provide more and better evidence that will increase the possibility that patients' decisions will help them obtain the results they desire.

Sources of Funding

Dr Krumholz is supported by grant U01 HL105270-02 (Center for Cardiovascular Outcomes Research at Yale University) from the National Heart, Lung, and Blood Institute.

Disclosures

Dr Krumholz discloses that he is the recipient of a research grant from Medtronic, Inc, through Yale University and is chair of a cardiac scientific advisory board for UnitedHealth.

References

1. Gøtzsche P. Strengthening and opening up health research by sharing our raw data. Circ Cardiovasc Qual Outcomes. 2012;5:236–237.

2. Lehman R, Loder E. Missing clinical trial data. BMJ. 2012;344:d8158.

3. Ross JS, Lehman R, Gross CP. The importance of clinical trial data sharing: toward more open science. Circ Cardiovasc Qual Outcomes. 2012;5:238–240.

4. Ross JS, Mulvey GK, Hines EM, Nissen SE, Krumholz HM. Trial publication after registration in ClinicalTrials.Gov: a cross-sectional analysis. PLoS Med. 2009;6:e1000144.

5. Spertus JA. The double-edged sword of open access to research data. Circ Cardiovasc Qual Outcomes. 2012;5:143–144.

6. Ross JS, Tse T, Zarin DA, Xu H, Zhou L, Krumholz HM. Publication of NIH funded trials registered in ClinicalTrials.gov: cross sectional analysis. BMJ. 2012;344:d7292.

7. Lee K, Bacchetti P, Sim I. Publication of clinical trials supporting successful new drug applications: a literature analysis. PLoS Med. 2008;5:e191.

8. Committee on Standards for Systematic Reviews of Comparative Effectiveness Research, Institute of Medicine. Finding What Works in Health Care: Standards for Systematic Reviews. The National Academies Press; 2011. http://www.iom.edu/Reports/2011/Finding-What-Works-in-Health-Care-Standards-for-Systematic-Reviews.aspx. Accessed February 21, 2012.

9. Stroup DF, Berlin JA, Morton SC, Olkin I, Williamson GD, Rennie D, Moher D, Becker BJ, Sipe TA, Thacker SB. Meta-analysis of observational studies in epidemiology: a proposal for reporting. Meta-analysis Of Observational Studies in Epidemiology (MOOSE) group. JAMA. 2000;283:2008–2012.

10. Hart B, Lundh A, Bero L. Effect of reporting bias on meta-analyses of drug trials: reanalysis of meta-analyses. BMJ. 2012;344:d7202.

11. Ross JS, Madigan D, Hill KP, Egilman DS, Wang Y, Krumholz HM. Pooled analysis of rofecoxib placebo-controlled clinical trial data: lessons for postmarket pharmaceutical safety surveillance. Arch Intern Med. 2009;169:1976–1985.

12. Nissen SE, Wolski K. Effect of rosiglitazone on the risk of myocardial infarction and death from cardiovascular causes. N Engl J Med. 2007;356:2457–2471.

13. United States Food and Drug Administration. FDA drug safety communication: updated risk evaluation and mitigation strategy (REMS) to restrict access to rosiglitazone-containing medicines including Avandia, Avandamet, and Avandaryl. http://www.fda.gov/Drugs/DrugSafety/ucm255005.htm. May 18, 2011. Accessed February 21, 2012.

14. Psaty BM, Kronmal RA. Reporting mortality findings in trials of rofecoxib for Alzheimer disease or cognitive impairment: a case study based on documents from rofecoxib litigation. JAMA. 2008;299:1813–1817.

15. Turner EH, Matthews AM, Linardatos E, Tell RA, Rosenthal R. Selective publication of antidepressant trials and its influence on apparent efficacy. N Engl J Med. 2008;358:252–260.

16. National Heart, Lung, and Blood Institute. NHLBI policy for data sharing from clinical trials and epidemiological studies. http://www.nhlbi.nih.gov/funding/datasharing.htm. Accessed February 21, 2012.

KEY WORDS: evidence-based medicine • data sharing • medical decision-making • open access


ORIGINAL CONTRIBUTION

Characteristics of Clinical Trials Registered in ClinicalTrials.gov, 2007-2010

Robert M. Califf, MD; Deborah A. Zarin, MD; Judith M. Kramer, MD, MS; Rachel E. Sherman, MD, MPH; Laura H. Aberle, BSPH; Asba Tasneem, PhD

Clinical trials are the central means by which preventive, diagnostic, and therapeutic strategies are evaluated,1 but the US clinical trials enterprise has been marked by debate regarding funding priorities for clinical research, the design and interpretation of studies, and protections for research participants.2-4 Until recently, however, we have lacked tools for comprehensively assessing trials across the broader US clinical trial enterprise.

In 1997, Congress mandated the creation of the ClinicalTrials.gov registry to assist people with serious illnesses in gaining access to trials.5 In September 2004, the International Committee of Medical Journal Editors (ICMJE) announced a policy, which took effect in 2005, of requiring registration of clinical trials as a prerequisite for publication.6,7 The Food and Drug Administration Amendments Act (FDAAA)8 expanded the mandate of ClinicalTrials.gov to include most non–phase 1 interventional drug and device trials, with interventional trials defined as "studies in human beings in which individuals are assigned by an investigator based on a protocol to receive specific interventions"9 (eTable 1, available at http://www.jama.com). The law obliges sponsors or their designees to register trials and record key

data elements (effective September 27, 2007), report basic results (September 27, 2008), and report adverse events (September 27, 2009).10

Recent work11,12 highlights the inadequate evidence base of current practice, in which less than 15% of major guideline recommendations are based on high-quality evidence, often defined as evidence that emanates from trials with appropriate designs; sufficiently large sample sizes; and

appropriate, validated outcome measures,13,14 as well as oversight by institutional review boards and data moni-

For editorial comment see p 1861.

Author Affiliations: Duke Translational Medicine Institute, Durham, North Carolina (Drs Califf and Kramer); National Library of Medicine, National Institutes of Health, Bethesda, Maryland (Dr Zarin); Office of Medical Policy, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, Maryland (Dr Sherman); and Duke Clinical Research Institute, Durham (Drs Kramer and Tasneem and Ms Aberle). Corresponding Author: Robert M. Califf, MD, Duke Translational Medicine Institute, 200 Trent Dr, 1117 Davison Bldg, Durham, NC 27710 ([email protected]).

Context Recent reports highlight gaps between guidelines-based treatment recommendations and evidence from clinical trials that supports those recommendations. Strengthened reporting requirements for studies registered with ClinicalTrials.gov enable a comprehensive evaluation of the national trials portfolio.

Objective To examine fundamental characteristics of interventional clinical trials registered in the ClinicalTrials.gov database.

Methods A data set comprising 96 346 clinical studies from ClinicalTrials.gov was downloaded on September 27, 2010, and entered into a relational database to analyze aggregate data. Interventional trials were identified and analyses were focused on 3 clinical specialties—cardiovascular, mental health, and oncology—that together encompass the largest number of disability-adjusted life-years lost in the United States.

Main Outcome Measures Characteristics of registered clinical trials as reported data elements in the trial registry; how those characteristics have changed over time; differences in characteristics as a function of clinical specialty; and factors associated with use of randomization, blinding, and data monitoring committees (DMCs).

Results The number of registered interventional clinical trials increased from 28 881 (October 2004–September 2007) to 40 970 (October 2007–September 2010), and the number of missing data elements has generally declined. Most interventional trials registered between 2007 and 2010 were small, with 62% enrolling 100 or fewer participants. Many clinical trials were single-center (66%; 24 788/37 520) and funded by organizations other than industry or the National Institutes of Health (NIH) (47%; 17 592/37 520). Heterogeneity in the reported methods by clinical specialty; sponsor type; and the reported use of DMCs, randomization, and blinding was evident. For example, reported use of DMCs was less common in industry-sponsored vs NIH-sponsored trials (adjusted odds ratio [OR], 0.11; 95% CI, 0.09-0.14), earlier-phase vs phase 3 trials (adjusted OR, 0.83; 95% CI, 0.76-0.91), and mental health trials vs those in the other 2 specialties. In similar comparisons, randomization and blinding were less frequently reported in earlier-phase, oncology, and device trials.

Conclusion Clinical trials registered in ClinicalTrials.gov are dominated by small trials and contain significant heterogeneity in methodological approaches, including reported use of randomization, blinding, and DMCs.

JAMA. 2012;307(17):1838-1847 www.jama.com

1838 JAMA, May 2, 2012—Vol 307, No. 17 ©2012 American Medical Association. All rights reserved.


Page 66: Sharing Clinical Research Data:  an IOM Workshop

toring committees (DMCs) to protect participants and ensure the trial's integrity.14

In this article, we examine fundamental characteristics of interventional clinical trials in 3 major therapeutic areas contained in the registry (cardiovascular, mental health, and oncology), focusing on study characteristics (data elements reported in trial registration) that are desirable for generating reliable evidence from clinical trials, including factors associated with use of DMCs, randomization, and blinding.

METHODS

The methods used by ClinicalTrials.gov to register clinical studies have been described previously.15-17 Briefly, sponsors and investigators from around the world enter data through a web-based data entry system. The country address of each facility (ie, a site that can potentially enroll participants) was used to group sites into regions according to rubrics used by ClinicalTrials.gov.18 (Individual countries included in each region are available.) The sample we examined includes studies registered to comply with legal obligations, as well as those registered voluntarily to meet ICMJE requirements or for other reasons. Similarly, data for registered studies include both mandatory and optional elements. Over time, the types, definitions, and optional vs mandatory status of data elements have changed. Mandatory and optional data elements for registration as of August 2011 are shown in eAppendix 1.

ClinicalTrials.gov Data Set

We downloaded an XML data set comprising all 96 346 clinical studies registered with ClinicalTrials.gov as of September 27, 2010—1 decade after the registry's launch and 3 years after enactment of the FDAAA. We loaded the data set into a relational database (Oracle RDBMS version 11.1g, Oracle Corporation) to facilitate aggregate analysis. This resource, the Database for Aggregate Analysis of ClinicalTrials.gov (AACT), as well as data definitions and comprehensive data dictionaries, is available at the Clinical Trials Transformation Initiative website.19
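The load step described above can be sketched in miniature. This is an illustrative stand-in, not the authors' pipeline: the paper used an Oracle database, SQLite substitutes here, and the XML tag names (clinical_study, nct_id, study_type, enrollment) are assumptions modeled loosely on the public registry format.

```python
# Hedged sketch: parse registry-style XML study records into a relational
# table so aggregate queries can be run over them (SQLite stands in for the
# Oracle database used in the actual AACT resource).
import sqlite3
import xml.etree.ElementTree as ET

def load_studies(xml_text, conn):
    """Parse study records from XML and insert them into a 'studies' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS studies "
        "(nct_id TEXT PRIMARY KEY, study_type TEXT, enrollment INTEGER)"
    )
    root = ET.fromstring(xml_text)
    for study in root.iter("clinical_study"):
        nct_id = study.findtext("nct_id")
        study_type = study.findtext("study_type")
        enrollment = study.findtext("enrollment")
        conn.execute(
            "INSERT OR REPLACE INTO studies VALUES (?, ?, ?)",
            (nct_id, study_type, int(enrollment) if enrollment else None),
        )
    conn.commit()
```

Once loaded, restrictions such as "interventional studies only" reduce to simple WHERE clauses over the table.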

Our analysis was restricted to interventional studies registered with ClinicalTrials.gov between October 2007 and September 2010. To identify interventional studies, we used the "study type" field from the ClinicalTrials.gov registry, which included the following choices: interventional, observational, expanded access, and not available (NA) (eAppendix 1). Interventional trials were defined as "studies in human beings in which individuals are assigned by an investigator based on a protocol to receive specific interventions." In this study, the terms clinical trial, interventional trial, and interventional study are synonymous. Interventional studies were regrouped within the downloaded, derivative database according to the 3 clinical specialties—cardiovascular, oncology, and mental health—that together encompass the largest number of disability-adjusted life-years lost in the United States.20 For this regrouping, we used submitted disease condition terms and Medical Subject Heading (MeSH) terms generated by a National Library of Medicine (NLM) algorithm to develop a methodology to annotate, validate, adjudicate, and implement disease condition terms (MeSH and non-MeSH) to create specialty data sets.

A subset of the 2010 MeSH thesaurus from the NLM21 and a list of non-MeSH disease condition terms provided by data submitters that appeared in 5 or more interventional studies in the analysis data set were reviewed and annotated by clinical specialists at Duke University Medical Center (eAppendix 2). Terms were annotated according to their relevance to a given specialty (Y=relevant, N=not relevant). Specialty data sets were created and the results of algorithmic classifications were validated by comparison with classifications based on manual review. Clinical trials were classed according to date registered and by interventional status. Details regarding the creation of these specialty data sets are provided in an article describing the study methodology.22
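The annotation-driven grouping described above amounts to a term lookup: each study's condition terms are checked against a curated term-to-specialty map. The sketch below is a minimal illustration under that reading; the example terms and the dictionary itself are hypothetical, not the Duke-curated lists.

```python
# Hedged sketch of term-based specialty grouping: an annotated lookup
# (condition term -> relevant specialties) applied to a study's terms.
# The mapping here is a tiny invented example, not the study's annotation set.
SPECIALTY_TERMS = {
    "myocardial infarction": {"cardiovascular"},
    "breast neoplasms": {"oncology"},
    "depressive disorder": {"mental health"},
}

def classify_study(condition_terms):
    """Return the set of specialty data sets a study would be assigned to."""
    specialties = set()
    for term in condition_terms:
        # Union the specialties flagged as relevant (Y) for each term.
        specialties |= SPECIALTY_TERMS.get(term.lower(), set())
    return specialties
```

A study listing both a cardiovascular and a mental health condition would land in both specialty data sets, mirroring the non-exclusive grouping in the paper.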

Within these specialty data sets, a few data elements are missing because of limitations in the data set or logistical problems in obtaining analyzable information. Specifically, the data element "human subject review" is not present in the public download, and data regarding primary outcomes and oversight authority are not readily analyzable because of the presence of free-text values.

Analytical Methods

Clinical trial characteristics were first assessed overall, by interventional trials, and by 2 temporal subsets: October 2004 through September 2007 and October 2007 through September 2010. The percentage of trials registered before and after enrollment of the first participant was also determined by comparing the date of registration to the date that the first participant was enrolled. Other assessments included clinical trial characteristics, enrollment characteristics, funding source, and number of study sites for all clinical trials vs cardiovascular, mental health, and oncology trials for the latter time period (October 2007–September 2010). Funding sources included industry, NIH, other US federal (excluding NIH), and other. Frequencies and percentages are provided for categorical characteristics; medians and interquartile ranges (IQRs) are provided for continuous characteristics.

Logistic regression analysis was performed to calculate adjusted odds ratios (ORs) with Wald 95% confidence intervals for factors associated with trials that report use of DMCs, randomization, and blinding. A full model containing 9 prespecified characteristics was developed. The first of these was lead sponsor, which the NLM defines as the primary organization that oversees study implementation and is responsible for conducting data analysis.19 Collaborators are defined as other organizations (if any) that provide additional support, including funding, design, implementation, data analysis, and reporting. The sponsor (or designee) is responsible for confirming all

CLINICAL TRIALS REGISTERED IN CLINICALTRIALS.GOV


Page 67: Sharing Clinical Research Data:  an IOM Workshop

collaborators before listing them. ClinicalTrials.gov stores funding organization information in 2 data elements: lead sponsor and collaborator. The NLM classifies submitted agency names in these data elements as industry, NIH, US federal (excluding NIH), or other. We derived probable funding source from the lead sponsor and collaborator fields using the following algorithm: if the lead sponsor was from industry, or the NIH was neither a lead sponsor nor collaborator and at least 1 collaborator was from industry, then the study was categorized as industry funded. If the lead sponsor was not from industry, and NIH was either a lead sponsor or a collaborator, then the study was categorized as NIH funded. Otherwise, if the lead sponsor and collaborator fields were nonmissing, then the study was considered to be funded by other.
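The funding-source derivation above translates directly into code. The class labels follow the NLM agency classes named in the text; the function signature and the treatment of a missing lead sponsor are our own illustrative choices.

```python
# Hedged sketch of the funding-source algorithm described in the text.
# Inputs are NLM agency classes: "industry", "NIH", "US federal", or "other";
# lead_sponsor may be None when the field is missing.
def derive_funding_source(lead_sponsor, collaborators):
    """Classify a study as industry, NIH, or other funded."""
    nih_involved = lead_sponsor == "NIH" or "NIH" in collaborators
    # Industry funded: industry lead sponsor, or no NIH involvement and
    # at least 1 industry collaborator.
    if lead_sponsor == "industry" or (
        not nih_involved and "industry" in collaborators
    ):
        return "industry"
    # NIH funded: lead sponsor not industry and NIH is lead or collaborator.
    if nih_involved:
        return "NIH"
    # Otherwise "other", provided the fields are nonmissing.
    if lead_sponsor is not None:
        return "other"
    return None  # funding source not derivable
```

Note the ordering matters: an industry lead sponsor with an NIH collaborator is classified as industry funded, matching the rule as stated.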

Also included in the model were phase (0, 1, 1/2, 2, 2/3, 3, 4, NA); number of participants; trial specialty—cardiovascular, oncology, or mental health (yes/no); trial start year; intervention type (procedure/device, drug or biological, behavioral, dietary supplement, other); and primary purpose (treatment, prevention, diagnostic, other). For the purposes of this modeling, studies reporting multiple intervention types were categorized in the following hierarchy: 1, procedure/device; 2, drug/biological; 3, behavioral; 4, dietary supplement; 5, other. Studies missing a response to any of the data elements used in the model were excluded. The model predicting trials with DMC was also run in 2 additional ways: (1) assuming that those trials missing a response to the question regarding DMC did in fact have a DMC, and (2) assuming that those trials missing a response to the question regarding DMC did not in fact have a DMC.
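The paper's ORs are adjusted estimates from the multivariable logistic model above; as a simpler, self-contained illustration of how a Wald 95% CI for an odds ratio is constructed, here is the unadjusted version computed from a 2x2 table. This is a didactic sketch, not the study's analysis (which was run in SAS).

```python
# Hedged sketch: unadjusted odds ratio with a Wald 95% CI from a 2x2 table.
# The CI is built on the log-OR scale, whose standard error is
# sqrt(1/a + 1/b + 1/c + 1/d), then exponentiated back.
import math

def odds_ratio_wald_ci(a, b, c, d):
    """OR and Wald 95% CI for a 2x2 table.

    a = exposed with outcome, b = exposed without outcome,
    c = unexposed with outcome, d = unexposed without outcome.
    """
    or_ = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - 1.96 * se_log_or)
    hi = math.exp(math.log(or_) + 1.96 * se_log_or)
    return or_, lo, hi
```

An adjusted OR from a fitted logistic model is obtained the same way, by exponentiating a coefficient and its Wald interval endpoints.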

When possible for all analyses, values of missing methodological trial characteristics were inferred based on other available data. For example, for studies reporting an interventional model of single group and number of groups as 1, the value of allocation was designated as nonrandomized and the value of blinding was designated as open.
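The single-group inference rule stated above can be expressed as a small record-cleaning step. The dictionary keys mirror the registry data elements named in the text, but the exact field names and values are assumptions for illustration.

```python
# Hedged sketch of the stated imputation rule: a single-group study with
# 1 group gets allocation "nonrandomized" and blinding "open" when those
# fields are missing. Field names are illustrative, not the registry schema.
def infer_missing_design(record):
    """Fill in allocation/blinding for single-group studies, in place."""
    if (
        record.get("interventional_model") == "single group"
        and record.get("number_of_groups") == 1
    ):
        # setdefault only fills values that are actually missing,
        # so explicitly reported values are never overwritten.
        record.setdefault("allocation", "nonrandomized")
        record.setdefault("blinding", "open")
    return record
```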

SAS version 9 (SAS Institute) was used for all statistical analyses.

RESULTS

Basic characteristics of all studies registered with ClinicalTrials.gov as of September 27, 2010 (N=96 346), all interventional trials registered during the same period (n=79 413), and 2 recent subsets of interventional trials (October 2004–September 2007 and October 2007–September 2010) are shown in TABLE 1. The number of trials submitted for registration increased from 28 881 to 40 970 during the 2 periods. A decline in the numbers of missing data elements occurred for some characteristics. The rate of registered trials not reporting use of DMCs decreased from 57.9% to 18% between the 2 time periods; not reporting either enrollment number or type (anticipated or actual) decreased from 33.8% to 1.8%; not reporting randomization decreased from 5.6% to 4.2%; and not reporting blinding decreased from 3.5% to 2.7%. The rate of missing data for primary purpose increased from 4.6% to 6.8% during these periods. The proportion of trials reporting an NIH lead sponsor decreased from 6.3% to 2.7% during the 2 periods, and the proportion of trials with at least 1 North American research site decreased from 61.9% to 57.5%. Other characteristics have not changed substantially.

The proportion of trials registered before beginning participant enrollment increased over the 2 time periods: from 33% (9041/27 667) in October 2004–September 2007 to 48% (19 347/40 333) in October 2007–September 2010.

The majority of clinical trials were small in terms of numbers of participants. Overall, 96% of these trials had an anticipated enrollment of 1000 or fewer participants and 62% had 100 or fewer participants (eFigure). The median number of participants per trial was 58 (IQR, 27-161) for completed trials and 70 (IQR, 35-190) for trials that have been registered but not completed.

TABLE 2 shows selected characteristics of all interventional trials registered from October 2007 through September 2010 (n=40 970), as well as characteristics for oncology, cardiovascular, and mental health trials compared with all other trials. Of these 3 categories, oncology trials were most numerous (n=8992, 21.9%) and comprised the largest proportion of trials listed as currently recruiting: 31.5% vs 9.3% and 10% for cardiovascular and mental health trials, respectively. Oncology trials also constituted the largest proportion of trials that were active but not yet recruiting (25.8% vs 7.4% for cardiovascular and 7.5% for mental health) and that were oriented toward treatment (25.7% vs 8% for cardiovascular and 9.6% for mental health). Among trials oriented toward prevention, cardiovascular trials comprised the largest group: 10.4% vs 8.1% for oncology and 5.9% for mental health. Cardiovascular trials also accounted for the largest proportion of trials assessing medical devices: 20.2% vs 7.0% for oncology and 3.8% for mental health. As expected, among trials incorporating behavioral interventions, mental health trials were most common: 33.4% vs 8.1% for oncology and 7.2% for cardiovascular.

Enrollment and design characteristics for all interventional trials registered from October 2007 through September 2010 are displayed in TABLE 3. There was heterogeneity in median anticipated trial size according to specialty. Cardiovascular trials (median anticipated enrollment, 100; IQR, 42-280) tended to be nearly twice as large as oncology trials (median, 54; IQR, 30-120), with mental health trials (median, 85; IQR, 40-200) residing between these 2. Cardiovascular and mental health trials were more oriented toward later-phase research (ie, phases 3 and 4) while oncology trials displayed a higher relative proportion of earlier-phase trials (ie, phases 0 through 2). Trials restricted to women


Page 68: Sharing Clinical Research Data:  an IOM Workshop

Table 1. Characteristics for All Studies Registered in ClinicalTrials.gov, All Interventional Studies, and Interventional Trials From October 2004 Through September 2007 and From October 2007 Through September 2010

Values are No./Total No. (%). Columns: All Studies (N=96 346) | All Interventional Trials, 2000-2010 (n=79 413)(a) | Interventional, Oct 2004–Sept 2007 (n=28 881) | Interventional, Oct 2007–Sept 2010 (n=40 970)

Primary purpose
Treatment: 59 200/75 778 (78.1) | 59 200/75 198 (78.7) | 22 242/27 565 (80.7) | 28 605/38 199 (74.9)
Prevention: 8092/75 778 (10.7) | 8092/75 198 (10.8) | 3313/27 565 (12.0) | 4152/38 199 (10.9)
Diagnostic: 2655/75 778 (3.5) | 2655/75 198 (3.5) | 1013/27 565 (3.7) | 1489/38 199 (3.9)
Other(b): 5831/75 778 (7.7) | 5251/75 198 (7.0) | 997/27 565 (3.6) | 3953/38 199 (10.3)
Missing: 20 568/96 346 (21.3) | 4215/79 413 (5.3) | 1316/28 881 (4.6) | 2771/40 970 (6.8)

Intervention type(c)
Drug: 53 441/84 614 (63.2) | 52 162/79 410 (65.7) | 19 851/28 881 (68.7) | 24 751/40 970 (60.4)
Procedural: 10 911/84 614 (12.9) | 9635/79 410 (12.1) | 3807/28 881 (13.2) | 4104/40 970 (10.0)
Biological: 6841/84 614 (8.1) | 6657/79 410 (8.4) | 2049/28 881 (7.1) | 2948/40 970 (7.2)
Behavioral: 7134/84 614 (8.4) | 6582/79 410 (8.3) | 2781/28 881 (9.6) | 3307/40 970 (8.1)
Device: 6662/84 614 (7.9) | 6012/79 410 (7.6) | 2092/28 881 (7.2) | 3799/40 970 (9.3)
Radiation: 2361/84 614 (2.8) | 2292/79 410 (2.9) | 494/28 881 (1.7) | 928/40 970 (2.3)
Dietary supplement: 2067/84 614 (2.4) | 2036/79 410 (2.6) | 330/28 881 (1.1) | 1603/40 970 (3.9)
Genetic: 1096/84 614 (1.3) | 712/79 410 (0.9) | 261/28 881 (0.9) | 381/40 970 (0.9)
Other: 8211/84 614 (9.7) | 6625/79 410 (8.3) | 1339/28 881 (4.6) | 5110/40 970 (12.5)

Region(c)
Africa: 2091/86 404 (2.4) | 1904/72 149 (2.6) | 892/25 924 (3.4) | 817/37 520 (2.2)
Asia and Pacific: 11 659/86 404 (13.5) | 9953/72 149 (13.8) | 3725/25 924 (14.4) | 5768/37 520 (15.4)
Central and South America: 3953/86 404 (4.6) | 3565/72 149 (4.9) | 1240/25 924 (4.8) | 1793/37 520 (4.8)
Europe: 23 853/86 404 (27.6) | 20 337/72 149 (28.2) | 8006/25 924 (30.9) | 11 311/37 520 (30.1)
Middle East: 3587/86 404 (4.2) | 2852/72 149 (4.0) | 1152/25 924 (4.4) | 1545/37 520 (4.1)
North America: 54 257/86 404 (62.8) | 45 745/72 149 (63.4) | 16 044/25 924 (61.9) | 21 581/37 520 (57.5)
Missing: 9942/96 346 (10.3) | 7264/79 413 (9.1) | 2957/28 881 (10.2) | 3450/40 970 (8.4)

Anticipated enrollment, No. of participants
1-100: 29 510/51 066 (57.8) | 25 405/41 177 (61.7) | 6415/10 611 (60.5) | 17 726/28 458 (62.3)
101-1000: 18 252/51 066 (35.7) | 13 997/41 177 (34.0) | 3669/10 611 (34.6) | 9629/28 458 (33.8)
>1000: 3304/51 066 (6.5) | 1775/41 177 (4.3) | 527/10 611 (5.0) | 1103/28 548 (3.9)

Study registration (temporal subsets only)
Before first participant enrolled: 9041/27 667 (32.7) | 19 347/40 333 (48.0)
After first participant enrolled: 18 626/27 667 (67.3) | 20 986/40 333 (52.0)

Missing enrollment number or type (anticipated or actual): 20 926/96 346 (21.7) | 16 934/79 413 (21.3) | 9757/28 881 (33.8) | 756/40 970 (1.8)

Blinding(d)
Open: 40 520/72 475 (55.9) | 40 520/72 475 (55.9) | 15 571/27 880 (55.9) | 22 234/39 871 (55.8)
Single blind: 7006/72 475 (9.7) | 7006/72 475 (9.7) | 2364/27 880 (8.5) | 4457/39 871 (11.2)
Double blind: 24 949/72 475 (34.4) | 24 949/72 475 (34.4) | 9945/27 880 (35.7) | 13 180/39 871 (33.1)
Missing: 23 871/96 346 (24.8) | 6983/79 413 (8.7) | 1001/28 881 (3.5) | 1099/40 970 (2.7)

Allocation status
Randomized(d): 49 762/71 046 (70.0) | 49 762/70 953 (70.13) | 19 218/27 265 (70.5) | 27 027/39 240 (68.9)
Nonrandomized(d): 21 191/71 046 (29.83) | 21 191/70 953 (29.87) | 8047/27 265 (29.5) | 12 213/39 240 (31.1)
Missing: 25 300/93 346 (26.3) | 8460/79 413 (10.7) | 1616/28 881 (5.6) | 1730/40 970 (4.2)

DMC
Study has DMC: 22 572/56 856 (39.7) | 19 886/46 699 (42.6) | 5711/12 153 (47.0) | 13 644/33 615 (40.6)
Study does not have DMC: 34 284/56 856 (60.3) | 26 813/46 699 (57.4) | 6442/12 153 (53.0) | 19 971/33 615 (59.4)
Missing: 39 490/96 346 (41.0) | 32 714/79 413 (41.2) | 16 728/28 881 (57.9) | 7355/40 970 (18.0)

Lead sponsor
Industry: 31 173/96 346 (32.4) | 28 264/79 413 (35.6) | 10 783/28 881 (37.3) | 15 248/40 970 (37.2)
NIH: 9215/96 346 (9.6) | 5878/79 413 (7.4) | 1825/28 881 (6.3) | 1106/40 970 (2.7)
US federal: 1715/96 346 (1.8) | 1473/79 413 (1.9) | 644/28 881 (2.2) | 547/40 970 (1.3)
Other: 54 243/96 346 (56.3) | 43 798/79 413 (55.2) | 15 629/28 881 (54.1) | 24 069/40 970 (58.7)

Abbreviations: DMC, data monitoring committee; NIH, National Institutes of Health.
a. Includes 9562 interventional trials registered before October 2004.
b. Includes supportive care, screening, health services research, and basic science.
c. Percentages may not sum to 100% as categories are not mutually exclusive.
d. Only collected for interventional studies.


Page 69: Sharing Clinical Research Data:  an IOM Workshop

were almost twice as common as trials restricted to men (9.1% vs 5.4%), a difference driven largely by oncology trials (13.8% exclude men, compared with 2.0% [cardiovascular] and 5.8% [mental health]).

There were also differences in age distribution among therapeutic areas. Mental health trials were most likely to permit inclusion of children (17.9% vs 11.3% for oncology and 10.5% for cardiovascular) but were also most likely to exclude elderly participants: 56% of mental health trials excluded participants older than 65 years compared with 8.1% for oncology and 13.3% for cardiovascular.

Geographical differences were also apparent. Cardiovascular trials showed the smallest proportion of studies with at least 1 North American research site (47.9%, vs 65.1% for oncology and 69.1% for mental health) and the most substantial proportion of trials with at least 1 European site (39.9% vs 27.6% and 20.9%, respectively).

Differences in trial design were also evident among therapeutic areas (Table 3). Oncology trials were more likely to involve a single group of participants with no randomization of treatment assignment (64.7% vs 26.2% for cardiovascular and 20.8% for mental health), and the majority of oncology trials (87.6%) were not blinded. Mental health trials, on the other hand, were more likely to be blinded (60.0%, vs 12.4% for oncology and 49.0% for cardiovascular), to use parallel-group design (65.9% vs 32.5% for oncology and 63.2% for cardiovascular), and to use randomization (80.1%, vs 36.3% for oncology and 73.7% for cardiovascular).

Data on funding source and number of sites were available for 37 520 of 40 970 clinical trials registered during the 2007-2010 period (eTable 2). The largest proportion of these trials were not funded by industry or the NIH (47%,

Table 2. Clinical Trial Attributes by Therapeutic Area for All Interventional Trials, October 2007–September 2010

Values are No./Total No. (%). Columns: All Trials (n=40 970) | Oncology (n=8992) | Cardiovascular (n=3437) | Mental Health (n=3695)

Overall status
Not yet recruiting: 3725/40 970 (9.1) | 744/8992 (8.3) | 362/3437 (10.5) | 366/3695 (9.9)
Recruiting: 17 348/40 970 (42.3) | 5456/8992 (60.7) | 1609/3437 (46.8) | 1727/3695 (46.7)
Completed: 12 265/40 970 (29.9) | 961/8992 (10.7) | 855/3437 (24.9) | 1005/3695 (27.2)
Suspended: 254/40 970 (0.6) | 101/8992 (1.1) | 16/3437 (0.5) | 11/3695 (0.3)
Terminated: 1180/40 970 (2.9) | 265/8992 (2.9) | 91/3437 (2.6) | 111/3695 (3.0)
Withdrawn: 307/40 970 (0.7) | 84/8992 (0.9) | 18/3437 (0.5) | 22/3695 (0.6)
Active, not recruiting: 4985/40 970 (12.2) | 1286/8992 (14.3) | 371/3437 (10.8) | 372/3695 (10.1)
Enrolling by invitation: 906/40 970 (2.2) | 95/8992 (1.1) | 115/3437 (3.3) | 81/3695 (2.2)

Primary completion type(a)
Actual: 12 428/37 912 (32.8) | 1194/8373 (14.3) | 881/3218 (27.4) | 992/3435 (28.9)
Anticipated: 25 484/37 912 (67.2) | 7179/8373 (85.7) | 2337/3218 (72.6) | 2443/3435 (71.1)
Missing: 3058/40 970 (7.5) | 619/8992 (6.9) | 219/3437 (6.4) | 260/3695 (7.0)

Primary purpose
Treatment: 28 605/38 199 (74.9) | 7353/8744 (84.1) | 2284/3285 (69.5) | 2750/3489 (78.8)
Prevention: 4152/38 199 (10.9) | 338/8744 (3.9) | 431/3285 (13.1) | 247/3489 (7.1)
Diagnosis: 1489/38 199 (3.9) | 470/8744 (5.4) | 244/3285 (7.4) | 71/3489 (2.0)
Supportive care: 1290/38 199 (3.4) | 377/8744 (4.3) | 75/3285 (2.3) | 119/3489 (3.4)
Screening: 195/38 199 (0.5) | 63/8744 (0.7) | 18/3285 (0.5) | 21/3489 (0.6)
Health services research: 733/38 199 (1.9) | 64/8744 (0.7) | 73/3285 (2.2) | 107/3489 (3.1)
Basic science: 1735/38 199 (4.5) | 79/8744 (0.9) | 160/3285 (4.9) | 174/3489 (5.0)
Missing: 2771/40 970 (6.8) | 248/8992 (2.8) | 152/3437 (4.4) | 206/3695 (5.6)

Intervention type(b)
Drug: 24 751/40 970 (60.4) | 6485/8992 (72.1) | 1629/3437 (47.4) | 2172/3695 (58.8)
Procedure: 4104/40 970 (10.0) | 1420/8992 (15.8) | 449/3437 (13.1) | 124/3695 (3.4)
Biological: 2948/40 970 (7.2) | 1109/8992 (12.3) | 86/3437 (2.5) | 36/3695 (1.0)
Behavioral: 3307/40 970 (8.1) | 269/8992 (3.0) | 238/3437 (6.9) | 1103/3695 (29.9)
Device: 3799/40 970 (9.3) | 267/8992 (3.0) | 766/3437 (22.3) | 145/3695 (3.9)
Radiation: 928/40 970 (2.3) | 811/8992 (9.0) | 20/3437 (0.6) | 17/3695 (0.5)
Dietary supplement: 1603/40 970 (3.9) | 171/8992 (1.9) | 141/3437 (4.1) | 99/3695 (2.7)
Genetic: 381/40 970 (0.9) | 313/8992 (3.5) | 13/3437 (0.4) | 8/3695 (0.2)
Other: 5110/40 970 (12.5) | 1369/8992 (15.2) | 439/3437 (12.8) | 434/3695 (11.7)

a. Allowable values for this field are actual and anticipated. "Primary completion date" denotes the date that the final research participant was examined or received an intervention for the purposes of final collection of data for the primary outcome, whether the clinical trial concluded according to the prespecified protocol or was terminated.
b. Percentages may not sum to 100% as categories are not mutually exclusive.


Page 70: Sharing Clinical Research Data:  an IOM Workshop

Table 3. Trial Characteristics and Summary of Designs for All Interventional Trials, Registered October 2007–September 2010

Values are No./Total No. (%) unless otherwise noted. Columns: All Clinical Trials (n=40 970) | Oncology (n=8992) | Cardiovascular (n=3437) | Mental Health (n=3695)

Actual enrollment, median (IQR), No. of participants: 58.0 (27.0-161.0) | 43.0 (18.0-100.0) | 73.5 (34.0-200.0) | 61.0 (29.0-216.5)
1-100: 7566/11 671 (64.8) | 853/1139 (74.9) | 475/807 (58.9) | 584/968 (60.3)
101-1000: 3744/11 671 (32.1) | 259/1139 (22.7) | 291/807 (36.1) | 359/968 (37.1)
>1000: 361/11 671 (3.1) | 27/1139 (2.4) | 41/807 (5.1) | 25/968 (2.6)

Anticipated enrollment, median (IQR), No. of participants(a): 70.0 (35.0-190.0) | 54.0 (30.0-120.0) | 100.0 (42.0-280.0) | 85.0 (40.0-200.0)
1-100: 17 726/28 467 (62.3) | 5597/7728 (72.4) | 1331/2569 (51.8) | 1494/2675 (55.8)
101-1000: 9629/28 467 (33.8) | 1937/7728 (25.1) | 1041/2569 (40.5) | 1086/2675 (40.6)
>1000: 1103/28 467 (3.9) | 194/7728 (2.5) | 197/2569 (7.7) | 95/2675 (3.6)

Sex
Female only: 3730/40 970 (9.1) | 1240/8992 (13.8) | 68/3437 (2.0) | 215/3695 (5.8)
Male only: 2228/40 970 (5.4) | 542/8992 (6.0) | 120/3437 (3.5) | 170/3695 (4.6)
Both: 35 012/40 970 (85.5) | 7210/8992 (80.2) | 3249/3437 (94.5) | 3310/3695 (89.6)

Includes children (<18 y): 7113/40 970 (17.4) | 1015/8992 (11.3) | 362/3437 (10.5) | 663/3695 (17.9)
Excludes elderly (>65 y): 13 047/40 970 (31.8) | 727/8992 (8.1) | 458/3437 (13.3) | 2071/3695 (56.0)

Regional distribution(b,c)
Africa: 817/37 520 (2.2) | 76/8358 (0.9) | 53/3164 (1.7) | 43/3384 (1.3)
Asia and Pacific: 5768/37 520 (15.4) | 1347/8358 (16.1) | 538/3164 (17.0) | 361/3384 (10.7)
Central and South America: 1793/37 520 (4.8) | 211/8358 (2.5) | 172/3164 (5.4) | 111/3384 (3.3)
Europe: 11 311/37 520 (30.1) | 2308/8358 (27.6) | 1261/3164 (39.9) | 707/3384 (20.9)
Middle East: 1545/37 520 (4.1) | 204/8358 (2.4) | 134/3164 (4.2) | 127/3384 (3.8)
North America: 21 581/37 520 (57.5) | 5445/8358 (65.1) | 1516/3164 (47.9) | 2338/3384 (69.1)

Interventional group
Single group: 12 144/38 969 (31.2) | 4822/7451 (64.7) | 890/3394 (26.2) | 754/3620 (20.8)
Parallel: 21 782/38 969 (55.9) | 2422/7451 (32.5) | 2145/3394 (63.2) | 2386/3620 (65.9)
Crossover: 4351/38 969 (11.2) | 140/7451 (1.9) | 295/3394 (8.7) | 353/3620 (9.8)
Factorial: 692/38 969 (1.8) | 67/7451 (0.9) | 64/3394 (1.9) | 127/3620 (3.5)
Missing: 2001/40 970 (4.9) | 1541/8992 (17.1) | 43/3437 (1.3) | 75/3695 (2.0)

Blinding
Open: 22 234/39 871 (55.8) | 7342/8386 (87.6) | 1731/3394 (51.0) | 1451/3629 (40.0)
Single blind: 4457/39 871 (11.2) | 288/8386 (3.4) | 484/3394 (14.3) | 538/3629 (14.8)
Double blind: 13 180/39 871 (33.1) | 756/8386 (9.0) | 1179/3394 (34.7) | 1640/3629 (45.2)
Missing: 1099/40 970 (2.7) | 606/8992 (6.7) | 43/3437 (1.3) | 66/3695 (1.8)

Allocation
Randomized: 27 027/39 240 (68.9) | 2919/8035 (36.3) | 2481/3366 (73.7) | 2893/3612 (80.1)
Nonrandomized: 12 213/39 240 (31.1) | 5116/8035 (63.7) | 885/3366 (26.3) | 719/3612 (19.9)
Missing: 1730/40 970 (4.2) | 957/8992 (10.6) | 71/3437 (2.1) | 83/3695 (2.2)

Phase
0: 316/40 970 (0.8) | 69/8992 (0.8) | 29/3437 (0.8) | 47/3695 (1.3)
1: 6222/40 970 (15.2) | 1884/8992 (21.0) | 210/3437 (6.1) | 391/3695 (10.6)
1/2: 2105/40 970 (5.1) | 950/8992 (10.6) | 109/3437 (3.2) | 124/3695 (3.4)
2: 8484/40 970 (20.7) | 3436/8992 (38.2) | 471/3437 (13.7) | 678/3695 (18.3)
2/3: 1055/40 970 (2.6) | 124/8992 (1.4) | 113/3437 (3.3) | 118/3695 (3.2)
3: 6197/40 970 (15.1) | 1025/8992 (11.4) | 531/3437 (15.4) | 559/3695 (15.1)
4: 5559/40 970 (13.6) | 234/8992 (2.6) | 761/3437 (22.1) | 584/3695 (15.8)
NA: 11 032/40 970 (26.9) | 1270/8992 (14.1) | 1213/3437 (35.3) | 1194/3695 (32.3)

Study has DMC(d): 13 644/33 615 (40.6) | 3322/6316 (52.6) | 1578/3125 (50.5) | 1334/3189 (41.8)

Abbreviations: DMC, data monitoring committee; IQR, interquartile range; NA, not applicable.
a. Denominators represent number of trials with information available on enrollment. Out of 40 970 trials, 756 (1.8%) were missing information on enrollment or on whether it was actual or anticipated; 107/8992 (1.2%) oncology trials, 30/3437 (1.7%) cardiovascular trials, and 43/3695 (1.2%) mental health trials were missing information on enrollment or on whether it was actual or anticipated.
b. Denominators represent number of trials with information available on region. Out of 40 970 trials, 3450 (8.4%) were missing region information; 634/8992 (7.1%) of oncology trials, 273/3437 (7.9%) of cardiovascular trials, and 311/3695 (8.4%) mental health trials were missing region information.
c. Percentages may not sum to 100% as categories are not mutually exclusive.
d. Denominators represent number of clinical trials that reported information on DMCs. Out of 40 970 trials, 7355 (18.0%) were missing information on DMC status; 2676/8992 (29.8%) oncology trials, 312/3437 (9.1%) cardiovascular trials, and 506/3695 (13.7%) mental health trials were missing DMC information.

CLINICAL TRIALS REGISTERED IN CLINICALTRIALS.GOV

©2012 American Medical Association. All rights reserved. JAMA, May 2, 2012—Vol 307, No. 17 1843

Downloaded From: http://jama.jamanetwork.com/ by a National Academy of Sciences User on 07/17/2012


n=17 592) with 16 674 (44%) funded by industry, 3254 (9%) funded by the NIH, and 757 (2.0%) funded by other US federal agencies. The majority of trials were single site (24 788, 66%); 12 732 (34%) were multisite trials. The largest proportion of trials (39%, 14 637/37 520) comprised single-site trials that were not funded by the NIH or by industry (see eTable 2; note: this excluded 3450 trials [8%] with missing data on facility location). These single-site trials not funded by the NIH or industry were typically small: approximately 70% had enrolled or planned to enroll fewer than 100 participants. They were characterized primarily by North American (46.1%) and European (30.1%) representation. Industry-funded multicenter trials included Asian and Pacific sites in 27% of trials and European sites in 41.2% of trials, with 33.4% of trials not involving any North American sites.

Regression analyses comparing trial characteristics as they relate to use of DMCs, blinding, and randomization are displayed in TABLE 4. Compared with trials in which industry was the lead sponsor, other types of lead sponsors were more likely to report use of DMCs, with DMCs most common among NIH-sponsored trials (adjusted OR, 9.09; 95% CI, 7.28-11.34). Reported use of DMCs was less common in industry-sponsored vs NIH-sponsored trials (adjusted OR, 0.11; 95% CI, 0.09-0.14). Relative to phase 3 trials, earlier- and later-phase trials were less likely to report use of DMCs (adjusted OR, 0.83; 95% CI, 0.76-0.91 [earlier phase]; adjusted OR, 0.52; 95% CI, 0.47-0.58 [later phase]). Compared with cardiovascular and oncology trials, mental health trials were less likely to report use of DMCs. When compared with trials evaluating drugs or biologics, trials of behavioral interventions were less likely to report use of DMCs.

There were small differences in reporting of blinding or randomization by different lead sponsor organizations. For example, trials in which a US federal agency (excluding the NIH) or another sponsor was the lead sponsor were less likely to report use of blinding (adjusted OR, 0.65; 95% CI, 0.51-0.83; and adjusted OR, 0.90; 95% CI, 0.84-0.96, respectively). Relative to phase 3 trials, earlier- and later-phase trials were also less likely to report use of blinding (adjusted OR, 0.66; 95% CI, 0.60-0.72 [earlier phase]; adjusted OR, 0.50; 95% CI, 0.45-0.55 [later phase]) and randomization (adjusted OR, 0.28; 95% CI, 0.25-0.31 [earlier phase]; adjusted OR, 0.37; 95% CI, 0.33-0.42 [later phase]). Oncology trials were less likely to use randomization (adjusted OR, 0.20; 95% CI, 0.19-0.22) and blinding (adjusted OR,

Table 4. Regression Analyses of Interventional Trials Registered in ClinicalTrials.gov, October 2007–September 2010, and the Reported Use of DMC, Blinding, and Randomization

Adjusted OR (95% CI), P value: DMC | Blinding | Randomization

Sponsoring agency (vs industry)
  NIH          9.087 (7.279-11.343), <.001 | 1.055 (0.880-1.265), .02  | 1.337 (1.061-1.685), .02
  Other        2.984 (2.773-3.211), .07    | 0.895 (0.836-0.959), .80  | 1.031 (0.959-1.109), .39
  US federal   2.174 (1.697-2.784), .01    | 0.653 (0.512-0.831), <.001 | 0.975 (0.727-1.308), .38

Study phase (vs phase 3)
  NA    0.503 (0.452-0.559), <.001 | 0.539 (0.487-0.598), <.001 | 0.365 (0.322-0.413), .71
  0     0.623 (0.440-0.881), .25   | 0.622 (0.432-0.895), .77   | 0.200 (0.136-0.295), <.001
  1     0.685 (0.607-0.772), .11   | 0.382 (0.341-0.428), <.001 | 0.162 (0.143-0.183), <.001
  1/2   0.961 (0.830-1.113), <.001 | 0.516 (0.442-0.601), <.001 | 0.185 (0.159-0.216), <.001
  2     0.868 (0.786-0.958), <.001 | 0.843 (0.767-0.926), <.001 | 0.362 (0.325-0.404), .80
  2/3   0.974 (0.808-1.174), <.001 | 1.172 (0.968-1.419), <.001 | 0.917 (0.712-1.179), <.001
  4     0.525 (0.470-0.588), <.001 | 0.502 (0.453-0.555), <.001 | 0.375 (0.331-0.425), .35

Study size (per 1000 additional participants)   1.030 (1.009-1.051), .004 | 1.005 (0.993-1.018), .40 | 1.022 (1.000-1.045), .05

Cardiovascular (yes vs no)   1.745 (1.583-1.923), <.001 | 0.966 (0.881-1.060), .47   | 0.970 (0.869-1.081), .58
Oncology (yes vs no)         1.591 (1.470-1.723), <.001 | 0.102 (0.093-0.112), <.001 | 0.202 (0.187-0.218), <.001
Mental health (yes vs no)    1.079 (0.976-1.194), .14   | 1.428 (1.299-1.569), <.001 | 1.026 (0.915-1.151), .66

Intervention type (vs drug or biological)
  Behavioral           0.691 (0.611-0.782), <.001 | 0.629 (0.558-0.710), <.001 | 3.216 (2.691-3.844), <.001
  Dietary supplement   0.901 (0.763-1.064), .88   | 2.945 (2.449-3.541), <.001 | 2.651 (2.097-3.352), <.001
  Other                0.900 (0.797-1.016), .86   | 0.669 (0.591-0.758), <.001 | 1.078 (0.944-1.231), <.001
  Procedure/device     1.009 (0.926-1.100), <.001 | 0.513 (0.471-0.558), <.001 | 0.756 (0.692-0.825), <.001

Start year (increment of 1 y)   1.063 (1.049-1.078), <.001 | 1.023 (1.012-1.035), <.001 | 1.005 (0.993-1.018), .41

Primary category (vs treatment)
  Diagnostic   0.639 (0.542-0.754), <.001 | 0.569 (0.477-0.679), <.001 | 0.229 (0.193-0.270), <.001
  Other        0.737 (0.659-0.823), <.001 | 1.112 (1.000-1.237), <.001 | 1.013 (0.900-1.140), <.001
  Prevention   1.172 (1.058-1.299), <.001 | 1.151 (1.045-1.268), <.001 | 1.449 (1.282-1.638), <.001

Abbreviations: DMC, data monitoring committee; NA, not applicable; NIH, National Institutes of Health; OR, odds ratio.


0.10; 95% CI, 0.09-0.11) while mental health trials were not more likely to use randomization (adjusted OR, 1.03; 95% CI, 0.92-1.15) but were more likely to use blinding (adjusted OR, 1.43; 95% CI, 1.30-1.57). When compared with trials evaluating drugs or biologics, trials of behavioral interventions were less likely to report use of blinding (adjusted OR, 0.63; 95% CI, 0.56-0.71) but more likely to use randomization (adjusted OR, 3.22; 95% CI, 2.69-3.84). Trials of dietary supplements were more likely to use blinding (adjusted OR, 2.95; 95% CI, 2.45-3.54) and randomization (adjusted OR, 2.65; 95% CI, 2.10-3.35), and procedure and device trials were less likely to use blinding (adjusted OR, 0.51; 95% CI, 0.47-0.56) and randomization (adjusted OR, 0.76; 95% CI, 0.69-0.83).

More recent trials (reference: per 1-year increment) were more likely to report use of DMCs (adjusted OR, 1.06; 95% CI, 1.05-1.08) and blinding (adjusted OR, 1.02; 95% CI, 1.01-1.04) but no more likely to use randomization (adjusted OR, 1.01; 95% CI, 0.99-1.02). Larger trials were more likely to report use of DMCs (adjusted OR, 1.03; 95% CI, 1.01-1.05) and randomization (adjusted OR, 1.02; 95% CI, 1.00-1.05). Diagnostic trials were less likely to report use of all 3 methods (DMCs: adjusted OR, 0.64; 95% CI, 0.54-0.75; blinding: adjusted OR, 0.57; 95% CI, 0.48-0.68; randomization: adjusted OR, 0.23; 95% CI, 0.19-0.27) while prevention trials were more likely to use all 3 compared with treatment trials (DMCs: adjusted OR, 1.17; 95% CI, 1.06-1.30; blinding: adjusted OR, 1.15; 95% CI, 1.05-1.27; randomization: adjusted OR, 1.45; 95% CI, 1.28-1.64). Finally, an analysis of blinding using only randomized trials produced results similar to the blinding analysis using all interventional trials, and oncology trials in particular were less likely to report use of blinding in the context of a randomized design (χ2 = 933; P < .001).

COMMENT

Clinical studies registered in the ClinicalTrials.gov database are dominated by small, single-center trials, many of which are not funded by the NIH or industry. Many registered trials contain significant heterogeneity in methodological approaches, including reported use of randomization, blinding, and DMCs. Although ClinicalTrials.gov has a number of limitations, it is the largest aggregate resource for informing policy analysis about the US clinical trials enterprise. We anticipate that the "sunshine" on the national clinical trials portfolio brought about by ClinicalTrials.gov, coupled with the greater ease of obtaining an analysis data set from the database for AACT,19 will engender much-needed debate about clinical trial methodologies and funding allocation.

Many of the differences noted in the present study have been identified before and likely represent variation in appropriate approaches for particular diseases. Reviews of samples from the literature in 198023 and 200024 raised similar questions, for which this report provides a contemporary and more comprehensive sample. Despite concerns previously articulated by Meinert et al23 and Chan and Altman24 (concerns that included a relatively high prevalence of clinical trials with inadequate sample sizes and insufficiently described methodologies), disparities still remain across specialties. This in turn raises questions about why such heterogeneity persists, whether the portfolio documented by this analysis suffices to address gaps in evidence, and the reasons underlying differences in trial methodology. It is particularly important to identify cases in which such methodological differences lack adequate scientific justification, as they may present an opportunity for improving the public health through adjustments to research investment strategies and methods.

Implications for Policy and Strategy

The fact that 50% of interventional studies registered from October 2007 to September 2010 by design include fewer than 70 participants may have important policy implications. Small trials may be appropriate in many cases (eg, earlier-phase drug evaluations, or investigations of biological or behavioral mechanisms, rather than clinical outcomes). Particularly in oncology, there is a growing sense that small trials based on genetics or biomarkers can yield definitive results.25 However, small trials are unlikely to be informative in many other settings, such as establishing the effectiveness of treatments with modest effects and comparing effective treatments to enable better decisions in practice.26-28 Preliminary observations suggest that many small clinical trials were designed to enroll more participants, raising questions about their ultimate power (D. A. Zarin, MD, written communication, March 28, 2012), but an accurate depiction of these issues requires a more in-depth analysis. These findings raise important issues that should be addressed by detailed, specialty-oriented assessments of the utility of the large number of small trials.

A comprehensive collection of all clinical trials on a global basis would enable the most effective examination of evidence to support medical decisions. The effect of the globalization of clinical research has been debated,29-32 and emerging evidence of differential regional involvement as a function of therapeutic area also raises questions relevant to policy and strategy. Although the World Health Organization (WHO) provides a portal for many trial registries from around the world, unacknowledged duplicate entries make it difficult to determine a unique list of clinical trials; in addition, the overall data set is not available for electronic download, rendering the data unavailable for aggregate analysis.33

Attention to standards for nondrug interventions (eg, biologics, devices, and procedures) as well as study design would also enhance the ability to describe and understand the clinical trials enterprise.34 Indeed, as Devereaux and colleagues35 point out, concepts as fundamental as blinding are shrouded in terminological confusion and ambiguity. Furthermore, lack of clarity surrounding the naming of devices and biologics makes examination of specific medical technologies difficult.

Although industry is the lead sponsor in only about 36% of interventional trials in this study, these trials accounted for 59% of all trial participants. Further analysis of trials in each specialty may help elucidate this complex mix of funding, trial size, and location so that policies might be enacted to improve the responsiveness of trials to the needs of public health and the overall research community.

Methodological differences across therapeutic areas are also of interest. The greater focus on earlier-phase trials and biomarker-based personalized medicine25 may explain some of the differences in approach evident with oncology trials, but substantial differences in the use of randomization and blinding across specialties persist after adjustment for phase, raising fundamental questions about the ability to draw reliable inferences from clinical research conducted in that arena.

The reporting of use of a DMC is an optional data element within the ClinicalTrials.gov registry. The appropriate criteria for determining when a DMC is useful or required remain controversial. Yet the heterogeneity observed by trial phase, disease category, and lead sponsor category in this study (eg, industry vs government sponsorship) may represent an opportunity for mutual learning and compromise among disparate views. The trend toward increased reporting of use of DMCs over time in this study is notable, but clear policies would be useful to those researchers designing trials. For example, many different arrangements can be made for monitoring safety in clinical trials, and the current data only reflect the presence of a typical, well-defined DMC.

Limitations

Several limitations of our study should be noted. First, ClinicalTrials.gov does not include all clinical trials. Within the United States, legal requirements for registration do not include phase 1 trials, trials not involving a drug or device, and trials not under US jurisdiction. Also, although many trialists from other countries use ClinicalTrials.gov to satisfy ICMJE registration requirements,7 other registries around the world may be used.10 However, ClinicalTrials.gov still accounts for more than 80% of all clinical studies in the WHO portal, as based on comparisons of the number of clinical studies appearing in the ClinicalTrials.gov registry divided by the number of unique studies appearing in the WHO portal.

Second, there have been changes over time in the data collected, the definitions used, and the rigor with which missing data are pursued. As described in the "Methods" section, some data elements were either missing or unavailable because of practical or logistical limitations. Some of these issues can be addressed by focused analyses in which ancillary data sets are created or review of primary protocols and studies is done. In addition, the potential for serious sanctions for incomplete data under the FDAAA may have improved data collection for those fields in recent years. As noted earlier, we used the study type field from the ClinicalTrials.gov registry to identify interventional studies; however, we did not perform additional manual screening to identify and exclude possibly misclassified observational studies.

Third, the need for a standard ontology to describe clinical research remains a pressing concern. Current definitions were developed to help individuals find particular trials or were legally mandated without necessarily involving experts or allowing time for testing. Consequently, some data remain ambiguous, complicating efforts to combine and analyze results in a given therapeutic area or across areas. For example, the terms interventional trial and clinical trial are critical for distinguishing purely observational studies from those that assign participants to an interventional therapy. Further refinement of this definition9 could be helpful to those interested in differentiating high-risk invasive interventions from low-risk interventions or distinguishing specific types of behavioral, drug, or device interventions.

CONCLUSIONS

The clinical trials enterprise as revealed by the contents of ClinicalTrials.gov is dominated by small clinical trials and contains significant heterogeneity in methodological approaches, including the use of randomization, blinding, and DMCs. Our analysis raises questions about the best methods for generating evidence, as well as the capacity of the clinical trials enterprise to supply sufficient amounts of high-quality evidence needed to ensure confidence in guideline recommendations. Given the deficit in evidence to support key decisions in clinical practice guidelines11,12 as well as concerns about insufficient numbers of volunteers for trials,36 the desire to provide high-quality evidence for medical decisions must include consideration of a comprehensive redesign of the clinical trial enterprise.

Author Contributions: Dr Califf had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis. Study concept and design: Califf, Zarin, Sherman, Tasneem. Acquisition of data: Zarin, Tasneem. Analysis and interpretation of data: Califf, Zarin, Kramer, Aberle, Tasneem. Drafting of the manuscript: Califf, Sherman. Critical revision of the manuscript for important intellectual content: Califf, Zarin, Kramer, Aberle, Tasneem. Statistical analysis: Califf, Aberle. Obtained funding: Califf, Kramer. Administrative, technical, or material support: Califf, Zarin, Tasneem. Study supervision: Califf, Zarin, Sherman.

Conflict of Interest Disclosures: All authors have completed and submitted the ICMJE Form for Disclosure of Potential Conflicts of Interest. Dr Califf reports receiving research grants that partially support his salary from Amylin, Johnson & Johnson (Scios), Merck, Novartis Pharma, Schering Plough, Bristol-Myers Squibb Foundation, Aterovax, Bayer, Roche, and Lilly; all grants are paid to Duke University. Dr Califf also consults for TheHeart.org, Johnson & Johnson (Scios), Kowa Research Institute, Nile, Parkview, Orexigen Therapeutics, Pozen, Servier International, WebMD, Bristol-Myers Squibb Foundation, AstraZeneca, Bayer-OrthoMcNeil, BMS, Boehringer Ingelheim, Daiichi Sankyo, GlaxoSmithKline, Li Ka Shing Knowledge Institute, Medtronic, Merck, Novartis, sanofi-aventis, XOMA, and University of Florida; all income from these consultancies is donated to nonprofit organizations, with the majority going to the clinical research fellowship fund of the Duke Clinical Research Institute. Dr Califf holds equity in Nitrox LLC. Dr Kramer is the executive director of the Clinical Trials Transformation Initiative (CTTI), a public-private partnership. A portion of Dr Kramer's salary is supported by pooled funds from CTTI members (https://www.ctti-clinicaltrials.org/about/membership). Dr Kramer reports receiving a research grant from Pfizer that supports a small percentage of her salary; this grant is paid to Duke University. Dr Kramer also served on an advisory board for the "Pharmacovigilance Center of Excellence" at GlaxoSmithKline, for which she received an honorarium. Financial disclosure information for Drs Califf and Kramer is also publicly available at https://www.dcri.org/about-us/conflict-of-interest. No other disclosures were reported.

Funding/Support: Financial support for this project was provided by grant U19FD003800 from the US Food and Drug Administration awarded to Duke University for the Clinical Trials Transformation Initiative.

Role of the Sponsors: The US Food and Drug Administration participated in the design and conduct of the study via one of the coauthors (R.E.S.); in the collection, management, analysis, and interpretation of the data; and in the preparation, review, and approval of the manuscript.

Online-Only Material: The 2 eTables, 2 eAppendixes, and eFigure are available at http://www.jama.com.

Additional Contributions: We gratefully acknowledge the contributions of CTTI Project Leader Jean Bolte, RN (Duke Clinical Research Institute), and National Library of Medicine staff members Nicholas Ide, MS; Rebecca Williams, PhD; and Tony Tse, PhD, to this project. We also thank Jonathan McCall, BA (Duke Clinical Research Institute), for editorial assistance with the manuscript. None received compensation for their contributions besides their salaries.

REFERENCES

1. Fair tests of treatments in health care. The James Lind Library. http://www.jameslindlibrary.org. Accessed November 28, 2011.
2. DeMets DL, Califf RM. A historical perspective on clinical trials innovation and leadership: where have the academics gone? JAMA. 2011;305(7):713-714.
3. Sung NS, Crowley WF Jr, Genel M, et al. Central challenges facing the national clinical research enterprise. JAMA. 2003;289(10):1278-1287.
4. Menikoff J, Richards EP. What the Doctor Didn't Say: The Hidden Truth About Medical Research. New York, NY: Oxford University Press; 2006.
5. Food and Drug Administration Modernization Act of 1997 (FDAMA): Public Law 105-15. US Food and Drug Administration. http://www.fda.gov/RegulatoryInformation/Legislation/FederalFoodDrugandCosmeticActFDCAct/SignificantAmendmentstotheFDCAct/FDAMA/FullTextofFDAMAlaw/default.htm. Accessed January 3, 2012.
6. DeAngelis CD, Drazen JM, Frizelle FA, et al; International Committee of Medical Journal Editors. Clinical trial registration: a statement from the International Committee of Medical Journal Editors. JAMA. 2004;292(11):1363-1364.
7. Uniform requirements for manuscripts submitted to biomedical journals: obligation to register clinical trials. International Committee of Medical Journal Editors. http://www.icmje.org/publishing_10register.html. Accessed January 3, 2012.
8. Food and Drug Administration Amendments Act of 2007: Public Law 110-85. US Food and Drug Administration. http://www.fda.gov/RegulatoryInformation/Legislation/FederalFoodDrugandCosmeticActFDCAct/SignificantAmendmentstotheFDCAct/FoodandDrugAdministrationAmendmentsActof2007/FullTextofFDAAALaw/default.htm. Accessed May 31, 2011.
9. ClinicalTrials.gov protocol data element definitions (draft): August 2011. ClinicalTrials.gov. http://prsinfo.clinicaltrials.gov/definitions.html. Accessed November 4, 2011.
10. Zarin DA, Tse T, Williams RJ, Califf RM, Ide NC. The ClinicalTrials.gov results database: update and key issues. N Engl J Med. 2011;364(9):852-860.
11. Lee DH, Vielemeyer O. Analysis of overall level of evidence behind Infectious Diseases Society of America practice guidelines. Arch Intern Med. 2011;171(1):18-22.
12. Tricoci P, Allen JM, Kramer JM, Califf RM, Smith SC Jr. Scientific evidence underlying the ACC/AHA clinical practice guidelines. JAMA. 2009;301(8):831-841.
13. Guyatt GH, Oxman AD, Vist GE, et al; GRADE Working Group. GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ. 2008;336(7650):924-926.
14. Moher D, Schulz KF, Altman D; CONSORT Group (Consolidated Standards of Reporting Trials). The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomized trials. JAMA. 2001;285(15):1987-1991.
15. Gillen JE, Tse T, Ide NC, McCray AT. Design, implementation and management of a web-based data entry system for ClinicalTrials.gov. Stud Health Technol Inform. 2004;107(Pt 2):1466-1470.
16. Zarin DA, Tse T, Ide NC. Trial registration at ClinicalTrials.gov between May and October 2005. N Engl J Med. 2005;353(26):2779-2787.
17. Zarin DA, Ide NC, Tse T, Harlan WR, West JC, Lindberg DA. Issues in the registration of clinical trials. JAMA. 2007;297(19):2112-2120.
18. Studies by topic: select a location. ClinicalTrials.gov. http://www.clinicaltrials.gov/ct2/search/browse?brwse=locn_cat. Accessed January 3, 2012.
19. AACT database (Aggregate Analysis of ClinicalTrials.gov). Clinical Trials Transformation Initiative. http://www.ctti-clinicaltrials.org/project-topics/clinical-trials.gov/aact-database. Accessed March 20, 2012.
20. McKenna MT, Michaud CM, Murray CJ, Marks JS. Assessing the burden of disease in the United States using disability-adjusted life years. Am J Prev Med. 2005;28(5):415-423.
21. Medical Subject Headings (MeSH) thesaurus. http://www.nlm.nih.gov/mesh/2010/introduction/introduction.html. Accessed February 20, 2012.
22. Tasneem A, Aberle L, Ananth H, et al. The database for Aggregate Analysis of ClinicalTrials.gov (AACT) and subsequent regrouping by clinical specialty. PLoS One. 2012;7(3):e33677.
23. Meinert CL, Tonascia S, Higgins K. Content of reports on clinical trials: a critical review. Control Clin Trials. 1984;5(4):328-347.
24. Chan AW, Altman DG. Epidemiology and reporting of randomised trials published in PubMed journals. Lancet. 2005;365(9465):1159-1162.
25. Kris MG, Meropol NJ, Winer EP, eds. Accelerating progress against cancer: ASCO's blueprint for transforming clinical and translational cancer research. American Society of Clinical Oncology. http://www.asco.org/ascov2/department%20content/cancer%20policy%20and%20clinical%20affairs/downloads/blueprint.pdf. Accessed February 5, 2012.
26. Yusuf S, Collins R, Peto R. Why do we need some large, simple randomized trials? Stat Med. 1984;3(4):409-422.
27. Peto R, Collins R, Gray R. Large-scale randomized evidence: large, simple trials and overviews of trials. J Clin Epidemiol. 1995;48(1):23-40.
28. Califf RM, DeMets DL. Principles from clinical trials relevant to clinical practice: part I. Circulation. 2002;106(8):1015-1021.
29. Glickman SW, McHutchison JG, Peterson ED, et al. Ethical and scientific implications of the globalization of clinical research. N Engl J Med. 2009;360(8):816-823.
30. Pasquali SK, Burstein DS, Benjamin DK Jr, Smith PB, Li JS. Globalization of pediatric research: analysis of clinical trials completed for pediatric exclusivity. Pediatrics. 2010;126(3):e687-e692.
31. Kim ES, Carrigan TP, Menon V. International participation in cardiovascular randomized controlled trials sponsored by the National Heart, Lung, and Blood Institute. J Am Coll Cardiol. 2011;58(7):671-676.
32. Califf RM, Harrington RA. American industry and the US Cardiovascular Clinical Research Enterprise: an appropriate analogy? J Am Coll Cardiol. 2011;58(7):677-680.
33. International Clinical Trials Registry Platform Search Portal. World Health Organization. http://apps.who.int/trialsearch/. Accessed December 8, 2011.
34. DeAngelis CD, Drazen JM, Frizelle FA, et al; International Committee of Medical Journal Editors. Is this clinical trial fully registered? A statement from the International Committee of Medical Journal Editors. JAMA. 2005;293(23):2927-2929.
35. Devereaux PJ, Manns BJ, Ghali WA, et al. Physician interpretations and textbook definitions of blinding terminology in randomized controlled trials. JAMA. 2001;285(15):2000-2003.
36. English RA, Leibowitz Y, Giffin RB. Chapter 2: The state of clinical research in the United States: an overview. In: Transforming Clinical Research in the United States: Challenges and Opportunities: Workshop Summary. National Academies Press. http://books.nap.edu/openbook.php?record_id=12900. Accessed February 15, 2012.


Policy Forum

The Imperative to Share Clinical Study Reports: Recommendations from the Tamiflu Experience

Peter Doshi1*, Tom Jefferson2, Chris Del Mar3

1 Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America; 2 The Cochrane Collaboration, Roma, Italy; 3 Centre for Research in Evidence-Based Practice, Bond University, Gold Coast, Australia

Regulatory approval of new drugs is assumed to reflect a judgment that a medication's benefits outweigh its harms. Despite this, controversy over approved drugs is common. If sales can be considered a proxy for utility, the controversies surrounding even the most successful drugs (such as blockbuster drugs) seem all the more paradoxical, and have revealed the extent to which the success of many drugs has been driven by sophisticated marketing rather than verifiable evidence [1,2]. But even among institutions that aim to provide the least biased, objective assessments of a drug's effects, determining "the truth" can be extremely difficult.

Consider the case of the influenza antiviral Tamiflu (oseltamivir). Prior to the global outbreak of H1N1 influenza in 2009, the United States alone had stockpiled nearly US$1.5 billion worth of the antiviral [3]. As the only drug in its class (neuraminidase inhibitors) available in oral form, Tamiflu was heralded as the key pharmacologic intervention for use during the early days of an influenza pandemic when a vaccine was yet to be produced. It would cut hospitalizations and save lives, said the US Department of Health and Human Services (HHS) [4]. The Advisory Committee on Immunization Practices (ACIP, the group the US Centers for Disease Control and Prevention [CDC] uses to form national influenza control policy) said it would reduce the chances of developing complications from influenza [5]. So, too, did the Australian Therapeutic Goods Administration [6] and the European Medicines Agency (EMA) [7].

Most (perhaps all) of these claims can be traced back to a single source: a meta-analysis published in 2003 that combined ten randomized clinical trials conducted during the late 1990s by the manufacturer prior to US registration of the drug [8]. This analysis, conducted by Kaiser and colleagues, proposed that oseltamivir treatment of influenza reduced both secondary complications and hospital admission. In contrast, the Food and Drug Administration (FDA), which approved Tamiflu in 1999 and was aware of these same clinical trials, concluded that Tamiflu had not been shown to reduce complications, and required an explicit statement in the drug's label to that effect [9]. FDA even cited Roche, Tamiflu's manufacturer, for violation of the law for claims made to the contrary [10].

Nor did the FDA approve an indication for Tamiflu in the prevention of transmission of influenza [9,11]. This assumption was at the heart of the World Health Organization's (WHO) proposed plan to suppress an emergent pandemic through mass prophylaxis [12]. While the WHO recently added Tamiflu to its Essential Medicines list, if FDA is right, the drug's effectiveness may be no better than aspirin or acetaminophen (paracetamol). The FDA has never clarified the many discrepancies in claims made over the effects of Tamiflu. Although it may have limited approval indications accordingly, the FDA has never challenged the US HHS or the US CDC for making far more ambitious claims. This means that critical analysis by an independent group such as a Cochrane review group is essential.

But which data should be used? In updating our Cochrane review of neuraminidase inhibitors, we have become convinced that the answer lies in analyzing clinical study reports rather than the traditional published trials appearing in biomedical journals [13]. Clinical study reports contain the same information as journal papers (usually in standardized sections, including an introduction, methods, results, and conclusion [14]), but have far more detail: the study protocol, analysis plan, numerous tables, listings, and figures, among others. They are far larger (hundreds or thousands of pages), and represent the most complete synthesis of the planning, execution, and results of a clinical trial. Journal publications of clin-

The Policy Forum allows health policy makers around the world to discuss challenges and opportunities for improving health care in their societies.

Citation: Doshi P, Jefferson T, Del Mar C (2012) The Imperative to Share Clinical Study Reports:Recommendations from the Tamiflu Experience. PLoS Med 9(4): e1001201. doi:10.1371/journal.pmed.1001201

Published April 10, 2012

Copyright: � 2012 Doshi et al. This is an open-access article distributed under the terms of the CreativeCommons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium,provided the original author and source are credited.

Funding: Peter Doshi is funded by an institutional training grant from the Agency for Healthcare Research andQuality #T32HS019488. The funders had no role in study design, data collection and analysis, decision topublish, or preparation of the manuscript.

Competing Interests: All review authors have applied for and received competitive research grants. Allreview authors are co-recipients of a UK National Institute for Health Research grant to carry out a Cochranereview of neuraminidase inhibitors (http://www.hta.ac.uk/2352). In addition: TJ was an ad hoc consultant for F.Hoffman-La Roche Ltd in 1998–1999. TJ receives royalties from his books published by Blackwells and IlPensiero Scientifico Editore, none of which are on neuraminidase inhibitors. TJ is occasionally interviewed bymarket research companies for anonymous interviews about Phase 1 or 2 products unrelated to neuraminidaseinhibitors. Since submission of this article, TJ has been retained as an expert consultant in a legal case involvingTamiflu. CDM provided expert advice to GlaxoSmithKline about vaccination against acute otitis media in 2008–2009. CDM receives royalties from books published through Blackwells BMJ Books and Elsevier. PD declares nofurther conflicts of interest.

Abbreviations: ACIP, Advisory Committee on Immunization Practices; CDC, US Centers for Disease Controland Prevention; EMA, European Medicines Agency; FDA, US Food and Drug Administration; HHS, USDepartment of Health and Human Services; WHO, World Health Organization

* E-mail: [email protected]

Provenance: Not commissioned; externally peer reviewed.

PLoS Medicine | www.plosmedicine.org 1 April 2012 | Volume 9 | Issue 4 | e100120125

Page 76: Sharing Clinical Research Data:  an IOM Workshop

ical trials may generate media attention

[2], propel researchers’ careers, and gen-

erate some journals a revenue stream [15].

However, when regulators decide whether

to register a new drug in a manufacturer’s

application, they review the trial’s clinical

study report.

In 2010, we began our Cochrane review update using clinical study reports rather than published papers [16]. We obtained some sections of these clinical study reports for the ten trials appearing in the Kaiser 2003 meta-analysis from Tamiflu's manufacturer, Roche (around 3,200 pages in total). In 2011, we obtained additional sections of clinical study reports for Tamiflu through a Freedom of Information request to the EMA, amounting to tens of thousands of pages. While extensive and detailed, it is important to note that what we have obtained is just a subset of the full clinical study reports in Roche's possession. Nonetheless, Box 1 provides a list of details we have already discovered, details we would never have discovered without access to these documents. This information has turned our understanding of the drug's effects on its head. Other drugs for which previously unpublished, detailed clinical trial data have radically changed public knowledge of safety and efficacy include Avandia, Neurontin, and Vioxx (Table 1).

An Urgent Call for a Debate on the Ethics of Data Secrecy

Taken together, these experiences suggest that any attempt at reliable evidence synthesis must begin with clinical study reports. Yet we are aware of only three other groups of independent researchers conducting a systematic review based entirely (or mostly) on these and other regulatory documents [17–19]. We can think of two major reasons this might be. First, outside of regulatory agencies, few researchers have ever heard of clinical study reports. Second, clinical study reports are massive in size and difficult to obtain, traditionally shared only with regulators. The first problem seems tractable, but gaining better access to manufacturers' clinical study reports requires shifting the status quo from a default position of confidentiality to one of disclosure.

In the mid-20th century, regulatory agencies became increasingly responsible to the public for ensuring the safety and effectiveness of approved medicines. This rise paralleled an increase in the number and complexity of clinical trials. Both in the United States and Europe, manufacturers would come to submit trial data to regulators with the assurance that authorities would treat all trade secret data as confidential. Even following passage of the 1966 Freedom of Information Act (FOIA), the FDA maintained that safety and effectiveness test data were not subject to the FOIA [20,21]. One sign things may be changing came in November 2010, when the EMA announced its intention to make all industry clinical study reports about a drug publicly available as a matter of standard practice after reaching a regulatory decision [22,23]. The EMA has also already improved its handling of data under Freedom of Information requests. But as we learned was true for Tamiflu, regulators may not be in possession of full trial reports of all studies on a given intervention, implying the necessity of obtaining reports from manufacturers [11]. Yet at present, industry seems extremely reluctant to make its clinical study reports freely available.

In addition to clinical study reports, we also need access to regulatory information. By the very nature of their professional mandate, regulators may conduct some of the most thorough evaluations of a trial program, and efforts like ours aimed at up-to-date evidence synthesis are seriously deprived without access to their reports. While the FDA has increasingly published its reviews, memos, and other correspondence on its Drugs@FDA website (http://www.accessdata.fda.gov/scripts/cder/drugsatfda/), and additional documents are accessible through FOIA requests, the process can be very time-consuming, taking months or even years. Moreover, regulatory agencies' lack of public inventories of their documentary holdings complicates the retrieval of information. Ideally, we would also have details of the regulators' deliberations, which can serve as signposts to important issues that need investigating.

Summary Points

• Systematic reviews of published randomized clinical trials (RCTs) are considered the gold standard source of synthesized evidence for interventions, but their conclusions are vulnerable to distortion when trial sponsors have strong interests that might benefit from suppressing or promoting selected data.

• More reliable evidence synthesis would result from systematic reviewing of clinical study reports: standardized documents representing the most complete record of the planning, execution, and results of clinical trials, which are submitted by industry to government drug regulators.

• Unfortunately, industry and regulators have historically treated clinical study reports as confidential documents, impeding additional scrutiny by independent researchers.

• We propose clinical study reports become available to such scrutiny, and describe one manufacturer's unconvincing reasons for refusing to provide us access to full clinical study reports. We challenge industry to either provide open access to clinical study reports or publicly defend their current position of RCT data secrecy.

Box 1. What Is Missed without Access to Tamiflu Clinical Study Reports

1. Knowledge of the total denominator. (How many trials were conducted on this drug that might fit the systematic review inclusion criteria?) [13]

2. Realization that serious adverse events (SAEs) occurred in trials for which SAEs were not reported in published papers [13].

3. Understanding what happened in some trials that were published 10 years after completion [43].

4. Vital details of trials (content and toxicity profile of placebos, mode of action of drug, description and temporality of adverse events) [11].

5. Authorship is not consistent with published papers [44] (although, if a review's inclusion criteria include clinical study reports, authorship is not an issue, as the responsibility is clearly the manufacturer's).

6. Rationale for alternatively classifying outcomes such as pneumonia as a complication or an adverse event [16].

7. Ability to know whether key subgroup analysis (influenza-infected subjects) is valid [11].

8. Assessment of validity of previously released information on the drug (articles, reviews, conferences, media, etc.).

9. Realization that Roche's claim of Tamiflu's mode of action [45] appears inconsistent with the evidence from trials [11,46].

In December 2009, after we voiced serious concerns in the BMJ about Tamiflu's alleged ability to reduce complications [24], Roche wrote that it was "very happy to have its data reviewed by appropriate authorities or individuals," and publicly pledged to release "full study reports" for ten trials "within the coming days" [25]. But despite extensive correspondence over the next year and a half, Roche refused to provide any more than portions of the clinical study reports for the ten Kaiser studies (Table 2) and no reports for any of the additional Tamiflu trials we had subsequently identified and requested (Table 3). Reasons for refusing to share the full reports on Tamiflu kept changing, and none seemed credible (Tables 2 and 3).

Table 1. Different sources of data for the uncovering of failures in reporting of safety and effectiveness of some examples of new drugs.

Rosiglitazone (Avandia; GSK)
Brief history: FDA approval granted in 1999 for use in diabetes. EMA did not approve, initially, over concerns about lack of evidence of efficacy, but granted approval in 2000. In 2001, FDA issued new warnings over concerns about cardiovascular risks (especially acute myocardial infarction). In a separate lawsuit in 2004, GSK agreed to publicly post online details of all its clinical trials. In 2010, the EMA suspended rosiglitazone. The company is dealing with 10,000 to 20,000 cases of litigation in the United States alone.
Type and magnitude of the problem: The primary trials had weaknesses of the methods, including excessive loss to follow-up and over-emphasis on proxy outcome measures (blood glucose levels rather than mortality and morbidity outcomes) [27–29].
How the problem was discovered: A 2005–2007 series of meta-analyses increasingly suggested harms, especially cardiovascular. Public disclosure of unpublished study results was critical to uncovering the evidence of harm [30].

Gabapentin (Neurontin; Parke-Davis, a Pfizer subsidiary)
Brief history: FDA approval was granted in 1994 for a narrow indication: adjunctive treatment for epilepsy (partial seizures). The company was accused of using aggressive marketing using systematic methods, including key opinion leaders (KOLs), to promote off-label use [31–33]. Most sales were for off-label use. In 2002, FDA approval was extended to post-herpetic neuralgia. However, in May 2004, Pfizer agreed to pay a US$240 million criminal fine in the United States for illegally promoting off-label use [34].
Type and magnitude of the problem: Delay in publishing company-sponsored trials, such as a 1998 Pfizer-funded study showing lack of efficacy for bipolar disease, in which publication was delayed for two years [34].
How the problem was discovered: An internal whistleblower drew attention to the problem, which resulted in legal action taken by US authorities [34,35].

Rofecoxib (Vioxx; Merck)
Brief history: FDA approval granted in 1999. In 2000, the VIGOR study was published, apparently demonstrating superiority over non-selective inhibition [36]. Data were claimed to have been excluded in a critical editorial [37]. In 2004, Merck withdrew the drug. Subsequently, Merck has been defending against thousands of liability legal actions [38] after educational events had been attempted to declare rofecoxib was safe [39].
Type and magnitude of the problem: Unexpected and unanticipated cardiovascular events (principally, myocardial infarction). A trial of rofecoxib conducted to check protection against colon polyps established an increased cardiovascular disease (CVD) risk [39].
How the problem was discovered: Concern from a regulator (FDA) that there might be increased CVD events caused by the drug. Trial proposed, but not conducted [39]. Clinical study reports were indirectly important in uncovering the scandal.

Oseltamivir (Tamiflu; Roche)
Brief history: FDA approval granted in 1999. Sales surged following concern over avian and pandemic influenza in 2005 and 2009, leading to massive government stockpiles worldwide. Numerous governmental bodies assumed the drug would reduce the complications of influenza, and hospitalizations, based on a Roche-supported meta-analysis of some efficacy trials [8]. This claim was challenged by our 2009 Cochrane review update in response to criticism [40].
Type and magnitude of the problem: There was a public call for individual patient data, already provided confidentially to regulators, to be made available for public scrutiny [41]. Roche agreed to release "full study reports", but (as discussed in this article) later offered multiple reasons for not providing these data.
How the problem was discovered: The latest Cochrane update relied on clinical study reports obtained from regulators (freedom of information requests), reviews, and other documents released by regulators and leaked sources. The manufacturer provided incomplete reports for some trials. Our review has led to the detection of numerous reporting biases and fundamental problems in trial design, and we have concluded that previous effectiveness claims were not supported by the available evidence.

doi:10.1371/journal.pmed.1001201.t001

Table 2. Roche's reasons for not sharing its clinical study reports and the authors' response (for 10 trials in Kaiser meta-analysis [8]).

Roche's basis for reluctance or refusal to share data (Oct. 8, 2009): "Unfortunately we are unable to send you the data requested as a similar meta analysis is currently commencing with which there are concerns your request may conflict. We have been approached by an independent expert influenza group and as part of their meta analysis we have provided access to Roche's study reports."
Response: It is unclear why another group of independent researchers would prevent Roche from sharing the same data with our group.

Roche's basis (Dec. 8, 2009): Cochrane reviewers were "unwilling to enter into the [confidentiality] agreement with Roche."
Response: The terms of Roche's proposed contract were unacceptable to us. We declined to sign for two reasons: 1) all data disclosed under the contract were to be regarded as confidential; and 2) signing the contract would also require us "not to disclose … the existence and terms of this Agreement". We judged that the requirement to keep all data, and the confidentiality agreement itself, secret would interfere with our explicit aim of openly and transparently systematically reviewing the trial data and accounting for their provenance.

Roche's basis (June 1, 2010): Roche says that it was "under the impression that you [Cochrane] were also satisfied with its provision based on our correspondence earlier this year (March 2010)."
Response: We did not immediately realize that what Roche had provided was incomplete. Irrespective of whether we had at one point seemed "satisfied," Roche had not delivered what it publicly promised in the BMJ on Dec. 8, 2009: "full study reports will also be made available on a password-protected site within the coming days to physicians and scientists undertaking legitimate analyses."

Roche's basis (Aug. 20, 2010): "… around 3,200 pages of information have already been provided by Roche for review by your group and the scientific community."
Response: What is important is completeness, and 3,200 pages is a fraction of the full study reports for the ten Kaiser trials Roche promised to make available.

Roche's basis (Aug. 20, 2010): "Roche undertook this action [release of 3,200 pages] to demonstrate our complete confidence in the data and our commitment to transparency to the degree to which patient confidentiality, data exclusivity and the protection of intellectual property allow."
Response: This implies that release of the promised-but-never-released data would impinge on "patient confidentiality, data exclusivity and the protection of intellectual property". This does not seem to apply to many elements of clinical study reports (e.g., the trial protocol and reporting analysis plan), and it is unclear why personal data could not be anonymized.

Roche's basis (Aug. 20, 2010): "The amount of data already made accessible to the scientific community through our actions extends beyond what is generally provided to any third party in the absence of a confidentiality agreement."
Response: It is irrelevant what is "generally provided". What is relevant is what was promised and the need for public disclosure of clinical study reports.

Roche's basis (Aug. 20, 2010): "Over the last few months, we have witnessed a number of developments which raise concerns that certain members of Cochrane Group involved with the review of the neuraminidase inhibitors are unlikely to approach the review with the independence that is both necessary and justified. Amongst others, this includes incorrect statements concerning Roche/Tamiflu made during a recent official enquiry into the response to last year's pandemic by a member of this Cochrane Review Group. Roche intends to follow up separately to clarify this issue. We also note with concern that certain investigators, who the Cochrane Group is proposing will carry out the planned review, have previously published articles covering Tamiflu which we believe lack the appropriate scientific rigor and objectivity."
Response: Despite Roche's promise and our request for specifics, Roche never responded directly.

Roche's basis (Aug. 20, 2010): "We noted in our correspondence to the BMJ in December of last year our concern that the first requests for data to assist in your review did not come from the Cochrane Group, but from the media apparently trying to obtain data following discussions with the Cochrane Review Group. This raised serious questions regarding the motivation for the review from the outset. We note that in subsequent correspondence regarding your next planned review you have copied a number of journalists when responding to emails sent by Roche staff."
Response: Our view is that Tamiflu is a global public health drug and the media have a legitimate reason for helping independent reviewers obtain data, which includes being informed of our efforts to do so.

Roche's basis (Jan. 14, 2011): Cochrane reviewers have been provided with "all the trial data [they] require …"
Response: We disagree. First, it is up to us to decide what we require. Second, we now know that what was provided was not enough. For example, Roche did not provide us with the trial protocols and full amendment history.

Roche's basis (April 26, 2011): "You have all the detail you need to undertake a review and so we have decided not to supply any more detailed information. We do not believe the requested detail to be necessary for the purposes of a review of neuraminidase inhibitors."
Response: We have still not received what was promised in December 2009, and we know that what we have received is deficient.

Roche's basis (April 26, 2011): "It is the role of Global Regulatory Authorities to review detailed information of medicines when assessing benefit/risk. This has occurred, and continues to occur with Tamiflu, as with all other medicines, through regular license updates."
Response: Independent researchers such as the Cochrane Collaboration share the goal of assessing benefit/risk, and require all details necessary to competently perform this function.

Roche's basis (Jan. 20, 2012): "Roche has made full clinical study data available to health authorities around the world for their review as part of the licensing process." [42]
Response: Roche may have made full clinical study data "available" but that does not mean they "provided" all regulators with full clinical study data. For at least 15 Tamiflu trials, Roche did not provide the European regulator (EMA) with full study reports, apparently because EMA did not expressly request the complete clinical study reports. (Correspondence with EMA, May 24 and Jul. 20, 2011)

doi:10.1371/journal.pmed.1001201.t002

There are strong ethical arguments for ensuring that all clinical study reports are publicly accessible. It is the public who take and pay for approved drugs, and therefore the public should have access to complete information about those drugs. We should also not lose sight of the fact that clinical trials are experiments conducted on humans that carry an assumption of contributing to medical knowledge. Non-disclosure of complete trial results undermines the philanthropy of human participants and sets back the pursuit of knowledge.

Potentially valid reasons for restricting the full public release of clinical study reports include: ensuring patient confidentiality (although this could be remedied with redaction); commercial secrets (although these should be clear after drugs have been registered, and we found no commercially sensitive information in the clinical study reports received from EMA with minimal redactions); and finally, industry concerns over adversaries' malicious "cherry picking" over large datasets (which could be reduced by requiring the prospective registration of research protocols). None appear insurmountable.

With the EMA's stated intentions on far wider data disclosure, we hope the debate may soon shift from one of whether to release regulatory data to the specifics of doing so. But until these policies go into effect, and perhaps even after they do, most drugs on the market will remain those approved in an era in which regulators protected industry's data [26]. It is therefore vital to know where industry stands. If drug companies have legitimate reasons for maintaining the status quo of treating all of their data as trade secret, we have yet to hear them. We are all ears.

Supporting Information

Alternative Language Summary Points S1. Translation of the Summary Points into Russian by Vasiliy Vlassov. (DOC)

Alternative Language Summary Points S2. Translation of the Summary Points into Italian by Tom Jefferson. (DOCX)

Alternative Language Summary Points S3. Translation of the Summary Points into Japanese by Yuko Hara and Yasuyuki Kato. (DOC)

Alternative Language Summary Points S4. Translation of the Summary Points into Croatian by Mirjana Huic. (DOC)

Alternative Language Summary Points S5. Translation of the Summary Points into Danish by Andreas Lundh. (DOC)

Alternative Language Summary Points S6. Translation of the Summary Points into German by Gerd Antes. (DOC)

Alternative Language Summary Points S7. Translation of the Summary Points into Spanish by Juan Andres Leon. (DOC)

Alternative Language Summary Points S8. Translation of the Summary Points into Chinese by Han-Pu Tung. (DOC)

Acknowledgments

We thank our co-authors of the Cochrane review of neuraminidase inhibitors. We also thank Yuko Hara for helpful comments on the draft manuscript and Ted Postol for many valuable insights. We are also grateful to those who helped with translations.

Author Contributions

Analyzed the data: PD TJ CDM. Wrote the first draft of the manuscript: PD TJ CDM. Contributed to the writing of the manuscript: PD TJ CDM. ICMJE criteria for authorship read and met: PD TJ CDM. Agree with manuscript results and conclusions: PD TJ CDM.

Table 3. Roche's reasons for not sharing its clinical study reports and the authors' response (for other Tamiflu trials).

Roche's basis for not sharing data (July 24, 2010): "With regards your recent request for additional data, we are taking the request very seriously and as such must assess this within the wider organization particularly with respect to our obligations around patient and physician confidentiality, legality and feasibility"
Response: We agree that "patient and physician confidentiality, legality and feasibility" are valid concerns. However, Roche has never provided details.

Roche's basis (Aug. 20, 2010): With respect to clinical study reports for additional studies beyond the original ten meta-analyzed by Kaiser: "we request that you submit your full analysis plan for review by Roche and the scientific/medical community."
Response: This is a reasonable request. However, we shared our systematic review protocol [16] with Roche on Dec. 11, 2010, following peer review and publication online-first on the cochrane.org website (the usual Cochrane systematic review practice).

Roche's basis (Jan. 14, 2011): "Concerning access to the reports beyond the Kaiser analysis … many are pharmacokinetic studies conducted in healthy volunteers and so are unlikely to be useful in a meta analysis. The studies in prevention and in children are published in peer reviewed publications, or in summary form on www.rochetrials.com."
Response: What Roche has directed us to is insufficient (peer-reviewed publications and the extremely compressed summaries of trials on http://www.rochetrials.com), as the topic of our review is unpublished data, most prominently clinical study reports.

Roche's basis (Feb. 14, 2011): "The long list of studies you sent are either published in peer reviewed publications, in summary form on www.rochetrials.com, halted early, or ongoing."
Response: Roche continues to imply the evidentiary equivalence between unpublished clinical study reports and extremely compressed summary formats such as journal publications, a position we reject. We have repeatedly stated and shown evidence (in our study protocol and elsewhere [13,16]) that reliable evidence synthesis requires access to clinical study reports.

doi:10.1371/journal.pmed.1001201.t003

References

1. Brody H, Light DW (2011) The inverse benefit law: how drug marketing undermines patient safety and public health. Am J Public Health 101(3): 399–404.

2. Smith R (2005) Medical journals are an extension of the marketing arm of pharmaceutical companies. PLoS Med 2(5): e138. doi:10.1371/journal.pmed.0020138.

3. The New York Times (28 April 2009) Tamiflu (drug). The New York Times. Available: http://topics.nytimes.com/topics/news/health/diseasesconditionsandhealthtopics/tamiflu-drug/index.html. Accessed 6 March 2012.

4. U.S. Department of Health and Human Services (2005) HHS pandemic influenza plan. Available: http://www.hhs.gov/pandemicflu/plan/pdf/HHSPandemicInfluenzaPlan.pdf. Accessed 6 March 2012.

5. Harper SA, Fukuda K, Uyeki TM, Cox NJ, Bridges CB (2004) Prevention and control of influenza: recommendations of the Advisory Committee on Immunization Practices (ACIP). MMWR Recomm Rep 53(RR-6): 1–40.

6. Roche Products Pty Limited (2011) TAMIFLU capsules: consumer medicine information. Available: http://www.roche-australia.com/fmfiles/re7229005/downloads/anti-virals/tamiflu-cmi.pdf. Accessed 6 March 2012.

7. European Medicines Agency (2011) Summary of product characteristics (Tamiflu 30 mg hard capsule). Available: http://www.ema.europa.eu/docs/en_GB/document_library/EPAR_-_Product_Information/human/000402/WC500033106.pdf. Accessed 5 February 2012.

8. Kaiser L, Wat C, Mills T, Mahoney P, Ward P, Hayden F (2003) Impact of oseltamivir treatment on influenza-related lower respiratory tract complications and hospitalizations. Arch Intern Med 163(14): 1667–1672.

9. Hoffman-La Roche, Ltd. (2011) Product label. Tamiflu (oseltamivir phosphate) capsules and for oral suspension. Available: http://www.accessdata.fda.gov/drugsatfda_docs/label/2011/021087s056_021246s039lbl.pdf. Accessed 6 March 2012.

10. U.S. Food and Drug Administration (14 April 2000) NDA 21-087 TAMIFLU (oseltamivir phosphate) MACMIS ID#8675. Available: http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/EnforcementActivitiesbyFDA/WarningLettersandNoticeofViolationLetterstoPharmaceuticalCompanies/UCM166329.pdf. Accessed 6 March 2012.

11. Jefferson T, Jones MA, Doshi P, Del Mar CB, Heneghan CJ, et al. (2012) Neuraminidase inhibitors for preventing and treating influenza in healthy adults and children. Cochrane Database Syst Rev 2012(1): CD008965.

12. World Health Organization (2007) WHO interim protocol: rapid operations to contain the initial emergence of pandemic influenza. Available: http://www.who.int/influenza/resources/documents/RapidContProtOct15.pdf. Accessed 6 March 2012.

13. Jefferson T, Doshi P, Thompson M, Heneghan C (2011) Ensuring safe and effective drugs: who can do what it takes? BMJ 342: c7258.

14. International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use (1996) Guideline for industry: structure and content of Clinical Study Reports (ICH E3). Available: http://www.fda.gov/downloads/regulatoryinformation/guidances/ucm129456.pdf. Accessed 6 March 2012.

15. Lundh A, Barbateskovic M, Hrobjartsson A, Gøtzsche PC (2010) Conflicts of interest at medical journals: the influence of industry-supported randomised trials on journal impact factors and revenue – cohort study. PLoS Med 7(10): e1000354. doi:10.1371/journal.pmed.1000354.

16. Jefferson T, Jones MA, Doshi P, Del Mar CB, Heneghan CJ, et al. (2011) Neuraminidase inhibitors for preventing and treating influenza in healthy adults and children - a review of unpublished data. Cochrane Database Syst Rev 2011: CD008965. Available: http://doi.wiley.com/10.1002/14651858.

17. Gøtzsche PC, Jørgensen AW (2011) Opening up data at the European Medicines Agency. BMJ 342: d2686.

18. Yale School of Medicine (2011) Yale University Open Data Access Project (YODA Project). Available: http://medicine.yale.edu/core/projects/yodap/index.aspx. Accessed 6 March 2012.

19. Eyding D, Lelgemann M, Grouven U, Harter M, Kromp M, et al. (2010) Reboxetine for acute treatment of major depression: systematic review and meta-analysis of published and unpublished placebo and selective serotonin reuptake inhibitor controlled trials. BMJ 341: c4737.

20. Halperin RM (1979) FDA disclosure of safety and effectiveness data: a legal and policy analysis. Duke Law Journal 1979(1): 286–326.

21. Kesselheim AS, Mello MM (2007) Confidentiality laws and secrecy in medical research: improving public access to data on drug safety. Health Aff (Millwood) 26(2): 483–491.

22. European Medicines Agency (2010) Output of the European Medicines Agency policy on access to documents related to medicinal products for human and veterinary use. Available: http://www.ema.europa.eu/docs/en_GB/document_library/Regulatory_and_procedural_guideline/2010/11/WC500099472.pdf. Accessed 6 March 2012.

23. Pott A (2011) EMA's response to articles. BMJ 342: d3838.

24. Jefferson T, Jones M, Doshi P, Del Mar C (2009) Neuraminidase inhibitors for preventing and treating influenza in healthy adults: systematic review and meta-analysis. BMJ 339: b5106.

25. Smith J, on behalf of Roche (2009) Point-by-point response from Roche to BMJ questions. BMJ 339: b5374.

26. Abraham J (1995) Science, politics and the pharmaceutical industry: controversy and bias in drug regulation. First edition. Routledge.

27. Cohen D (2010) Rosiglitazone: what went wrong? BMJ 341: c4848.

28. Rosen CJ (2007) The rosiglitazone story – lessons from an FDA advisory committee meeting. N Engl J Med 357(9): 844–846.

29. Richter B, Bandeira-Echtler E, Bergerhoff K, Clar C, Ebrahim SH (2007) Rosiglitazone for type 2 diabetes mellitus. Cochrane Database Syst Rev 2007(3): CD006063.

30. Nissen SE, Wolski K (2007) Effect of rosiglitazone on the risk of myocardial infarction and death from cardiovascular causes. N Engl J Med 356(24): 2457–2471.

31. Lenzer J (2003) Whistleblower charges drug company with deceptive practices. BMJ 326(7390): 620.

32. Kesselheim AS, Mello MM, Studdert DM (2011) Strategies and practices in off-label marketing of pharmaceuticals: a retrospective analysis of whistleblower complaints. PLoS Med 8: e1000431. doi:10.1371/journal.pmed.1000431.

33. Steinman MA, Bero LA, Chren M-M, Landefeld CS (2006) Narrative review: the promotion of gabapentin: an analysis of internal industry documents. Ann Intern Med 145(4): 284–293.

34. Lenzer J (2004) Pfizer pleads guilty, but drug sales continue to soar. BMJ 328(7450): 1217.

35. Steinman MA, Harper GM, Chren M-M, Landefeld CS, Bero LA (2007) Characteristics and impact of drug detailing for gabapentin. PLoS Med 4(4): e134. doi:10.1371/journal.pmed.0040134.

36. Bombardier C, Laine L, Reicin A, Shapiro D, Burgos-Vargas R, et al. (2000) Comparison of upper gastrointestinal toxicity of rofecoxib and naproxen in patients with rheumatoid arthritis. N Engl J Med 343(21): 1520–1528.

37. Curfman GD, Morrissey S, Drazen JM (2005) Expression of concern: Bombardier et al., "Comparison of upper gastrointestinal toxicity of rofecoxib and naproxen in patients with rheumatoid arthritis," N Engl J Med 2000; 343:1520–8. N Engl J Med 353(26): 2813–2814.

38. Wilson D (22 November 2011) Merck to pay $950 million over Vioxx. The New York Times. Available: https://www.nytimes.com/2011/11/23/business/merck-agrees-to-pay-950-million-in-vioxx-case.html. Accessed 6 March 2012.

39. Topol EJ (2004) Failing the public health – rofecoxib, Merck, and the FDA. N Engl J Med 351(17): 1707–1709.

40. Doshi P (2009) Neuraminidase inhibitors–the

story behind the Cochrane review. BMJ 339:

b5164.

41. Godlee F (2009) We want raw data, now. BMJ

339: b5405.

42. Melville NA (20 January 2012) Tamiflu data

called inconsistent, underreported. Available:

http://www.medscape.com/viewarticle/757226

[requires free registration].

43. Dutkowski R, Smith JR, Davies BE (2010) Safety

and pharmacokinetics of oseltamivir at standard

and high dosages. Int J Antimicrob Agents 35(5):

461–467.

44. Cohen D (2009) Complications: tracking down

the data on oseltamivir. BMJ 339: b5387.

45. Welliver R, Monto AS, Carewicz O, Schatteman E,

Hassman M, et al. (2001) Effectiveness of oselta-

mivir in preventing influenza in household con-

tacts: a randomized controlled trial. JAMA 285(6):

748–754.

46. Gilead Sciences (1999) Roche announces new

data on recently approved TamifluTM, first pill to

treat the most common strains of influenza (A &

B). Available: http://www.gilead.com/pr_

942082531. Accessed 6 March 2012.

PLoS Medicine | www.plosmedicine.org 6 April 2012 | Volume 9 | Issue 4 | e100120130

Page 81: Sharing Clinical Research Data:  an IOM Workshop

Who Shares? Who Doesn't? Factors Associated with Openly Archiving Raw Research Data

Heather A. Piwowar*¤

Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America

Abstract

Many initiatives encourage investigators to share their raw datasets in hopes of increasing research efficiency and quality. Despite these investments of time and money, we do not have a firm grasp of who openly shares raw research data, who doesn't, and which initiatives are correlated with high rates of data sharing. In this analysis I use bibliometric methods to identify patterns in the frequency with which investigators openly archive their raw gene expression microarray datasets after study publication. Automated methods identified 11,603 articles published between 2000 and 2009 that describe the creation of gene expression microarray data. Associated datasets in best-practice repositories were found for 25% of these articles, increasing from less than 5% in 2001 to 30%-35% in 2007-2009. Accounting for sensitivity of the automated methods, approximately 45% of recent gene expression studies made their data publicly available. First-order factor analysis on 124 diverse bibliometric attributes of the data creation articles revealed 15 factors describing authorship, funding, institution, publication, and domain environments. In multivariate regression, authors were most likely to share data if they had prior experience sharing or reusing data, if their study was published in an open access journal or a journal with a relatively strong data sharing policy, or if the study was funded by a large number of NIH grants. Authors of studies on cancer and human subjects were least likely to make their datasets available. These results suggest that research data sharing levels are still low and increasing only slowly, and that data are least available in areas where they could make the biggest impact. Let's learn from those with high rates of sharing to embrace the full potential of our research output.

Citation: Piwowar HA (2011) Who Shares? Who Doesn’t? Factors Associated with Openly Archiving Raw Research Data. PLoS ONE 6(7): e18657. doi:10.1371/journal.pone.0018657

Editor: Cameron Neylon, Science and Technology Facilities Council, United Kingdom

Received September 16, 2010; Accepted March 15, 2011; Published July 13, 2011

Copyright: © 2011 Heather A. Piwowar. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This work was completed in the Department of Biomedical Informatics at the University of Pittsburgh and partially funded by NIH National Library of Medicine training grant 5 T15 LM007059. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The author has declared that no competing interests exist.

* E-mail: [email protected]

¤ Current address: NESCent, The National Evolutionary Synthesis Center, Durham, North Carolina, United States of America

Introduction

Sharing and reusing primary research datasets has the potential to increase research efficiency and quality. Raw data can be used to explore related or new hypotheses, particularly when combined with other available datasets. Real data are indispensable for developing and validating study methods, analysis techniques, and software implementations. The larger scientific community also benefits: sharing data encourages multiple perspectives, helps to identify errors, discourages fraud, is useful for training new researchers, and increases efficient use of funding and population resources by avoiding duplicate data collection.

Eager to realize these benefits, funders, publishers, societies, and individual research groups have developed tools, resources, and policies to encourage investigators to make their data publicly available. For example, some journals require the submission of detailed biomedical datasets to publicly available databases as a condition of publication [1,2]. Many funders require data sharing plans as a condition of funding: since 2003, the National Institutes of Health (NIH) in the USA has required a data sharing plan for all large funding grants [3] and has more recently introduced stronger requirements for genome-wide association studies [4]. As of January 2011, the US National Science Foundation requires that data sharing plans accompany all research grant proposals [5]. Several government whitepapers [6,7] and high-profile editorials [8,9] call for responsible data sharing and reuse. Large-scale collaborative science is increasing the need to share datasets [10,11], and many guidelines, tools, standards, and databases are being developed and maintained to facilitate data sharing and reuse [12,13].

Despite these investments of time and money, we do not yet understand the impact of these initiatives. There is a well-known adage: you cannot manage what you do not measure. For those with a goal of promoting responsible data sharing, it would be helpful to evaluate the effectiveness of requirements, recommendations, and tools. When data sharing is voluntary, insights could be gained by learning which datasets are shared, on what topics, by whom, and in what locations. When policies make data sharing mandatory, monitoring is useful to understand compliance and unexpected consequences.

Dimensions of data sharing action and intention have been investigated by a variety of studies. Manual annotations and systematic data requests have been used to estimate the frequency of data sharing within biomedicine [14,15,16,17], though few attempts were made to determine patterns of sharing and withholding within these samples. Blumenthal [18], Campbell [19], Hedstrom [20], and others have used survey results to correlate self-reported instances of data sharing and withholding with self-reported attributes


like industry involvement, perceived competitiveness, career productivity, and anticipated data sharing costs. Others have used surveys and interviews to analyze opinions about the effectiveness of mandates [21] and the value of various incentives [20,22,23,24]. A few inventories list the data-sharing policies of funders [25,26] and journals [1,27], and some work has been done to correlate policy strength with outcome [2,28]. Surveys and case studies have been used to develop models of information behavior in related domains, including knowledge sharing within an organization [29,30], physician knowledge sharing in hospitals [31], participation in open source projects [32], academic contributions to institutional archives [33,34], the choice to publish in open access journals [35], sharing social science datasets [20], and participation in large-scale biomedical research collaborations [36].

Although these studies provide valuable insights and their methods facilitate investigation into an author's intentions and opinions, they have several limitations. First, associations to an investigator's intention to share data do not directly translate to associations with actually sharing data [37]. Second, associations that rely on self-reported data sharing and withholding likely suffer from underreporting and confounding, since people admit withholding data much less frequently than they report having experienced the data withholding of others [18].

I suggest a supplemental approach for investigating research data-sharing behavior. I have collected and analyzed a large set of observed data sharing actions and associated study, investigator, journal, funding, and institutional variables. The reported analysis explores common factors behind these attributes and looks at the association between these factors and data sharing prevalence.

I chose to study data sharing for one particular type of data: biological gene expression microarray intensity values. Microarray studies provide a useful environment for exploring data sharing policies and behaviors. Despite being a rich resource valuable for reuse [38], microarray data are often, but not yet universally, shared. Best-practice guidelines for sharing microarray data are fairly mature [12,39]. Two centralized databases have emerged as best-practice repositories: the Gene Expression Omnibus (GEO) [13] and ArrayExpress [40]. Finally, high-profile letters have called for strong journal data-sharing policies [41], resulting in unusually strong data sharing requirements in some journals [42]. As such, the results here represent data sharing in an environment where it has been particularly encouraged and supported.

Methods

In brief, I used a full-text query to identify a set of studies in which the investigators generated gene expression microarray datasets. Best-practice data repositories were searched for associated datasets. Attributes of the studies were used to derive factors related to the investigators, journals, funding, institutions, and topic of the studies. Associations between these study factors and the frequency of public data archiving were determined through multivariate regression.

Studies for analysis

The set of "gene expression microarray creation" articles was identified by querying the title, abstract, and full text of PubMed, PubMed Central, Highwire Press, Scirus, and Google Scholar with portal-specific variants of the following query:

("gene expression"[text] AND "microarray"[text] AND "cell"[text] AND "rna"[text])
AND ("rneasy"[text] OR "trizol"[text] OR "real-time pcr"[text])
NOT ("tissue microarray*"[text] OR "cpg island*"[text])

Retrieved articles were mapped to PubMed identifiers whenever possible; the union of the PubMed identifiers returned by the full-text portals was used as the definitive list of articles for analysis. An independent evaluation of this approach found that it identified articles that created microarray data with a precision of 90% (95% confidence interval, 86% to 93%) and a recall of 56% (52% to 61%), compared to manual identification of articles that created microarray data [43].

Because Google Scholar only displays the first 1000 results of a query, I was not able to view all of its hits. I tried to identify as many Google Scholar search results as possible by iteratively appending a variety of attributes to the end of the query, including various publisher names, journal title words, and years of publication, thereby retrieving distinct subsets of the results 1000 hits at a time.
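The subsetting workaround above generalizes to any result-capped search portal. A minimal sketch, with a toy in-memory corpus standing in for the real portal, and `run_search`/`harvest_all` as hypothetical helper names of my own (not from the author's code):

```python
# Sketch of the result-partitioning workaround described above: a portal that
# caps output at 1,000 hits can still be harvested by appending narrowing
# terms (years, publisher names) so each refined query returns a subset small
# enough to page through; the union recovers the full result set.

RESULT_CAP = 1000

def run_search(corpus, query_terms):
    """Return IDs of records matching every query term (capped like a portal)."""
    hits = [rec_id for rec_id, text in corpus.items()
            if all(term in text for term in query_terms)]
    return hits[:RESULT_CAP]

def harvest_all(corpus, base_query, refinements):
    """Union the capped result sets of the base query plus each refinement."""
    found = set(run_search(corpus, base_query))
    for extra in refinements:
        found.update(run_search(corpus, base_query + [extra]))
    return found

# Toy corpus: 3,000 records, far more than one capped query can return.
corpus = {i: f"microarray gene expression {2000 + i % 10}" for i in range(3000)}
all_ids = harvest_all(corpus, ["microarray"], [str(y) for y in range(2000, 2010)])
```

Refining by publication year splits the 3,000 matches into ten subsets of 300, each under the cap, so the union recovers every record even though the base query alone returns only the first 1,000.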

Data availability

The dependent variable in this study was whether each gene expression microarray research article had an associated dataset in a best-practice public centralized data repository. A community letter encouraging mandatory archiving in 2004 [41] identified three best-practice repositories for storing gene expression microarray data: NCBI's Gene Expression Omnibus (GEO), EBI's ArrayExpress, and Japan's CIBEX database. The first two were included in this analysis, since CIBEX was defunct until recently.

An earlier evaluation found that querying GEO and ArrayExpress with article PubMed identifiers located a representative 77% of all associated publicly available datasets [44]. I used the same method for finding datasets associated with published articles in this study: I queried GEO for links to the PubMed identifiers in the analysis sample using the "pubmed_gds[filter]" and queried ArrayExpress by searching for each PubMed identifier in a downloaded copy of the ArrayExpress database. Articles linked from a dataset in either of these two centralized repositories were considered to have "shared their data" for the endpoint of this study, and those without such a link were considered not to have shared their data.
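The GEO side of this lookup can be sketched against NCBI's public E-utilities: ELink maps PubMed identifiers to the `gds` (GEO DataSets) database. This is an illustrative reconstruction, not the author's actual script; the helper names are mine, and the XML parser is kept pure so it can be exercised without network access.

```python
# Minimal sketch of finding GEO datasets linked to an article via NCBI
# E-utilities. ELink maps a PubMed ID into the `gds` database; the helper
# below builds the request URL, and the parser extracts linked dataset IDs
# from the returned XML. Under the endpoint described above, an article with
# at least one linked ID would count as having "shared their data".
import urllib.parse
import xml.etree.ElementTree as ET

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"

def build_geo_link_url(pmid):
    """URL asking ELink for GEO DataSets (db=gds) linked to one PubMed ID."""
    params = {"dbfrom": "pubmed", "db": "gds", "id": str(pmid)}
    return EUTILS + "?" + urllib.parse.urlencode(params)

def linked_gds_ids(elink_xml):
    """Parse ELink XML and return the linked GEO DataSets identifiers."""
    root = ET.fromstring(elink_xml)
    return [id_el.text for id_el in root.findall(".//LinkSetDb/Link/Id")]
```

In use, the XML body fetched from `build_geo_link_url(pmid)` would be passed to `linked_gds_ids`; an empty list means no linked dataset was found for that article.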

Study attributes

For each study article I collected 124 attributes for use as independent variables. The attributes were collected automatically from a wide variety of sources. Basic bibliometric metadata was extracted from the MEDLINE record, including journal, year of publication, number of authors, Medical Subject Heading (MeSH) terms, number of citations from PubMed Central, inclusion in PubMed subsets for cancer, whether the journal is published with an open-access model, and whether it had data-submission links from GenBank, PDB, and SwissProt.

ISI Journal Impact Factors and associated metrics were extracted from the 2008 ISI Journal Citation Reports. I quantified the content of journal data-sharing policies based on the "Instructions for Authors" for the most commonly occurring journals.

NIH grant details were extracted by cross-referencing grant numbers in the MEDLINE record with the NIH award information (http://report.nih.gov/award/state/state.cfm). From this information I tabulated the amount of total funding received for each of the fiscal years from 2003 to 2008. I also estimated the date of renewal by identifying the most recent year in which a grant number was prefixed by a "1" or "2", an indication that the grant is "new" or "renewed," respectively.
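The renewal heuristic above relies on the NIH application type code, the leading digit of a full grant number ("1" for a new award, "2" for a competing renewal). A minimal sketch, assuming grant strings have already been grouped by fiscal year; the input format and function name are illustrative, not taken from the author's code:

```python
# Sketch of the renewal heuristic described above. Given the grant number
# strings observed for each year, the latest year whose numbers start with
# "1" (new) or "2" (competing renewal) estimates the date of renewal.
def estimated_renewal_year(grants_by_year):
    """grants_by_year: {year: [grant number strings]} -> latest new/renewed year, or None."""
    candidate_years = [
        year
        for year, grants in grants_by_year.items()
        if any(g.strip().startswith(("1", "2")) for g in grants)
    ]
    return max(candidate_years) if candidate_years else None
```

For example, a grant seen as "1R01LM009879-01" in 2003, "5R01LM009879-03" (a non-competing continuation) in 2005, and "2R01LM009879-05" in 2007 would be estimated as renewed in 2007.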


The corresponding address was parsed for institution and country, following the methods of Yu et al. [45]. Institutions were cross-referenced to the SCImago Institutions Rankings 2009 World Report (http://www.scimagoir.com/) to estimate the relative degree of research output and impact of the institutions.

Attributes of study authors were collected for first and last authors (in biomedicine, customarily, the first and last authors make the largest contributions to a study and have the most power in publication decisions). The genders of the first and last authors were estimated using the Baby Name Guesser website (http://www.gpeters.com/names/baby-names.php). A list of prior publications in MEDLINE was extracted from Author-ity clusters, 2009 edition [46], for the first and last author of each article in this study. To limit the impact of extremely large "lumped" clusters that erroneously contain the publications of more than one actual author, I excluded prior publication lists for first or last authors in the largest 2% of clusters and instead considered these data missing. For each paper in an author's publication history with a PubMed identifier numerically less than the PubMed identifier of the paper in question, I queried to find whether any of these prior publications had been published in an open access journal, were included in the "gene expression microarray creation" subset themselves, or had reused gene expression data. I recorded the date of the earliest publication by the author and the number of citations to date that their earlier papers received in PubMed Central.

I attempted to estimate whether the paper itself reused publicly available gene expression microarray data by looking for its inclusion in the list of reuse that GEO keeps at http://www.ncbi.nlm.nih.gov/projects/geo/info/ucitations.html.

Data collection scripts were coded in Python version 2.5.2 (many libraries were used, including EUtils, BeautifulSoup, pyparsing, and nltk [47]) and SQLite version 3.4. Data collection source code is available at GitHub (http://github.com/hpiwowar/pypub).

Statistical methods

Statistical analysis was performed in R version 2.10.1 [48]. P-values were two-tailed. The collected data were visually explored using Mondrian version 1.1 [49] and the Hmisc package [50]. I applied a square-root transformation to variables representing count data to improve their normality prior to calculating correlations.

To calculate variable correlations, I used the hetcor function in the polycor library. This computes polyserial correlations between pairs of numeric and ordinal variables and polychoric correlations between two ordinal variables. I modified it to calculate Pearson correlations between numeric variables using the rcorr function in the Hmisc library. I used a pairwise-complete approach to missing data and used the nearcor function in the sfsmisc library to make the correlation matrix positive definite. A correlation heatmap was produced using the gplots library.

I used the nFactors library to calculate and display the scree plot for the correlations.

Since the correlation matrix was not well-behaved enough for maximum-likelihood factor analysis, first-order exploratory factor analysis was performed with the fa function in the psych library, using the minimum residual (minres) solution and a promax oblique rotation. Second-order factor analysis also used the minres solution but a varimax rotation, since I wanted these factors to be orthogonal. I computed the loadings on the original variables for the second-order factors using the method described by Gorsuch [51].

Before computing the factor scores for the original dataset, missing values were imputed through Gibbs sampling with two iterations through the mice library.

Using this complete dataset, I computed scores for each of the datapoints onto all of the first- and second-order factors using Bartlett's algorithm as extracted from the factanal function. I submitted these factor scores to a logistic regression using the lrm function in the rms package. Continuous variables were modeled as cubic splines with 4 knots using the rcs function from the rms package, and all two-way interactions were explored.

Finally, hierarchical supervised clustering on the datapoints was performed to learn which factors were most predictive, and the data sharing prevalence was then estimated in a contingency table of the two most predictive factors split at their medians.

Results

Full-text queries for articles describing the creation of gene expression microarray datasets returned PubMed identifiers for 11,603 studies.

MEDLINE fields were still "in process" for 512 records, resulting in missing data for MeSH-derived variables. Impact factors were found for all but 1,001 articles. Journal policy variables were missing for 4,107 articles. The institution ranking attributes were missing for 6,185. I cross-referenced NIH grant details for 3,064 studies (some grant numbers could not be parsed because they were incomplete or strangely formatted). I was able to determine the gender of the first and last authors, based on the forenames in the MEDLINE record, for all but 2,841 first authors and 2,790 last authors. All but 1,765 first authors and 797 last authors were found to have a publication history in the 2009 Author-ity clusters.

PubMed identifiers were found in the "primary citation" field of dataset records in GEO or ArrayExpress for 2,901 of the 11,603 articles in this dataset, indicating that 25% (95% confidence interval: 24% to 26%) of the studies deposited their data in GEO or ArrayExpress and completed the citation fields with the primary article's PubMed identifier. This is my estimate for the prevalence of gene expression microarray data deposited into the two predominant, centralized, publicly accessible databases.

This data-sharing rate increased with each subsequent article publication year, as seen in Figure 1, rising from less than 5% in 2001 to 30%-35% in 2007-2009. Accounting for the sensitivity of my automated method for detecting open data anywhere on the internet (about 77% [44]), it can be estimated that approximately 45% (0.35/0.77) of recent gene expression studies have made their data publicly available.

The data-sharing rate also varied across journals. Figure 2 shows the data-sharing rate across the 50 journals with the most articles in this study. Many of the other attributes were also associated with the prevalence of data sharing in univariate analysis, as seen in Figures S1, S2, S3, S4, and S5.

First-order factors

I tried to use a scree plot to determine the optimal number of factors for first-order analysis. Since the scree plot did not have a clear drop-off, I experimented with a range of factor counts near the optimal coordinates index (as calculated by nScree in the nFactors R library) and finalized on 15 factors. The correlation matrix was not sufficiently well-behaved for maximum-likelihood factor analysis, so I used a minimum residual (minres) solution. I chose to rotate the factors with the promax oblique algorithm, because first-order factors were expected to have significant correlations with one another. The rotated first-order factors are given in Table 1 with loadings larger than 0.4 or less than -0.4. Some of the loadings are greater than one. This is not unexpected, since the factors are oblique and thus the loadings in


the pattern matrix represent regression coefficients rather than correlations. Correlations between attributes and the first-order factors are given in the structure matrix in Table S1. The factors have been named based on the variables they load most heavily, using abbreviations for publishing in an Open Access journal (OA) and previously depositing data in the Gene Expression Omnibus (GEO) or ArrayExpress (AE) databases.

After imputing missing values, I calculated scores for each of the 15 factors for each of the 11,603 data collection studies.

Many of the factor scores demonstrated a correlation with frequency of data sharing in univariate analysis, as seen in Figure 3. Several factors seemed to have a linear relationship with data sharing across their whole range. For example, whereas the data sharing rate was relatively low for studies with the lowest scores on the factor related to the citation and collaboration rate of the corresponding author's institution (in Figure 3, the first row under the heading "Institution high citation & collaboration"), the data sharing rate was higher for studies that scored within the 25th to 50th percentile on that factor, higher still for studies in the third quartile, and studies from highly cited institutions, above the 75th percentile, had a relatively high rate of data sharing. A trend in the opposite direction can be seen for the factor "Humans & cancer": the higher a study scored on that factor, the less likely it was to have shared its data.

Most of these factors were significantly associated with data-sharing behavior in a multivariate logistic regression: p = 0.18 for "Large NIH grant", p<0.05 for "No GEO reuse & YES high institution output" and "No K funding or P funding", and p<0.005 for the other first-order factors. The increase in the odds of data sharing is illustrated in Figure 4 as scores on each factor in the model are moved from their 25th percentile value to their 75th percentile value.
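The percentile-shift presentation in Figure 4 maps onto a simple quantity: for a logistic model, moving one covariate from its 25th to its 75th percentile (others held fixed) multiplies the odds of sharing by exp(b * (q75 - q25)), where b is that covariate's coefficient. The sketch below uses hypothetical numbers and ignores the cubic-spline terms the actual model includes, so it is a linear simplification only.

```python
# Worked illustration of the "25th to 75th percentile" effect sizes above.
# In a logistic regression, shifting one covariate by its interquartile range
# multiplies the odds of the outcome by exp(beta * (q75 - q25)). The
# coefficient and quartile values here are hypothetical, for illustration.
import math

def interquartile_odds_ratio(beta, q25, q75):
    """Odds multiplier for an IQR shift of one covariate in a logistic model."""
    return math.exp(beta * (q75 - q25))

# e.g. a factor with coefficient 0.5 whose scores span -0.6 to 0.8 across the IQR:
odds_ratio = interquartile_odds_ratio(0.5, -0.6, 0.8)  # about 2.0
```

A ratio above 1 means studies scoring high on the factor had greater odds of sharing; a ratio below 1 (as for the "Humans & cancer" direction) means lower odds.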

Second-order factors

The heavy correlations between the first-order factors suggested that second-order factors might be illuminating. Scree plot analysis of the correlations between the first-order factors suggested a solution containing five second-order factors. I calculated the factors using a varimax rotation to find orthogonal factors. Loadings on the first-order factors are given in Table 2.

Since interactions make these second-order variables slightly difficult to interpret, I followed the method explained by Gorsuch [51] to calculate the loadings of the second-order variables directly on the original variables. The results are listed in Table 3. I named the second-order factors based on the loadings on the original variables. I then calculated factor scores for each of these second-order factors using the original attributes of the 11,603 datapoints. In

Figure 1. Proportion of articles with shared datasets, by year (error bars are 95% confidence intervals of the proportions). doi:10.1371/journal.pone.0018657.g001


univariate analysis, scores on several of the five factors showed a clear linear relationship with data sharing frequency, as illustrated in Figure 5.

All five of the second-order factors were associated with data sharing in multivariate logistic regression, p<0.001. The increase in odds of data sharing is illustrated in Figure 6 as each factor in the model is moved from its 25th percentile value to its 75th percentile value.

Finally, to understand which of these factors was most predictive of data sharing behaviour, I performed supervised hierarchical clustering using the second-order factors. Splits on "OA journal & previous GEO-AE sharing" and "Cancer & Humans" were clearly the most informative, so I simply split these two factors at their medians and looked at the data sharing prevalence. As shown in Table 4, studies that scored high on the "OA journal & previous GEO-AE sharing" factor and low on the "Cancer & Humans" factor were almost three times as likely to share their data as a "Cancer & Humans" study published without a strong "OA journal & previous GEO-AE sharing" environment.

Discussion

This study explored the association between attributes of a published experiment and the probability that its raw dataset was shared in a publicly accessible database. I found that 25% of studies that performed gene expression microarray experiments have deposited their raw research data in a primary public repository. The proportion of studies that shared their gene expression datasets increased over time, from less than 5% in early years, before mature standards and repositories, to 30%-35% in 2007-2009. This suggests that perhaps 45% of recent gene expression studies have made their data available somewhere on the internet, after accounting for datasets overlooked by the automated methods of discovery [44]. This estimate is consistent with a previous manual inventory [15].

Many factors derived from an experiment's topic, impact, funding, publishing, institutional, and authorship environments were associated with the probability of data sharing. In particular, authors publishing in an open access journal, or with a history of sharing and reusing shared gene expression microarray data, were most likely to share their data, and those studying cancer or human subjects were least likely to share.

It is disheartening to discover that human and cancer studies have particularly low rates of data sharing. These data are surely some of the most valuable for reuse, to confirm, refute, inform, and advance bench-to-bedside translational research [52]. Further studies are required to understand the interplay of an investigator's motivation, opportunity, and ability to share their raw datasets

Figure 2. Proportion of articles with shared datasets, by journal (error bars are 95% confidence intervals of the proportions). doi:10.1371/journal.pone.0018657.g002


Table 1. First-order factor loadings.

Large NIH grant
0.97 num.post2005.morethan1000k.tr
0.96 num.post2005.morethan750k.tr
0.92 num.post2004.morethan750k.tr
0.91 num.post2004.morethan1000k.tr
0.91 num.post2005.morethan500k.tr
0.89 num.post2006.morethan1000k.tr
0.89 num.post2006.morethan750k.tr
0.86 num.post2004.morethan500k.tr
0.85 num.post2006.morethan500k.tr
0.84 num.post2003.morethan750k.tr
0.84 num.post2003.morethan1000k.tr
0.80 num.post2003.morethan500k.tr
0.74 has.U.funding
0.71 has.P.funding
0.58 nih.sum.avg.dollars.tr
0.56 nih.sum.sum.dollars.tr
0.44 nih.max.max.dollars.tr

Has journal policy
1.00 journal.policy.contains..geo.omnibus
0.95 journal.policy.at.least.requests.sharing.array
0.95 journal.policy.mentions.any.sharing
0.93 journal.policy.contains.word.microarray
0.91 journal.policy.requests.sharing.other.data
0.85 journal.policy.says.must.deposit
0.83 journal.policy.contains.word.arrayexpress
0.72 journal.policy.requires.microarray.accession
0.71 journal.policy.requests.accession
0.58 journal.policy.contains.word.miame.mged
0.48 journal.microarray.creating.count.tr
0.45 journal.policy.mentions.consequences
0.42 journal.policy.general.statement

NOT institution NCI or intramural
0.59 pubmed.is.funded.non.us.govt
0.55 institution.is.higher.ed
-0.89 institution.nci
-0.86 pubmed.is.funded.nih.intramural
-0.42 country.usa

Count of R01 & other NIH grants
1.15 has.R01.funding
1.14 has.R.funding
0.89 num.grants.via.nih.tr
0.86 nih.cumulative.years.tr
0.82 num.grant.numbers.tr
0.80 max.grant.duration.tr
0.66 pubmed.is.funded.nih
0.50 nih.max.max.dollars.tr
0.45 num.nih.is.nigms.tr
0.44 country.usa
0.42 has.T.funding
0.41 num.nih.is.niaid.tr

Journal impact
0.88 journal.5yr.impact.factor.log
0.88 journal.impact.factor.log
0.85 journal.immediacy.index.log
0.70 journal.policy.mentions.exceptions
0.54 journal.num.articles.2008.tr
0.51 journal.policy.contains.word.miame.mged
-0.61 journal.policy.contains.word.arrayexpress
-0.48 pubmed.is.open.access

Last author num prev pubs & first year pub
0.84 last.author.num.prev.pubs.tr
0.74 last.author.year.first.pub.ago.tr
0.73 last.author.num.prev.pmc.cites.tr
0.68 last.author.num.prev.other.sharing.tr
0.48 country.japan
0.44 last.author.num.prev.microarray.creations.tr

Journal policy consequences & long half-life
0.78 journal.policy.mentions.consequences
0.73 journal.cited.halflife
0.60 pubmed.is.bacteria
0.42 journal.policy.requires.microarray.accession
-0.54 pubmed.is.open.access
-0.45 journal.policy.general.statement

Institution high citations & collaboration
0.76 institution.mean.norm.citation.score
0.72 institution.international.collaboration
0.64 institution.mean.norm.impact.factor
0.41 country.germany
-0.67 country.china
-0.61 country.korea
-0.56 last.author.gender.not.found
-0.43 country.japan

NO geo reuse & YES high institution output
0.66 institution.research.output.tr
0.58 institution.harvard
0.46 has.K.funding
0.42 institution.stanford
-0.79 pubmed.is.geo.reuse
-0.62 country.australia
-0.46 institution.rank

NOT animals or mice
0.51 pubmed.is.humans
0.43 pubmed.is.diagnosis
0.40 pubmed.is.effectiveness
-0.93 pubmed.is.animals
-0.86 pubmed.is.mice

Humans & cancer
0.84 pubmed.is.humans
0.75 pubmed.is.cancer
0.67 pubmed.is.cultured.cells
0.52 institution.is.medical
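The grouped loadings in Table 1 come from an exploratory factor analysis of the study attributes. As a minimal illustration of the technique only (synthetic data and scikit-learn's FactorAnalysis, not the author's actual pipeline or rotation choices):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)

# Two hypothetical latent traits (think "funding level" and "journal policy
# strength") generating six observed, correlated study attributes.
n = 500
latent = rng.normal(size=(n, 2))
loadings_true = np.array([
    [0.9, 0.0], [0.8, 0.1], [0.7, 0.0],   # variables loading on factor 1
    [0.0, 0.9], [0.1, 0.8], [0.0, 0.7],   # variables loading on factor 2
])
observed = latent @ loadings_true.T + rng.normal(scale=0.3, size=(n, 6))

fa = FactorAnalysis(n_components=2, rotation="varimax", random_state=0)
fa.fit(observed)

# fa.components_ has shape (n_factors, n_features); each entry is a loading,
# analogous to the numbers listed in Table 1.
print(np.round(fa.components_, 2))
```

Grouping each variable under the factor on which it loads most heavily reproduces the layout of Table 1.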


[53,54]. In the meantime, we can make some guesses: As is appropriate, concerns about privacy of human subjects' data undoubtedly affect a researcher's willingness and ability (perceived or actual) to share raw study data. I do not presume to recommend a proper balance between privacy and the societal benefit of data sharing, but I will emphasize that researchers should assess the degree of re-identification risk on a study-by-study basis [55], evaluate the risks and benefits across the wide range of stakeholder interests [56], and consider an ethical framework to make these difficult decisions [57]. Learning how to make these decisions well is difficult: it is vital that we educate and mentor both new and experienced researchers in best practices. Given the low risk of re-identification through gene expression microarray data (illustrated by its inclusion in the Open-Access Data Tier at http://target.cancer.gov/dataportal/access/policy.asp), data-sharing rates could also be low for reasons other than privacy. Cancer researchers may perceive their field as particularly competitive, or cancer studies may have relatively strong links to industry – two attributes previously associated with data withholding [58,59].

NIH funding levels were associated with increased prevalence of data sharing, though the overall probability of sharing remains low even in well-funded studies. Data sharing was infrequent even in studies funded by grants clearly covered by the NIH Data Sharing Policy, such as those that received more than one million dollars per year and were awarded or renewed since 2006. This result is consistent with reports that the NIH Data Sharing Policy is often not taken seriously because compliance is not enforced [54]. It is surprising how infrequently the NIH Data Sharing Policy applies to gene expression microarray studies (19% as per a pilot to this study [60]). The NIH may address these issues soon within its renewed commitment to make data more available [61].

I am intrigued that publishing in an open access journal, previously sharing gene expression data, and previously reusing gene expression data were associated with data sharing outcomes. More research is required to understand the drivers behind this association. Does the factor represent an attitude towards "openness" by the decision-making authors? Does the act of sharing data lower the perceived effort of sharing data again? Does it dispel fears induced by possible negative outcomes from sharing data? To what extent does recognizing the value of shared data through data reuse motivate an author to share his or her own datasets?

People often wonder whether the attitude towards data sharing varies with age. Although I was not able to capture author age, I did estimate the number of years since first and last authors had published their first paper. The analysis suggests that first authors with many years in the field are less likely to share data than those with fewer years of experience, but no such association was found for last authors. More work is needed to confirm this finding given the confounding factor of previous data-sharing experience.

Gene expression publications associated with Stanford University have a very high level of data sharing. The true level of open data archiving is actually much higher than that reflected in this study: Stanford University hosts a public microarray repository, and many articles that did not have a dataset link from GEO or ArrayExpress do mention submission to the Stanford Microarray Database. If one were looking for a community on which to model best practices for data sharing adoption, Stanford would be a great place to start.

Similarly, Physiological Genomics has very high rates of public archiving relative to other journals. Perhaps not coincidentally, to my knowledge Physiological Genomics is the only journal to have published an evaluation of its authors' attitudes and experiences following the adoption of new data archiving requirements for gene expression microarray data [21].

Analyzing data sharing through bibliometric and data-mining attributes has several advantages: We can look at a very large set of studies and attributes, our results are not biased by survey response self-selection or reporting bias, and the analysis can be repeated over time with little additional effort.

However, this approach has its own limitations. Filters for identifying microarray creation studies do not have perfect precision, so some non-data-creation studies may be included in the analysis. Because studies that do not create data will not have data deposits, their inclusion alters the composition of what I consider to be studies that create but do not share data. Furthermore, my method for detecting data deposits overlooks data deposits that are missing PubMed identifiers in GEO and ArrayExpress, so the dataset misclassifies some studies that did in fact share their data in these repositories.

I made decisions to facilitate analysis, such as assuming that PubMed identifiers were monotonically increasing with publication date and using the current journal data-sharing policy as a surrogate for the data-sharing policy in place when papers were published. These decisions may have introduced errors.

Table 1. Cont.

0.47 pubmed.is.core.clinical.journal
-0.68 pubmed.is.plants
-0.49 pubmed.is.fungi

Institution is government & NOT higher ed
0.92 institution.is.govnt
0.70 country.germany
0.65 country.france
0.46 institution.international.collaboration
-0.78 institution.is.higher.ed
-0.56 country.canada
-0.51 institution.stanford
-0.42 institution.is.medical

NO K funding or P funding
0.56 has.R01.funding
0.49 has.R.funding
0.41 num.post2006.morethan500k.tr
0.41 num.post2006.morethan750k.tr
0.40 num.post2006.morethan1000k.tr
-0.65 has.K.funding
-0.63 has.P.funding

Authors prev GEOAE sharing & OA & array creation
0.83 last.author.num.prev.geoae.sharing.tr
0.74 last.author.num.prev.microarray.creations.tr
0.73 last.author.num.prev.oa.tr
0.60 first.author.num.prev.geoae.sharing.tr
0.47 first.author.num.prev.oa.tr
0.46 first.author.num.prev.microarray.creations.tr
0.40 institution.stanford
-0.44 years.ago.tr

First author num prev pubs & first year pub
0.83 first.author.num.prev.pubs.tr
0.77 first.author.year.first.pub.ago.tr
0.73 first.author.num.prev.pmc.cites.tr
0.52 first.author.num.prev.other.sharing.tr

doi:10.1371/journal.pone.0018657.t001

Missing data may have obscured important information. For example, articles published in journals with policies that I did not examine had a lower rate of data sharing than articles published in journals whose "Instructions to Authors" policies I did quantify. It is likely that a more comprehensive analysis of journal data-sharing policies would provide additional insight. Similarly, the information on funders was limited: I only included funding data on NIH grants. Inclusion of more funders would help us understand the general role of funder policy and funding levels.

Previous work [58] found that investigator gender was correlated with data withholding. It is important to look at gender in multivariate analysis since male scientists are more likely than women to have large NIH grants [62]. Because gender did not contribute heavily to any of the derived factors in this study, additional analysis will be necessary to investigate its association with data sharing behaviour in this dataset. It should be noted that the source of gender data has limitations. The Baby Name Guesser algorithm empirically estimates gender by analyzing popular usage on the internet. Although coverage across names from diverse ethnicities seems quite good, the algorithm is relatively unsuccessful in determining the gender of Asian names. This may have confounded the gender analysis, and the "gender not found" variable might have served as an unexpected proxy for author ethnicity.

Figure 3. Association between shared data and first-order factors. Percentage of studies with shared data is shown for each quartile for each factor. Univariate analysis. doi:10.1371/journal.pone.0018657.g003

The Author-ity system provides accurate author publication histories: A previous evaluation on a different sample found that only 0.5% of publication histories erroneously included more than one author, and about 2% of clusters contained a partial inventory of an author's publication history due to splitting a given author across multiple clusters [46]. However, because the lumping does not occur randomly, my attributes based on author publication histories may have included some bias. For example, the documented tendency of Author-ity to erroneously lump common Japanese names [46] may have confounded the author-history variables with author-ethnicity and thereby influenced the findings on first-author age and experience.

In previous work I used h-index and a-index metrics to measure "author experience" for both the first and last author [60] (in biomedicine, customarily, the first and last authors make the largest contributions to a study and have the most power in publication decisions). A recent paper [63] suggests that a raw count of number of papers and number of citations is functionally equivalent to the h-index and a-index, so I used the raw counts in this study for computational simplicity. Reliance on citations from PubMed Central (to enable scripted data collection) meant that older studies and those published in areas less well represented in PubMed Central were characterized by an artificially low citation count.
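The h-index mentioned above is easy to compute directly from a list of per-paper citation counts, alongside the raw counts the study used instead. A small sketch with illustrative numbers (not data from the study):

```python
def h_index(citation_counts):
    """Largest h such that the author has h papers with >= h citations each."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for i, c in enumerate(counts, start=1):
        if c >= i:
            h = i
        else:
            break
    return h

# Hypothetical citation counts for one author's eight papers.
papers = [25, 18, 12, 7, 4, 3, 1, 0]
print(h_index(papers))   # -> 4 (four papers with at least 4 citations each)
print(len(papers))       # raw paper count: 8
print(sum(papers))       # raw citation count: 70
```

The raw counts (number of papers, total citations) are the "functionally equivalent" measures the study substituted for the h-index.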

The large sample of 11,603 studies captured a fairly diverse and representative subset of gene expression microarray studies, though it is possible that gene expression microarray studies missed by the full-text filter differed in significant ways from those that used mainstream vocabulary to describe their wetlab methods. Selecting a sample based on queries of non-subscription full-text content may have introduced a slight bias towards open access journals. It is worth noting that this study demonstrates the value of open access and open full-text resources for research evaluation.

In regression studies it is important to remember that associations do not imply causation. It is possible, for example, that receiving a high level of NIH funding and deciding to share data are not causally related, but rather result from the exposure and excitement inherent in a "hot" subfield of study.

Table 2. Second-order factor loadings, by first-order factors.

Amount of NIH funding
0.89 Count of R01 & other NIH grants
0.49 Large NIH grant
-0.55 NO K funding or P funding

Cancer & humans
0.83 Humans & cancer

OA journal & previous GEO-AE sharing
0.59 Authors prev GEOAE sharing & OA & microarray creation
0.43 Institution high citations & collaboration
0.31 First author num prev pubs & first year pub
-0.36 Last author num prev pubs & first year pub

Journal impact factor and policy
0.57 Journal impact
0.51 Last author num prev pubs & first year pub

Higher Ed in USA
0.40 NO geo reuse & YES high institution output
-0.44 Institution is government & NOT higher ed

doi:10.1371/journal.pone.0018657.t002

Table 3. Second-order factor loadings, by original variables.

Amount of NIH funding
0.87 nih.cumulative.years.tr
0.85 num.grants.via.nih.tr
0.84 max.grant.duration.tr
0.82 num.grant.numbers.tr
0.80 pubmed.is.funded.nih
0.79 nih.max.max.dollars.tr
0.70 nih.sum.avg.dollars.tr
0.70 nih.sum.sum.dollars.tr
0.59 has.R.funding
0.59 num.post2003.morethan500k.tr
0.58 country.usa
0.58 has.U.funding
0.57 has.R01.funding
0.55 num.post2003.morethan750k.tr
0.53 has.T.funding
0.53 num.post2003.morethan1000k.tr
0.49 num.post2004.morethan500k.tr
0.45 num.post2004.morethan750k.tr
0.44 has.P.funding
0.43 num.post2004.morethan1000k.tr
0.43 num.nih.is.nci.tr
0.35 num.post2005.morethan500k.tr
0.32 num.nih.is.nigms.tr
0.31 num.post2005.morethan750k.tr

Cancer & humans
0.60 pubmed.is.cancer
0.59 pubmed.is.humans
0.52 pubmed.is.cultured.cells
0.43 pubmed.is.core.clinical.journal
0.39 institution.is.medical
-0.58 pubmed.is.plants
-0.50 pubmed.is.fungi
-0.37 pubmed.is.shared.other
-0.30 pubmed.is.bacteria

OA journal & previous GEO-AE sharing
0.40 first.author.num.prev.geoae.sharing.tr
0.37 pubmed.is.open.access
0.37 first.author.num.prev.oa.tr
0.35 last.author.num.prev.geoae.sharing.tr
0.32 pubmed.is.effectiveness
0.32 last.author.num.prev.oa.tr
0.31 pubmed.is.geo.reuse
-0.38 country.japan

Journal impact factor and policy
0.48 journal.impact.factor.log
0.47 jour.policy.requires.microarray.accession
0.46 jour.policy.mentions.exceptions
0.46 pubmed.num.cites.from.pmc.tr
0.45 journal.5yr.impact.factor.log
0.45 jour.policy.contains.word.miame.mged

Importantly, this study did not consider directed sharing, such as peer-to-peer data exchange or sharing within a defined collaboration network, and thus underestimates the amount of data sharing in all its forms. Furthermore, this study underestimated public sharing of gene expression data on the Internet. It did not recognize data listed in journal supplementary information, on lab or personal web sites, or in institutional or specialized repositories (including the well-regarded and well-populated Stanford Microarray Database). Finally, the study methods did not recognize deposits into the Gene Expression Omnibus or ArrayExpress unless the database entry was accompanied by a citation to the research paper, complete with PubMed identifier.

Due to these limitations, care should be taken in interpreting the estimated levels of absolute data sharing and the data-sharing status of any particular study listed in the raw data. More research is needed to attain a deep understanding of information behaviour around research data sharing, its costs and benefits to science, society and individual investigators, and what makes for effective policy.

That said, the results presented here argue for action. Even in a field with mature policies, repositories and standards, research data sharing levels are low and increasing only slowly, and data is least available in areas where it could make the biggest impact. Let's learn from those with high rates of sharing and work to embrace the full potential of our research output.

Table 3. Cont.

0.42 last.author.num.prev.pmc.cites.tr
0.41 jour.policy.requests.accession
0.40 journal.immediacy.index.log
0.40 journal.num.articles.2008.tr
0.39 years.ago.tr
0.36 jour.policy.says.must.deposit
0.35 pubmed.num.cites.from.pmc.per.year
0.33 institution.mean.norm.citation.score
0.32 last.author.year.first.pub.ago.tr
0.31 country.usa
0.31 last.author.num.prev.pubs.tr
0.31 jour.policy.contains.word.microarray
-0.31 pubmed.is.open.access

Higher Ed in USA
0.36 institution.stanford
0.36 institution.is.higher.ed
0.35 country.usa
0.35 has.R.funding
0.33 has.R01.funding
0.30 institution.harvard
-0.37 institution.is.govnt

doi:10.1371/journal.pone.0018657.t003

Figure 4. Odds ratios of data sharing for first-order factors, multivariate model. Odds ratios are calculated as factor scores are each varied from their 25th percentile value to their 75th percentile value. Horizontal lines show the 95% confidence intervals of the odds ratios. doi:10.1371/journal.pone.0018657.g004
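The odds ratios in Figures 4 and 6 follow the standard logistic-regression relationship: moving a factor score from its 25th to its 75th percentile multiplies the odds of sharing by exp(beta * (q75 - q25)). A sketch with an illustrative coefficient (not a value fitted in the paper):

```python
import math
import numpy as np

# Hypothetical logistic-regression coefficient for one factor score,
# per unit of the score; the paper's actual fitted values differ.
beta = 0.45

# Simulated factor scores standing in for one of the derived factors.
factor_scores = np.random.default_rng(1).normal(size=1000)
q25, q75 = np.percentile(factor_scores, [25, 75])

# Odds ratio for moving from the 25th to the 75th percentile of the factor,
# as plotted in Figures 4 and 6.
odds_ratio = math.exp(beta * (q75 - q25))
print(round(odds_ratio, 2))
```

A positive coefficient gives an odds ratio above 1 (the factor is associated with more sharing); a negative one gives a ratio below 1.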


Figure 5. Association between shared data and second-order factors. Percentage of studies with shared data is shown for each quartile for each factor. Univariate analysis. doi:10.1371/journal.pone.0018657.g005

Figure 6. Odds ratios of data sharing for second-order factors, multivariate model. Odds ratios are calculated as factor scores are each varied from their 25th percentile value to their 75th percentile value. Horizontal lines show the 95% confidence intervals of the odds ratios. doi:10.1371/journal.pone.0018657.g006


Availability of dataset, statistical scripts, and data collection source code

Raw data and statistical scripts are available in the Dryad data repository at doi:10.5061/dryad.mf1sd [64]. Data collection source code is available at http://github.com/hpiwowar/pypub.

Supporting Information

Figure S1 Associations between shared data and author attribute variables. Percentage of studies with shared data is shown for each quartile for continuous variables. (EPS)

Figure S2 Associations between shared data and journal attribute variables. Percentage of studies with shared data is shown for each quartile for continuous variables. (EPS)

Figure S3 Associations between shared data and study attribute variables. Percentage of studies with shared data is shown for each quartile for continuous variables. (EPS)

Figure S4 Associations between shared data and funding attribute variables. Percentage of studies with shared data is shown for each quartile for continuous variables. (EPS)

Figure S5 Associations between shared data and country and institution attribute variables. Percentage of studies with shared data is shown for each quartile for continuous variables. (EPS)

Table S1 Structure matrix with correlations between all attributes and first-order factors. (TXT)

Acknowledgments

I sincerely thank my doctoral dissertation advisor, Dr. Wendy Chapman, for her support and feedback.

Author Contributions

Conceived and designed the experiments: HAP. Performed the experiments: HAP. Analyzed the data: HAP. Contributed reagents/materials/analysis tools: HAP. Wrote the paper: HAP.

References

1. McCain K (1995) Mandating Sharing: Journal Policies in the Natural Sciences. Science Communication 16: 403–431.
2. Piwowar H, Chapman W (2008) A review of journal policies for sharing research data. ELPUB, Toronto, Canada.
3. National Institutes of Health (NIH) (2003) NOT-OD-03-032: Final NIH Statement on Sharing Research Data.
4. National Institutes of Health (NIH) (2007) NOT-OD-08-013: Implementation Guidance and Instructions for Applicants: Policy for Sharing of Data Obtained in NIH-Supported or Conducted Genome-Wide Association Studies (GWAS).
5. National Science Foundation (NSF) (2011) Grant Proposal Guide, Chapter II.C.2.j.
6. Fienberg SE, Martin ME, Straf ML (1985) Sharing Research Data. National Academy Press.
7. Cech TR, Eddy SR, Eisenberg D, Hersey K, Holtzman SH, et al. (2003) Sharing Publication-Related Data and Materials: Responsibilities of Authorship in the Life Sciences. Plant Physiol 132: 19–24.
8. (2007) Time for leadership. Nat Biotech 25: 821.
9. (2007) Got data? Nat Neurosci 10: 931.
10. Kakazu KK, Cheung LW, Lynne W (2004) The Cancer Biomedical Informatics Grid (caBIG): pioneering an expansive network of information and tools for collaborative cancer research. Hawaii Med J 63: 273–275.
11. The GAIN Collaborative Research Group (2007) New models of collaboration in genome-wide association studies: the Genetic Association Information Network. Nat Genet 39: 1045–1051.
12. Brazma A, Hingamp P, Quackenbush J, Sherlock G, Spellman P, et al. (2001) Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nat Genet 29: 365–371.
13. Barrett T, Troup D, Wilhite S, Ledoux P, Rudnev D, et al. (2007) NCBI GEO: mining tens of millions of expression profiles–database and tools update. Nucleic Acids Res 35.
14. Noor MA, Zimmerman KJ, Teeter KC (2006) Data Sharing: How Much Doesn't Get Submitted to GenBank? PLoS Biol 4: e228.
15. Ochsner SA, Steffen DL, Stoeckert CJ, McKenna NJ (2008) Much room for improvement in deposition rates of expression microarray datasets. Nature Methods 5: 991.
16. Reidpath DD, Allotey PA (2001) Data sharing in medical research: an empirical investigation. Bioethics 15: 125–134.
17. Kyzas P, Loizou K, Ioannidis J (2005) Selective reporting biases in cancer prognostic factor studies. J Natl Cancer Inst 97: 1043–1055.
18. Blumenthal D, Campbell EG, Gokhale M, Yucel R, Clarridge B, et al. (2006) Data withholding in genetics and the other life sciences: prevalences and predictors. Acad Med 81: 137–145.
19. Campbell EG, Clarridge BR, Gokhale M, Birenbaum L, Hilgartner S, et al. (2002) Data withholding in academic genetics: evidence from a national survey. JAMA 287: 473–480.
20. Hedstrom M (2006) Producing Archive-Ready Datasets: Compliance, Incentives, and Motivation. IASSIST Conference 2006: Presentations.
21. Ventura B (2005) Mandatory submission of microarray data to public repositories: how is it working? Physiol Genomics 20: 153–156.
22. Giordano R (2007) The Scientist: Secretive, Selfish, or Reticent? A Social Network Analysis. E-Social Science.
23. Hedstrom M, Niu J (2008) Research Forum Presentation: Incentives to Create "Archive-Ready" Data: Implications for Archives and Records Management. Society of American Archivists Annual Meeting.
24. Niu J (2006) Incentive study for research data sharing: a case study on NIJ grantees.

Table 4. Data sharing prevalence of subgroups divided at medians of two second-order factors [95% confidence intervals]. Each cell shows number of studies with shared data / number of studies.

Above the median value for the factor "OA and previous GEO-AE sharing":
  Above the median for "Cancer & Humans": 629/2624 = 24% [22%, 26%]
  Below the median for "Cancer & Humans": 1184/3178 = 37% [36%, 39%]
  Total: 1813/5802 = 31% [30%, 32%]

Below the median value for the factor "OA and previous GEO-AE sharing":
  Above the median for "Cancer & Humans": 440/3178 = 14% [13%, 15%]
  Below the median for "Cancer & Humans": 648/2623 = 25% [23%, 26%]
  Total: 1088/5801 = 19% [18%, 20%]

Total:
  Above the median for "Cancer & Humans": 1069/5802 = 18% [17%, 19%]
  Below the median for "Cancer & Humans": 1832/5801 = 32% [30%, 33%]
  Total: 2901/11603 = 25% [24%, 26%]

doi:10.1371/journal.pone.0018657.t004
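The confidence intervals in Table 4 can be reproduced, to the precision displayed, with a normal-approximation binomial interval. The paper's exact interval method is not stated in this excerpt, so the following is only a consistency check:

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Normal-approximation 95% confidence interval for a binomial proportion."""
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p, p - half, p + half

# First cell of Table 4: 629 sharing studies out of 2624.
p, lo, hi = proportion_ci(629, 2624)
print(f"{p:.0%} [{lo:.0%}, {hi:.0%}]")   # matches the table's 24% [22%, 26%]
```

The same function reproduces the other cells; larger denominators give correspondingly narrower intervals.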


25. Lowrance W. Access to Collections of Data and Materials for Health Research: A report to the Medical Research Council and the Wellcome Trust.
26. University of Nottingham. JULIET: Research funders' open access policies.
27. Brown C (2003) The changing face of scientific discourse: Analysis of genomic and proteomic database usage and acceptance. Journal of the American Society for Information Science and Technology 54: 926–938.
28. McCullough BD, McGeary KA, Harrison TD (2008) Do Economics Journal Archives Promote Replicable Research? Canadian Journal of Economics 41: 1406–1420.
29. Constant D, Kiesler S, Sproull L (1994) What's mine is ours, or is it? A study of attitudes about information sharing. Information Systems Research 5: 400–421.
30. Matzler K, Renzl B, Muller J, Herting S, Mooradian T (2008) Personality traits and knowledge sharing. Journal of Economic Psychology 29: 301–313.
31. Ryu S, Ho SH, Han I (2003) Knowledge sharing behavior of physicians in hospitals. Expert Systems With Applications 25: 113–122.
32. Bitzer J, Schrettl W, Schroder PJH (2007) Intrinsic motivation in open source software development. Journal of Comparative Economics 35: 160–169.
33. Kim J (2007) Motivating and Impeding Factors Affecting Faculty Contribution to Institutional Repositories. Journal of Digital Information 8: 2.
34. Seonghee K, Boryung J (2008) An analysis of faculty perceptions: Attitudes toward knowledge sharing and collaboration in an academic institution. Library 30: 282–290.
35. Warlick S, Vaughan K (2007) Factors influencing publication choice: why faculty choose open access. Biomed Digit Libr 4: 1.
36. Lee C, Dourish P, Mark G (2006) The human infrastructure of cyberinfrastructure. Proceedings of the 2006 20th anniversary conference on Computer supported cooperative work. Banff, Canada.
37. Kuo F, Young M (2008) A study of the intention–action gap in knowledge sharing practices. Journal of the American Society for Information Science and Technology 59: 1224–1237.
38. Rhodes DR, Yu J, Shanker K, Deshpande N, Varambally R, et al. (2004) Large-scale meta-analysis of cancer microarray data identifies common transcriptional profiles of neoplastic transformation and progression. Proc Natl Acad Sci U S A 101: 9309–9314.
39. Hrynaszkiewicz I, Altman D (2009) Towards agreement on best practice for publishing raw clinical trial data. Trials 10: 17.
40. Parkinson H, Kapushesky M, Shojatalab M, Abeygunawardena N, Coulson R, et al. (2007) ArrayExpress–a public database of microarray experiments and gene expression profiles. Nucleic Acids Res 35(Database issue): D747–D750.
41. Ball CA, Brazma A, Causton H, Chervitz S, Edgar R, et al. (2004) Submission of microarray data to public repositories. PLoS Biol 2: e317.
42. (2002) Microarray standards at last. Nature 419: 323.
43. Piwowar HA (2010) Foundational studies for measuring the impact, prevalence, and patterns of publicly sharing biomedical research data. Doctoral Dissertation: University of Pittsburgh.
44. Piwowar H, Chapman W (2010) Recall and bias of retrieving gene expression microarray datasets through PubMed identifiers. J Biomed Discov Collab 5: 7–20.
45. Yu W, Yesupriya A, Wulf A, Qu J, Gwinn M, et al. (2007) An automatic method to generate domain-specific investigator networks using PubMed abstracts. BMC Medical Informatics and Decision Making 7: 17.
46. Torvik V, Smalheiser NR (2009) Author Name Disambiguation in MEDLINE. Transactions on Knowledge Discovery from Data. pp 1–37.
47. Bird S, Loper E (2006) Natural Language Toolkit. Available: http://nltk.sourceforge.net/.
48. R Development Core Team (2008) R: A Language and Environment for Statistical Computing. Vienna, Austria. ISBN 3-900051-07-0.
49. Theus M, Urbanek S (2008) Interactive Graphics for Data Analysis: Principles and Examples (Computer Science and Data Analysis). Chapman & Hall/CRC.
50. Harrell FE (2001) Regression Modeling Strategies. Springer.
51. Gorsuch RL (1983) Factor Analysis, Second Edition. Psychology Press.
52. Vickers AJ (2008, January 22) Cancer Data? Sorry, Can't Have It. The New York Times.
53. Siemsen E, Roth A, Balasubramanian S (2008) How motivation, opportunity, and ability drive knowledge sharing: The constraining-factor model. Journal of Operations Management 26: 426–445.
54. Tucker J (2009) Motivating Subjects: Data Sharing in Cancer Research [PhD dissertation]. Virginia Polytechnic Institute and State University.
55. Malin B, Karp D, Scheuermann RH (2010) Technical and policy approaches to balancing patient privacy and data sharing in clinical and translational research. J Investig Med 58: 11–18.
56. Foster M, Sharp R (2007) Share and share alike: deciding how to distribute the scientific and social benefits of genomic data. Nat Rev Genet 8: 633–639.
57. Navarro R (2008) An ethical framework for sharing patient data without consent. Inform Prim Care 16: 257–262.
58. Blumenthal D, Campbell E, Anderson M, Causino N, Louis K (1997) Withholding research results in academic life science: Evidence from a national survey of faculty. JAMA 277: 1224–1228.
59. Vogeli C, Yucel R, Bendavid E, Jones L, Anderson M, et al. (2006) Data withholding and the next generation of scientists: results of a national survey. Acad Med 81: 128–136.
60. Piwowar HA, Chapman WW (2010) Public Sharing of Research Datasets: A Pilot Study of Associations. Journal of Informetrics 4: 148–156.
61. Wellcome Trust (2010) Sharing research data to improve public health: full joint statement by funders of health research.
62. Hosek SD, Cox AG, Ghosh-Dastidar B, Kofner A, Ramphal N, et al. (2005) Gender Differences in Major Federal External Grant Programs. RAND Corporation.
63. Bornmann L, Mutz R, Daniel H-D (2009) Do we need the h index and its variants in addition to standard bibliometric measures? Journal of the American Society for Information Science and Technology 60: 1286–1289.
64. Piwowar HA (2011) Data from: Who Shares? Who Doesn't? Factors Associated with Openly Archiving Raw Research Data. Dryad Digital Repository. Available: doi:10.5061/dryad.mf1sd.


Special article

The New England Journal of Medicine

N Engl J Med 358;3 www.nejm.org January 17, 2008

Selective Publication of Antidepressant Trials and Its Influence on Apparent Efficacy

Erick H. Turner, M.D., Annette M. Matthews, M.D., Eftihia Linardatos, B.S., Robert A. Tell, L.C.S.W., and Robert Rosenthal, Ph.D.

From the Departments of Psychiatry (E.H.T., A.M.M.) and Pharmacology (E.H.T.), Oregon Health and Science University; and the Behavioral Health and Neurosciences Division, Portland Veterans Affairs Medical Center (E.H.T., A.M.M., R.A.T.) — both in Portland, OR; the Department of Psychology, Kent State University, Kent, OH (E.L.); the Department of Psychology, University of California–Riverside, Riverside (R.R.); and Harvard University, Cambridge, MA (R.R.). Address reprint requests to Dr. Turner at Portland VA Medical Center, P3MHDC, 3710 SW US Veterans Hospital Rd., Portland, OR 97239, or at [email protected].

N Engl J Med 2008;358:252-60. Copyright © 2008 Massachusetts Medical Society.

ABSTRACT

BACKGROUND

Evidence-based medicine is valuable to the extent that the evidence base is complete and unbiased. Selective publication of clinical trials — and the outcomes within those trials — can lead to unrealistic estimates of drug effectiveness and alter the apparent risk–benefit ratio.

METHODS

We obtained reviews from the Food and Drug Administration (FDA) for studies of 12 antidepressant agents involving 12,564 patients. We conducted a systematic literature search to identify matching publications. For trials that were reported in the literature, we compared the published outcomes with the FDA outcomes. We also compared the effect size derived from the published reports with the effect size derived from the entire FDA data set.

RESULTS

Among 74 FDA-registered studies, 31%, accounting for 3449 study participants, were not published. Whether and how the studies were published were associated with the study outcome. A total of 37 studies viewed by the FDA as having positive results were published; 1 study viewed as positive was not published. Studies viewed by the FDA as having negative or questionable results were, with 3 exceptions, either not published (22 studies) or published in a way that, in our opinion, conveyed a positive outcome (11 studies). According to the published literature, it appeared that 94% of the trials conducted were positive. By contrast, the FDA analysis showed that 51% were positive. Separate meta-analyses of the FDA and journal data sets showed that the increase in effect size ranged from 11 to 69% for individual drugs and was 32% overall.

CONCLUSIONS

We cannot determine whether the bias observed resulted from a failure to submit manuscripts on the part of authors and sponsors, from decisions by journal editors and reviewers not to publish, or both. Selective reporting of clinical trial results may have adverse consequences for researchers, study participants, health care professionals, and patients.

The New England Journal of Medicine. Downloaded from nejm.org at THE NATIONAL ACADEMIES on July 12, 2012. For personal use only. No other uses without permission. Copyright © 2008 Massachusetts Medical Society. All rights reserved.



Medical decisions are based on an understanding of publicly reported clinical trials.1,2 If the evidence base is biased, then decisions based on this evidence may not be the optimal decisions. For example, selective publication of clinical trials, and the outcomes within those trials, can lead to unrealistic estimates of drug effectiveness and alter the apparent risk–benefit ratio.3,4

Attempts to study selective publication are complicated by the unavailability of data from unpublished trials. Researchers have found evidence for selective publication by comparing the results of published trials with information from surveys of authors,5 registries,6 institutional review boards,7,8 and funding agencies,9,10 and even with published methods.11 Numerous tests are available to detect selective-reporting bias, but none are known to be capable of detecting or ruling out bias reliably.12-16

In the United States, the Food and Drug Administration (FDA) operates a registry and a results database.17 Drug companies must register with the FDA all trials they intend to use in support of an application for marketing approval or a change in labeling. The FDA uses this information to create a table of all studies.18 The study protocols in the database must prospectively identify the exact methods that will be used to collect and analyze data. Afterward, in their marketing application, sponsors must report the results obtained using the prespecified methods. These submissions include raw data, which FDA statisticians use in corroborative analyses. This system prevents selective post hoc reporting of favorable trial results and outcomes within those trials.

How accurately does the published literature convey data on drug efficacy to the medical community? To address this question, we compared drug efficacy inferred from the published literature with drug efficacy according to FDA reviews.

Methods

Data from FDA Reviews

We identified the phase 2 and 3 clinical-trial programs for 12 antidepressant agents approved by the FDA between 1987 and 2004 (median, August 1996), involving 12,564 adult patients. For the eight older antidepressants, we obtained hard copies of statistical and medical reviews from colleagues who had procured them through the Freedom of Information Act.19 Reviews for the four newer antidepressants were available on the FDA Web site.17,20 This study was approved by the Research and Development Committee of the Portland Veterans Affairs Medical Center; because of its nature, informed consent from individual patients was not required.

From the FDA reviews of submitted clinical trials, we extracted efficacy data on all randomized, double-blind, placebo-controlled studies of drugs for the short-term treatment of depression. We included data pertaining only to dosages later approved as safe and effective; data pertaining to unapproved dosages were excluded.

We extracted the FDA’s regulatory decisions — that is, whether, for purposes of approval, the studies were judged to be positive or negative with respect to the prespecified primary outcomes (or primary end points).21 We classified as questionable those studies that the FDA judged to be neither positive nor clearly negative — that is, studies that did not have significant findings on the primary outcome but did have significant findings on several secondary outcomes. Failed studies22 were also classified as questionable (for more information, see the Methods section of the Supplementary Appendix, available with the full text of this article at www.nejm.org). For fixed-dose studies (studies in which patients are randomly assigned to receive one of two or more dose levels or placebo) with a mix of significant and nonsignificant results for different doses, we used the FDA’s stated overall decisions on the studies. We used double data extraction and entry, as detailed in the Methods section of the Supplementary Appendix.

Data from Journal Articles

Our literature-search strategy consisted of the following steps: a search of articles in PubMed, a search of references listed in review articles, and a search of the Cochrane Central Register of Controlled Trials; contact by telephone or e-mail with the drug sponsor’s medical-information department; and finally, contact by means of a certified letter sent to the sponsor’s medical-information department, including a deadline for responding in writing to our query about whether the study results had been published. If these steps failed to reveal any publications, we concluded that the study results had not been published.



We identified the best match between the FDA-reviewed clinical trials and journal articles on the basis of the following information: drug name, dose groups, sample size, active comparator (if used), duration, and name of principal investigator. We sought published reports on individual studies; articles covering multiple studies were excluded. When the results of a trial were reported in two or more primary publications, we selected the first publication.

Few journal articles used the term “primary efficacy outcome” or a reasonable equivalent. Therefore, we identified the apparent primary efficacy outcome, or the result highlighted most prominently, as the drug–placebo comparison reported first in the text of the results section or in the table or figure first cited in the text. As with the FDA reviews, we used double data extraction and entry (see the Methods section of the Supplementary Appendix for details).

Statistical Analysis

We categorized the trials on the basis of the FDA regulatory decision, whether the trial results were published, and whether the apparent primary outcomes agreed or conflicted with the FDA decision. We calculated risk ratios with exact 95% confidence intervals and Pearson’s chi-square analysis, using Stata software, version 9. We used a similar approach to examine the numbers of patients within the studies. Sample sizes were compared between published and unpublished studies with the use of the Wilcoxon rank-sum test.
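The risk-ratio calculation described above can be sketched in a few lines of Python. Note the hedges: the authors used Stata’s exact confidence intervals, whereas the sketch below uses the standard log-scale Wald approximation, and the function name is ours, not the paper’s.

```python
import math

def risk_ratio_ci(a, n1, c, n2):
    """Risk ratio of a/n1 events versus c/n2 events, with an approximate
    95% confidence interval computed on the log scale (Wald method).

    This is a simplified stand-in for the exact intervals reported in the
    paper; it will give a wider or narrower interval than the exact method.
    """
    rr = (a / n1) / (c / n2)
    # Standard error of log(RR) under the usual large-sample approximation.
    se = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)
    z = 1.959964  # normal quantile for a two-sided 95% interval
    lo = math.exp(math.log(rr) - z * se)
    hi = math.exp(math.log(rr) + z * se)
    return rr, lo, hi
```

For example, 37 of 38 positive studies published in agreement with the FDA versus 3 of 36 nonpositive studies gives a risk ratio of about 11.7, matching the point estimate reported in the Results (the interval itself differs because of the approximation).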

For our major outcome indicator, we calculated the effect size for each trial using Hedges’s g — that is, the difference between two means divided by their pooled standard deviation.23 However, because means and standard deviations (or standard errors) were inconsistently reported in both the FDA reviews and the journal articles, we used the algebraically equivalent computational equation24:

g = t × √(1/n_drug + 1/n_placebo).

We calculated the t statistic25 using the precise P value and the combined sample size as arguments in Microsoft Excel’s TINV (inverse t) function, multiplying t by −1 when the study drug was inferior to the placebo. Hedges’s correction for small sample size was applied to all g values.26

Precise P values were not always available for the above calculation. Rather, P values were often indicated as being below or above a certain threshold — for example, P<0.05 or “not significant” (i.e., P>0.05). In these cases, we followed the procedure described in the Supplementary Appendix.
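The effect-size recovery described above — invert a two-tailed P value to a t statistic, convert to g, then apply Hedges’s small-sample correction — can be sketched as follows. The paper used Excel’s TINV for the exact t quantile; this sketch substitutes a normal approximation (close for the large degrees of freedom in these trials), and the function name is ours.

```python
import math
from statistics import NormalDist

def hedges_g_from_p(p_two_tailed, n_drug, n_placebo, drug_inferior=False):
    """Recover Hedges's g from a reported two-tailed P value and group sizes.

    Steps mirror the Methods section: invert P to a t statistic (here via a
    normal approximation to the t quantile; the paper used Excel's TINV),
    flip the sign when the drug was inferior to placebo, compute
    g = t * sqrt(1/n_drug + 1/n_placebo), and apply Hedges's small-sample
    correction 1 - 3/(4*df - 1).
    """
    df = n_drug + n_placebo - 2
    t = NormalDist().inv_cdf(1 - p_two_tailed / 2)  # approx. two-tailed quantile
    if drug_inferior:
        t = -t  # the paper multiplies t by -1 when drug was worse than placebo
    g = t * math.sqrt(1 / n_drug + 1 / n_placebo)
    return (1 - 3 / (4 * df - 1)) * g  # Hedges's correction
```

With two groups of 100 patients and P = 0.05, this yields g of roughly 0.28, on the order of the effect sizes reported in Figure 3.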

For each fixed-dose (multiple-dose) study, we computed a single study-level effect size weighted by the degrees of freedom for each dose group. On the basis of the study-level effect-size values for both fixed-dose and flexible-dose studies, we calculated weighted mean effect-size values for each drug and for all drugs combined, using a random-effects model with the method of DerSimonian and Laird27 in Stata.28
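The DerSimonian–Laird random-effects pooling named above can be sketched as below. The paper ran this in Stata; this is a minimal stdlib-only illustration of the estimator itself, with a function name and interface of our own choosing.

```python
def dersimonian_laird(effects, variances):
    """Pooled effect size under the DerSimonian-Laird random-effects model.

    effects: per-study effect sizes (e.g., Hedges's g values).
    variances: their within-study variances.
    Returns (pooled_effect, tau_squared), where tau_squared is the
    method-of-moments estimate of between-study variance.
    """
    w = [1 / v for v in variances]            # fixed-effect (inverse-variance) weights
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    # Cochran's Q heterogeneity statistic around the fixed-effect mean.
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)             # truncated at zero by convention
    # Re-weight each study by the total (within + between) variance.
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * y for wi, y in zip(w_star, effects)) / sum(w_star)
    return pooled, tau2
```

When the studies are homogeneous (Q ≤ df), tau-squared truncates to zero and the result collapses to the ordinary inverse-variance fixed-effect mean.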

Within the published studies, we compared the effect-size values derived from the journal articles with the corresponding effect-size values derived from the FDA reviews. Next, within the FDA data set, we compared the effect-size values for the published studies with the effect-size values for the unpublished studies. Finally, we compared the journal-based effect-size values with those derived from the entire FDA data set — that is, both published and unpublished studies.

We made these comparisons at the level of studies and again at the level of the 12 drugs. Because the data were not normally distributed, we used the nonparametric rank-sum test for unpaired data and the signed-rank test for paired data. In these analyses, all the effect-size values were given equal weight.

Results

Study Outcome and Publication Status

Of the 74 FDA-registered studies in the analysis, we could not find evidence of publication for 23 (31%) (Table 1). The difference between the sample sizes for the published studies (median, 153 patients) and the unpublished studies (median, 146 patients) was neither large nor significant (5% difference between medians; P = 0.29 by the rank-sum test).

The data in Table 1 are displayed in terms of the study outcome in Figure 1A. The questions of whether the studies were published and, if so, how the results were reported were strongly related to their overall outcomes. The FDA deemed 38 of the 74 studies (51%) positive, and all but 1 of the 38 were published. The remaining 36 studies (49%) were deemed to be either negative (24 studies) or questionable (12). Of these 36 studies, 3 were published as not positive, whereas the remaining 33 either were not published (22 studies) or were published, in our opinion, as positive (11) and therefore conflicted with the FDA’s conclusion. Overall, the studies that the FDA judged as positive were approximately 12 times as likely to be published in a way that agreed with the FDA analysis as were studies with nonpositive results according to the FDA (risk ratio, 11.7; 95% confidence interval [CI], 6.2 to 22.0; P<0.001). This association of publication status with study outcome remained significant when we excluded questionable studies and when we examined publication status without regard to whether the published conclusions and the FDA conclusions were in agreement (for details, see the Supplementary Appendix).

Overall, 48 of the 51 published studies were reported to have positive results (94%; binomial 95% CI, 84 to 99). According to the FDA, 38 of the 74 registered studies had positive results (51%; 95% CI, 39 to 63). There was no overlap between these two sets of confidence intervals.

These data are broken down by drug and study number in Figure 2A. For each of the 12 drugs, the results of at least one study either were unpublished or were reported in the literature as positive despite a conflicting judgment by the FDA.

Number of Study Participants

As shown in Table 1, a total of 12,564 patients participated in these trials. The data from 3449 patients (27%) were not published. Data from an additional 1843 patients (15%) were reported in journal articles in which the highlighted finding conflicted with the FDA-defined primary outcome. Thus, the percentages for the patients closely mirrored those for the studies (Table 1).

Whether a patient’s data were reported in a way that was in concert with the FDA review was associated with the study outcome (Fig. 1B) (risk ratio, 27.1), which was consistent with the above-reported finding with the studies. Figure 2B shows these same data according to the drug being evaluated.

Qualitative Description of Selective Reporting within Trials

The methods reported in 11 journal articles appear to depart from the prespecified methods reflected in the FDA reviews (Table B of the Supplementary Appendix). Although for each of these studies the finding with respect to the protocol-specified primary outcome was nonsignificant, each publication highlighted a positive result as if it were the primary outcome. The nonsignificant results for the prespecified primary outcomes were either subordinated to nonprimary positive results (in two reports) or omitted (in nine). (Study-level methodologic differences are detailed in the footnotes to Table B of the Supplementary Appendix.)

Effect Size

The effect-size values derived from the journal reports were often greater than those derived from the FDA reviews. The difference between these two sets of values was significant whether the studies (P = 0.003) or the drugs (P = 0.012) were used as the units of analysis (see Table D in the Supplementary Appendix).

The effect sizes of the published and unpublished studies reviewed by the FDA are compared in Figure 3A. The overall mean weighted effect-size value was 0.37 (95% CI, 0.33 to 0.41) for published studies and 0.15 (95% CI, 0.08 to 0.22) for unpublished studies. The difference was significant whether the studies (P<0.001) or the drugs (P = 0.005) were used as the units of analysis (Table D in the Supplementary Appendix).

The mean effect-size values for all FDA studies, both published and unpublished, are compared with those for all published studies, as shown in Figure 3B. Again, the differences were significant whether the studies (P<0.001) or the drugs (P = 0.002) were used as units of analysis (Table D in the Supplementary Appendix).

For each of the 12 drugs, the effect size derived from the journal articles exceeded the effect size derived from the FDA reviews (sign test, P<0.001) (Fig. 3B). The magnitude of the increases in effect size between the FDA reviews and the published reports ranged from 11 to 69%, with a median increase of 32%. A 32% increase was also observed in the weighted mean effect size for all drugs combined, from 0.31 (95% CI, 0.27 to 0.35) to 0.41 (95% CI, 0.36 to 0.45).

Table 1. Overall Publication Status of FDA-Registered Antidepressant Studies.

Publication Status                                         No. of Studies (%)    No. of Patients in Studies (%)
Published, results agree with FDA decision                 40 (54)               7,272 (58)
Published, results conflict with FDA decision
  (published as positive)                                  11 (15)               1,843 (15)
Results not published                                      23 (31)               3,449 (27)
Total                                                      74 (100)              12,564 (100)

A list of the study-level effect-size values used in the above analyses — derived from both the FDA reviews and the published reports — is provided in Table C of the Supplementary Appendix. These effect-size values are based on P values and sample sizes shown in Table A of the Supplementary Appendix, which also lists reference information for the publications consulted.

Discussion

[Figure 1 appears here as a pair of bar charts; see the caption below.]

Figure 1. Effect of FDA Regulatory Decisions on Publication. Among the 74 studies reviewed by the FDA (Panel A), 38 were deemed to have positive results, 37 of which were published with positive results; the remaining study was not published. Among the studies deemed to have questionable or negative results by the FDA, there was a tendency toward nonpublication or publication with positive results, conflicting with the conclusion of the FDA. Among the 12,564 patients in all 74 studies (Panel B), data for patients who participated in studies deemed positive by the FDA were very likely to be published in a way that agreed with the FDA. In contrast, data for patients participating in studies deemed questionable or negative by the FDA tended either not to be published or to be published in a way that conflicted with the FDA’s judgment.

[Figure 2 appears here as a grid of individual studies (Panel A) and patient counts (Panel B) arranged by drug and FDA decision; see the caption below.]

Figure 2. Publication Status and FDA Regulatory Decision by Study and by Drug. Panel A shows the publication status of individual studies. Nearly every study deemed positive by the FDA (top row) was published in a way that agreed with the FDA’s judgment. By contrast, most studies deemed negative (bottom row) or questionable (middle row) by the FDA either were published in a way that conflicted with the FDA’s judgment or were not published. Numbers shown in boxes indicate individual studies and correspond to the study numbers listed in Table A of the Supplementary Appendix. Panel B shows the numbers of patients participating in the individual studies indicated in Panel A. Data for patients who participated in studies deemed positive by the FDA were very likely to be published in a way that agreed with the FDA’s judgment. By contrast, data for patients who participated in studies deemed negative or questionable by the FDA tended either not to be published or to be published in a way that conflicted with the FDA’s judgment.

We found a bias toward the publication of positive results. Not only were positive results more likely to be published, but studies that were not positive, in our opinion, were often published in a way that conveyed a positive outcome. We analyzed these data in terms of the proportion of positive studies and in terms of the effect size associated with drug treatment. Using both approaches, we found that the efficacy of this drug class is less than would be gleaned from an examination of the published literature alone. According to the published literature, the results of nearly all of the trials of antidepressants were positive. In contrast, FDA analysis of the trial data showed that roughly half of the trials had positive results. The statistical significance of a study’s results was strongly associated with whether and how they were reported, and the association was independent of sample size. The study outcome also affected the chances that the data from a participant would be published. As a result of selective reporting, the published literature conveyed an effect size nearly one third larger than the effect size derived from the FDA data.

Previous studies have examined the risk–benefit ratio for drugs after combining data from regulatory authorities with data published in journals.3,30-32 We built on this approach by comparing study-level data from the FDA with matched data from journal articles. This comparative approach allowed us to quantify the effect of selective publication on apparent drug efficacy.

[Figure 3 appears here as forest plots of effect sizes by drug; see the caption below.]

Figure 3. Mean Weighted Effect Size According to Drug, Publication Status, and Data Source. Values for effect size are expressed as Hedges’s g (the difference between two means divided by their pooled standard deviation). Effect-size values of 0.2 and 0.5 are considered to be small and medium, respectively.29 Effect-size values for unpublished studies and published studies, as extracted from data in FDA reviews, are shown in Panel A. Horizontal lines indicate 95% confidence intervals. There were no unpublished studies for controlled-release paroxetine or fluoxetine. For each of the other antidepressants, the effect size for the published subgroup of studies was greater than the effect size for the unpublished subgroup of studies. Overall effect-size values (i.e., based on data from the FDA for published and unpublished studies combined), as compared with effect-size values based on data from corresponding published reports, are shown in Panel B. For each drug, the effect-size value based on published literature was higher than the effect-size value based on FDA data, with increases ranging from 11 to 69%. For the entire drug class, effect sizes increased by 32%.

Our findings have several limitations: they are restricted to antidepressants, to industry-sponsored trials registered with the FDA, and to issues of efficacy (as opposed to “real-world” effectiveness33). This study did not account for other factors that may distort the apparent risk–benefit ratio, such as selective publication of safety issues, as has been reported with rofecoxib (Vioxx, Merck)34 and with the use of selective serotonin-reuptake inhibitors for depression in children.3 Because we excluded articles covering multiple studies, we probably counted some studies as unpublished that were — technically — published. The practice of bundling negative and positive studies in a single article has been found to be associated with duplicate or multiple publication,35 which may also influence the apparent risk–benefit ratio.

There can be many reasons why the results of a study are not published, and we do not know the reasons for nonpublication. Thus, we cannot determine whether the bias observed resulted from a failure to submit manuscripts on the part of authors and sponsors, decisions by journal editors and reviewers not to publish submitted manuscripts, or both.

We wish to clarify that nonsignificance in a single trial does not necessarily indicate lack of efficacy. Each drug, when subjected to meta-analysis, was shown to be superior to placebo. On the other hand, the true magnitude of each drug’s superiority to placebo was less than a diligent literature review would indicate.

We do not mean to imply that the primary methods agreed on between sponsors and the FDA are necessarily preferable to alternative methods. Nevertheless, when multiple analyses are conducted, the principle of prespecification controls the rate of false positive findings (type I error), and it prevents HARKing,36 or hypothesizing after the results are known.

It might be argued that some trials did not merit publication because of methodologic flaws, including problems beyond the control of the investigator. However, since the protocols were written according to international guidelines for efficacy studies37 and were carried out by companies with ample financial and human resources, to be fair to the people who put themselves at risk to participate, a cogent public reason should be given for failure to publish.

Selective reporting deprives researchers of the accurate data they need to estimate effect size realistically. Inflated effect sizes lead to underestimates of the sample size required to achieve statistical significance. Underpowered studies — and selectively reported studies in general — waste resources and the contributions of investigators and study participants, and they hinder the advancement of medical knowledge. By altering the apparent risk–benefit ratio of drugs, selective publication can lead doctors to make inappropriate prescribing decisions that may not be in the best interest of their patients and, thus, the public health.

Dr. Turner reports having served as a medical reviewer for the Food and Drug Administration. No other potential conflict of interest relevant to this article was reported.

We thank Emily Kizer, Marcus Griffith, and Tammy Lewis for clerical assistance; David Wilson, Alex Sutton, Ohidul Siddiqui, and Benjamin Chan for statistical consultation; Linda Ganzini, Thomas B. Barrett, and Daniel Hilfet-Hilliker for their comments on an earlier version of this manuscript; Arifula Khan, Kelly Schwartz, and David Antonuccio for providing access to FDA reviews; Thomas B. Barrett, Norwan Moaleji, and Samantha Ruimy for double data extraction and entry; and Andrew Hamilton for literature database searches.

References

1. Hagdrup N, Falshaw M, Gray RW, Carter Y. All members of primary care team are aware of importance of evidence based medicine. BMJ 1998;317:282.

2. Craig JC, Irwig LM, Stockler MR. Evidence-based medicine: useful tools for decision making. Med J Aust 2001;174:248-53.

3. Whittington CJ, Kendall T, Fonagy P, Cottrell D, Cotgrove A, Boddington E. Selective serotonin reuptake inhibitors in childhood depression: systematic review of published versus unpublished data. Lancet 2004;363:1341-5.

4. Kyzas PA, Loizou KT, Ioannidis JP. Selective reporting biases in cancer prognostic factor studies. J Natl Cancer Inst 2005;97:1043-55.

5. Dickersin K, Chan S, Chalmers TC, Sacks HS, Smith H Jr. Publication bias and clinical trials. Control Clin Trials 1987;8:343-53.

6. Simes RJ. Confronting publication bias: a cohort design for meta-analysis. Stat Med 1987;6:11-29.

7. Stern JM, Simes RJ. Publication bias: evidence of delayed publication in a cohort study of clinical research projects. BMJ 1997;315:640-5.

8. Chan AW, Hróbjartsson A, Haahr MT, Gøtzsche PC, Altman DG. Empirical evidence for selective reporting of outcomes in randomized trials: comparison of protocols to published articles. JAMA 2004;291:2457-65.

9. Ioannidis JP. Effect of the statistical significance of results on the time to completion and publication of randomized efficacy trials. JAMA 1998;279:281-6.

10. Chan AW, Krleza-Jerić K, Schmid I, Altman DG. Outcome reporting bias in randomized trials funded by the Canadian Institutes of Health Research. CMAJ 2004;171:735-40.

11. Chan AW, Altman DG. Identifying outcome reporting bias in randomised trials on PubMed: review of publications and survey of authors. BMJ 2005;330:753.

12. Lau J, Ioannidis JP, Terrin N, Schmid CH, Olkin I. The case of the misleading funnel plot. BMJ 2006;333:597-600.

13. Hayashino Y, Noguchi Y, Fukui T. Systematic evaluation and comparison of statistical tests for publication bias. J Epidemiol 2005;15:235-43.

14. Pham B, Platt R, McAuley L, Klassen TP, Moher D. Is there a "best" way to detect and minimize publication bias? An empirical evaluation. Eval Health Prof 2001;24:109-25.

15. Sterne JA, Gavaghan D, Egger M. Publication and related bias in meta-analysis: power of statistical tests and prevalence in the literature. J Clin Epidemiol 2000;53:1119-29.

16. Ioannidis JP, Trikalinos TA. The appropriateness of asymmetry tests for publication bias in meta-analyses: a large survey. CMAJ 2007;176:1091-6.

17. Turner EH. A taxpayer-funded clinical trials registry and results database. PLoS Med 2004;1(3):e60.

18. Center for Drug Evaluation and Research. Manual of policies and procedures: clinical review template. Rockville, MD: Food and Drug Administration, 2004. (Accessed December 20, 2007, at http://www.fda.gov/cder/mapp/6010.3.pdf.)

19. Committee on Government Reform, U.S. House of Representatives, 109th Congress, 1st Session. A citizen's guide on using the Freedom of Information Act and the Privacy Act of 1974 to request government records. Report no. 109-226. Washington, DC: Government Printing Office, 2005. (Also available at http://www.fas.org/sgp/foia/citizen.pdf.)

20. Center for Drug Evaluation and Research. Drugs@FDA: FDA approved drug products. Rockville, MD: Food and Drug Administration. (Accessed December 20, 2007, at http://www.accessdata.fda.gov/scripts/cder/drugsatfda.)

21. International Conference on Harmonisation (ICH), European Medicines Agency (EMEA). Topic E9: statistical principles for clinical trials. Rockville, MD: Food and Drug Administration. (Accessed December 20, 2007, at http://www.fda.gov/cder/guidance/iche3.pdf.)

22. Temple R, Ellenberg SS. Placebo-controlled trials and active-control trials in the evaluation of new treatments. Part 1: ethical and scientific issues. Ann Intern Med 2000;133:455-63.

23. Hedges LV. Estimation of effect size from a series of independent experiments. Psychol Bull 1982;92:490-9.

24. Rosenthal R. Meta-analytic procedures for social research. Newbury Park, CA: Sage, 1991.

25. Whitley E, Ball J. Statistics review 5: comparison of means. Crit Care 2002;6:424-8.

26. Hedges LV, Olkin I. Statistical methods for meta-analysis. New York: Academic Press, 1985.

27. DerSimonian R, Laird N. Meta-analysis in clinical trials. Control Clin Trials 1986;7:177-88.

28. Stata statistical software, release 9. College Station, TX: StataCorp, 2005.

29. Cohen J. Statistical power analysis for the behavioral sciences. 2nd ed. New York: Lawrence Erlbaum Associates, 1988.

30. Nissen SE, Wolski K. Effect of rosiglitazone on the risk of myocardial infarction and death from cardiovascular causes. N Engl J Med 2007;356:2457-71. [Erratum, N Engl J Med 2007;357:100.]

31. Nissen SE, Wolski K, Topol EJ. Effect of muraglitazar on death and major adverse cardiovascular events in patients with type 2 diabetes mellitus. JAMA 2005;294:2581-6.

32. Sackner-Bernstein JD, Kowalski M, Fox M, Aaronson K. Short-term risk of death after treatment with nesiritide for decompensated heart failure: a pooled analysis of randomized controlled trials. JAMA 2005;293:1900-5.

33. Revicki DA, Frank L. Pharmacoeconomic evaluation in the real world: effectiveness versus efficacy studies. Pharmacoeconomics 1999;15:423-34.

34. Topol EJ. Failing the public health — rofecoxib, Merck, and the FDA. N Engl J Med 2004;351:1707-9.

35. Melander H, Ahlqvist-Rastad J, Meijer G, Beermann B. Evidence b(i)ased medicine — selective reporting from studies sponsored by pharmaceutical industry: review of studies in new drug applications. BMJ 2003;326:1171-3.

36. Kerr NL. HARKing: hypothesizing after the results are known. Pers Soc Psychol Rev 1998;2:196-217.

37. International Conference on Harmonisation — Efficacy. Rockville, MD: Food and Drug Administration. (Accessed December 20, 2007, at http://www.fda.gov/cder/guidance/#ICH_efficacy.)

The New England Journal of Medicine. Downloaded from nejm.org at THE NATIONAL ACADEMIES on July 12, 2012. For personal use only. No other uses without permission. Copyright © 2008 Massachusetts Medical Society. All rights reserved.




Clinical Pharmacology & Therapeutics | Volume 87 Number 5 | May 2010

Perspectives

ing programs in which scientists entering as physicists and clinicians/biologists, respectively, become network biologists and systems biologists.

Looking forward, there are several requirements to build a functional contributor network of biologists/clinicians evolving probabilistic causal models of disease. More robust coherent global genomic data sets need to be generated—probably hundreds or thousands, as opposed to the current number of less than a hundred in the entire world today. Standards and annotations will need to be established and used through a collective process that corrects itself in ways that are similar to the semiconductor industry data-sharing efforts.5 Methods for rewarding scientists for sharing their data and also ways to recognize and cite models will be essential. Privacy and intellectual-property issues will need to be addressed in an iterative process.

If these challenges were to be tackled by a government or a single industrial or academic champion, they would probably fail. If, instead, scientists, patient advocates, and industrial and government representatives can align and ensure that these tasks are attacked by a distributed network of scientists, software developers, modelers, and privacy and intellectual-property experts in a contributor network similar to that which drives other social network–driven efforts such as Twitter, there is a chance that we can build the probabilistic causal models so needed to develop biomarkers and new therapies more effectively.

CONFLICT OF INTEREST
The author declared no conflict of interest.

© 2010 ASCPT

1. Mathieu, R.G. IT systems perspective: top-down approach to computing. IEEE Computer 35, 138–139 (2002).

2. Schadt, E.E., Friend, S.H. & Shaywitz, D.A. A network view of disease and compound screening. Nat. Rev. Drug Discov. 8, 286–295 (2009).

3. Waters, K.M., Singhal, M., Webb-Robertson, B.J.M., Stephan, E.G., & Gephart, J.M. Breaking the high-throughput bottleneck: new tools help biologists integrate complex datasets. Sci. Comput. 23, 22–26 (2006).

4. Winning through cooperation: An interview with William Spencer. Technol. Rev. 100, 22–27 <http://www.technologyreview.com/computing/11501/> (1997).

The Biomarkers Consortium: Practice and Pitfalls of Open-Source Precompetitive Collaboration

JA Wagner1, M Prince2, EC Wright3, MM Ennis4, J Kochan5, DJR Nunez6, B Schneider7, M-D Wang2, Y Chen1, S Ghosh8, BJ Musser1 and MT Vassileva9

Precompetitive collaboration is a growing driver for innovation and increased productivity in biomedical science and drug development. The Biomarkers Consortium, a public–private platform for precompetitive collaboration specific to biomarkers, demonstrated that adiponectin has potential utility as a predictor of metabolic responses to peroxisome proliferator–activated receptor (PPAR) agonists in individuals with type 2 diabetes. Despite the many challenges this project had to overcome, the most important lesson learned is that cross-company precompetitive collaboration is a feasible, robust approach to biomarker qualification.

Biomedical research and drug development remain uncertain, expensive, time-consuming, and inefficient enterprises.1 According to some estimates, it costs more than $1.2 billion to develop a new drug.2 In fact, over the past decade drug development costs have skyrocketed despite falling productivity, evidenced by increasing drug development timelines and decreasing new molecular entity approvals. Precompetitive collaboration, defined as competitors sharing early stages of research that benefit all,3 is a potential driver for innovation and increased productivity.4 Open-source software research has grown from a cottage industry into a business model, with products such as Linux becoming mainstream alternatives to industry leaders. Biology has evolved into an information-rich science, and the analogy with approaches to software could provide a potential answer to productivity issues for biomedical research and drug development. Recent advances, including the implementation of network models and large-scale consortia, highlight the increasing role that precompetitive collaboration is assuming within biomedical research and drug development. For example, Sage is an open-access network model platform that was recently launched with a vision to create open-access, integrative bionetworks, evolved from contributor scientists, to accelerate

1Merck, Rahway, New Jersey, USA; 2Eli Lilly and Company, Lilly Corporate Center, Indianapolis, Indiana, USA; 3National Institute of Diabetes and Digestive and Kidney Diseases/National Institutes of Health, Bethesda, Maryland, USA; 4Quintiles, Inc., Austin, Texas, USA; 5Hoffmann–La Roche, Nutley, New Jersey, USA; 6Metabolic Pathways Center of Excellence for Drug Discovery, GlaxoSmithKline, Research Triangle Park, North Carolina, USA; 7Center for Biologics Evaluation and Research, US Food and Drug Administration, Rockville, Maryland, USA; 8Biomedical Biotechnology Research Institute, North Carolina Central University, Durham, North Carolina, USA; 9Foundation for the National Institutes of Health, Bethesda, Maryland, USA. Correspondence: M Vassileva ([email protected])

doi:10.1038/clpt.2009.227


research on human disease.5 Another important large-scale precompetitive collaboration is the Biomarkers Consortium, a public–private partnership managed by the Foundation for the National Institutes of Health (FNIH). The goals of the consortium are to promote the development, qualification, and regulatory acceptance of biomarkers; to conduct research in precompetitive areas by teams that share vested interests in success; and to speed the development of medicines and therapies for detection, prevention, diagnosis, and treatment of disease through making consortium project results broadly available to the public so as to improve patient care.6

Development of the most useful biomarkers is generally resource intensive; it took years to achieve the consensus status now enjoyed in research and medical practice for biomarkers such as HbA1c.7 One of the most difficult tasks facing biomarker assessment and evaluation has always been harmonizing the approaches of various stakeholders from government, industry, nonprofit organizations, providers, and academic institutions. There was a critical need for a coordinated cross-sector partnership effort when the Biomarkers Consortium was officially launched in October 2006.

In April 2007, three of the four therapeutic-area Biomarkers Consortium Steering Committees—Cancer, Neuroscience, and Metabolic Disorders—became operational. One of the first projects proposed to the consortium in the field of metabolic disorders was to evaluate the utility of adiponectin as a marker of glycemic efficacy through pooling existing data from clinical trials conducted by multiple sponsors. As shown in Figure 1, this project was completed about two years later as the results became available after having been approved for presentation at the Annual Scientific Sessions of the American Diabetes Association meeting8 and for subsequent publication.9 In short, this project analyzed blinded data on 2,688 type 2 diabetes patients from randomized clinical trials conducted by four pharmaceutical companies. Adiponectin concentrations increased after peroxisome proliferator–activated receptor (PPAR) agonist treatment, which correlated with decreases in glucose, glycated hemoglobin (HbA1c), hematocrit, and triglycerides, and increases in blood urea nitrogen, creatinine, and high-density lipoprotein cholesterol concentrations. Early (6–8 weeks) increases in adiponectin concentrations after treatment with PPAR agonists negatively correlated with subsequent changes in HbA1c. Logistic regression demonstrated that an increase in adiponectin was correlated with a decrease in HbA1c. These analyses confirmed previous relationships between adiponectin levels and metabolic parameters, and supported the potential utility of adiponectin across the spectrum of glucose tolerance. Despite multiple challenges, this data-sharing project was completed and answered many questions that would otherwise be impossible to resolve using the data sets of each individual company alone.

The adiponectin project represents the first completed project of the Biomarkers Consortium. There are now six other ongoing projects within the consortium: two in the area of cancer, two more in metabolic disorders, and two in neuroscience. A few new projects are about to be launched in these three therapeutic areas, as well as in Immunity and Inflammation. At the same time, similar public–private partnerships in the area of biomarkers, such as the Innovative Medicines Initiative and the Critical Path Institute, are starting to develop projects aimed at addressing bottlenecks in drug development.10 Therefore, it becomes important to summarize the lessons learned from the first completed project of the Biomarkers Consortium, especially because it managed to facilitate the precompetitive sharing of preexisting data from four pharmaceutical companies. A summary of the lessons learned is displayed in Table 1.
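The kind of pooled logistic-regression analysis described for the adiponectin project can be sketched in miniature. Everything below is invented for illustration (the consortium's actual data and statistical plan are not reproduced here): synthetic records are generated so that a larger early adiponectin change raises the odds of an HbA1c response, and a logistic regression is then fit by plain gradient ascent.

```python
import random
from math import exp

rng = random.Random(42)

# Invented pooled records: (early adiponectin change, HbA1c responder 0/1).
# Generated so that larger early adiponectin increases raise the odds of
# response, matching the direction of the finding described in the text.
records = []
for _ in range(2688):  # same size as the pooled data set in the article
    adipo = rng.gauss(2.0, 1.5)
    true_logit = -1.0 + 0.8 * adipo          # generating model (invented)
    responder = 1 if rng.random() < 1 / (1 + exp(-true_logit)) else 0
    records.append((adipo, responder))

# Fit logistic regression by gradient ascent on the mean log-likelihood.
b0 = b1 = 0.0
lr = 0.5
for _ in range(1000):
    g0 = g1 = 0.0
    for x, y in records:
        p = 1 / (1 + exp(-(b0 + b1 * x)))
        g0 += y - p
        g1 += (y - p) * x
    b0 += lr * g0 / len(records)
    b1 += lr * g1 / len(records)

print(f"fitted slope for adiponectin change: {b1:.2f}")  # positive, near 0.8
```

The fitted slope recovers the positive association built into the synthetic data; in the real project the analogous coefficient quantified how early adiponectin change predicted subsequent HbA1c response.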

In the early days of the consortium, the trust among the decision makers who were willing to try this public–private partnership approach was invaluable. Robust advice from the US Food and Drug Administration (FDA) regarding the biomarker qualification process remains a crucial incentive for sustaining the interest and engagement of pharmaceutical companies in biomedical public–private partnerships.

Before the consortium's launch, the founding members from Pharmaceutical Research and Manufacturers of America, the FDA, the NIH, and the FNIH had multiple discussions not only

[Figure 1 is a timeline running from April 2007 to early 2009, with milestones: project concept submitted to BC; concept approval by MDSC; project team formed; clinical trials with relevant data identified by four companies; project plan developed by team; MDSC review; EC review; project launch; data-sharing/confidentiality agreements developed and executed; phase I and II data analysis; phase III data analysis; analysis of all available data; manuscript and presentation preparation.]

Figure 1 The development and execution timeline of the adiponectin project. It was submitted to the Biomarkers Consortium in the spring of 2007; the data analysis was completed by January 2009; the results were published in the summer of 2009. BC, Biomarkers Consortium; EC, Executive Committee; MDSC, Metabolic Disorders Steering Committee.


around the policies that should govern this novel partnership (such as intellectual property, confidentiality, and conflict of interest) but also around which specific questions in the therapeutic areas of focus might be of interest to all stakeholders in the consortium. The importance of pre-project consensus building can never be overstated. It is the best way to identify what questions might benefit most by a collaborative approach while at the same time satisfying a significant unmet medical need.

In a collaborative environment, identifying a principal investigator (a strong project team leader) to spearhead the effort is one of the first steps toward a successful outcome for the project. Designating a scientific program manager from the entity that acts as a neutral convener to assist the team through the development, internal approval, and execution of the project; to make sure that deadlines are met; and to keep the work of the team cohesive by contributing project management skills and assisting the chair with writing the project plan, reporting the results in written form, and presenting on the team's behalf to outside audiences is also essential for operational success.

Because the adiponectin project was going to involve sharing of preexisting data, which required a complex data-sharing plan and agreements to be developed and executed, it became clear that having each company designate the same official legal liaison for all interactions with the consortium about data-sharing issues, other contracts, and additional

Table 1 Lessons learned in development of the Biomarkers Consortium

Issue: Focus, organization, and pace
Lesson: Although ultimately successful, the overall project was lengthy.
Mitigation: Robust project management with accountable leaders should be instituted in all projects. A clear and concise project plan with precise objectives is mandatory. Setting and adhering to deadlines is important for achieving timely results. Simultaneous work streams, where possible, will accelerate the project. Creating a sense of urgency drives success.

Issue: Optimal collaboration
Lesson: A lack of collaboration tools hampered the project.
Mitigation: The Biomarkers Consortium created a collaboration Web portal. Regular meetings and face-to-face meetings are important to uniformly institute optimal collaboration.

Issue: Data-sharing principles and standards
Lesson: A uniform data-sharing plan that was legally appropriate for all parties was difficult to negotiate. Standard definitions across clinical protocols were not always obvious and clearly important. It is very time-consuming to compile, clean, and blind the data, because every company performs clinical trials using different standard operating procedures. Limited institutional memory often remains after a given trial is completed at individual companies. Blinding the aggregated data so that no representative of the project team could identify individual sponsor data was possible but difficult.
Mitigation: A single accountable legal liaison is crucial. Adequate time and resources must be budgeted for data-sharing principles and standard definitions. Much effort had to be put into blinding the data to ensure that no company could identify which data sets contributed to any given result. In accordance with the Biomarkers Consortium's general intellectual property and data-sharing principles, preexisting data from contributing companies were aggregated and analyzed by independent third parties, and the results of that process were shared with project team members. The template for the Biomarkers Consortium data-sharing plan and confidentiality is now available.

Issue: Limitations of existing data
Lesson: The retrospective data set lacked time points earlier than 6 weeks of dosing, which limited the ability to draw conclusions related to the prognostic value of the biomarker. Blinded aggregated data are inherently limited, including in this case difficulties with specifying dose response. Biomarker assays differed among company databases.
Mitigation: Acknowledge limitations. Carry out prospective follow-up when necessary.

Issue: Statistical analyses
Lesson: The opportunity to utilize the expertise of two statisticians at the same time for quality-control purposes was very important for the credibility of the results. That the baseline analysis confirmed earlier findings, thereby confirming the utility of the compiled data set, was also critical. The analysis plan evolved because the team could not fully understand all the questions possible to ask without knowing what data would be available after the compilation.
Mitigation: For data-sharing projects, it is now clear that having a defined budget, including funds specifically designated for the statistical analysis to be performed by an independent third party, is essential. The statistical plan had to be revised a few times, and some interesting findings that were not expected arose at the end of the analysis.

Issue: Biomarker qualification
Lesson: There was a lack of agreement on biomarker qualification at the conclusion of the project.
Mitigation: Involvement of an FDA representative is important. Potential use of the FDA qualification pathway should be considered.

Issue: Data dissemination
Lesson: Is piecemeal or full-story distribution more appropriate? Is a broad or narrow therapeutic-area focus more appropriate?
Mitigation: Address the data dissemination issues proactively with the project team.
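The blinding step that Table 1 flags as "possible but difficult" can be illustrated with a small sketch. All sponsor names, fields, and values below are invented: sponsor identities are replaced with shuffled opaque codes and subject IDs are re-keyed before pooling, so that no pooled row can be traced back to a contributing company.

```python
import csv
import io
import random

# Hypothetical per-sponsor extracts (sponsor names invented for illustration).
sponsor_files = {
    "alpha_pharma": "subject_id,adiponectin,hba1c\nA001,3.1,7.9\nA002,1.4,8.6\n",
    "beta_labs":    "subject_id,adiponectin,hba1c\nB001,2.2,8.1\n",
    "gamma_inc":    "subject_id,adiponectin,hba1c\nC001,0.9,9.0\nC002,2.8,7.5\n",
}

def blind_and_pool(files, seed=0):
    """Replace sponsor names with shuffled opaque codes and re-key subjects,
    so the pooled table cannot be traced back to a contributing company."""
    rng = random.Random(seed)
    names = list(files)
    rng.shuffle(names)                      # break any alphabetical ordering clue
    codes = {name: f"S{i + 1:02d}" for i, name in enumerate(names)}
    pooled = []
    for name, text in files.items():
        for n, row in enumerate(csv.DictReader(io.StringIO(text)), start=1):
            pooled.append({
                "sponsor_code": codes[name],
                "subject_code": f"{codes[name]}-{n:04d}",  # original IDs dropped
                "adiponectin": float(row["adiponectin"]),
                "hba1c": float(row["hba1c"]),
            })
    rng.shuffle(pooled)                     # remove ordering clues about sources
    return pooled

pooled = blind_and_pool(sponsor_files)
print(len(pooled))                                   # 5 pooled records
print(sorted({r["sponsor_code"] for r in pooled}))   # ['S01', 'S02', 'S03']
```

In the real project this step was performed by independent third parties, which is the mitigation the table records; the sketch only shows why the re-keying has to happen before anyone on the project team sees the pooled table.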


legal matters, could speed up the internal approval process, ensuring institutional memory and continuing understanding. The lessons learned during the process of developing the legal agreements for the adiponectin project became useful for the creation of templates for similar data-sharing projects developed later by the consortium.

For data-sharing projects, it is now clear that having a defined budget, including funds specifically designated for the statistical analysis to be performed by an independent third party, is essential. The opportunity to utilize the expertise of two statisticians at the same time for quality control purposes was very important for the credibility of the results from the adiponectin project. It was also critical that the baseline analysis reproduced earlier findings, therefore confirming the utility of the compiled data set.

Data sharing was a major focus of this project. Key challenges associated with the data-sharing process and the statistical analyses are summarized in Table 1. The data specification table developed by the project team for the project was very useful but had its limitations, so the two statisticians needed to do considerable work in following up with each individual company to clarify the data before being able to compile it. The analysis plan evolved because the team could not fully understand all the possible questions to ask without knowing what data would be available after the compilation. The plan had to be revised several times. Some interesting but unexpected findings came at the end of the analysis, although they were not necessarily an important part of the original questions asked. For example, the potential utility of adiponectin across the spectrum of glucose tolerance, which was demonstrated through the phase III analysis, was not a finding that addressed a question that was originally in the analytical plan, but it is nevertheless among the important conclusions of the project.

Practical aspects of collaboration are also important. Utilizing collaborative resources, such as a website where team members can share and work on documents together as well as receive feedback from other consortium members, can speed the development and execution phase of any project through facilitating the collaboration of teams and minimizing e-mail exchanges. Planning for at least two face-to-face meetings—one at the inception of the project and one before the dissemination of the results—could also facilitate team collaboration and ensure reaching consensus in a more efficient manner. Finding a designated time every month that works for all team members and ensures their availability for recurring teleconferences, so they can contribute to the project development and discussion process throughout the year, is an essential part of maintaining strong team relations.

Putting the execution of processes in parallel always speeds up the timeline for the project. Early in the planning phase the team members should decide whether they would like to go for a wider dissemination of the results or instead would prefer to distribute the results of their project only within the specific therapeutic area without advertising the broader implications of the project or the findings. Preparation of the results for dissemination requires significant effort and time on behalf of the project team leader, the scientific program manager, the statisticians, and the entire project team.

The most important lesson learned from the first completed project of the Biomarkers Consortium is that cross-company precompetitive collaboration is a feasible and powerful approach to biomarker qualification. This experience will enable the consortium and other biomedical research collaborations to approach future data-sharing exercises more effectively and efficiently. The widening spectrum of precompetitive collaboration will continue to drive innovative open-source approaches to biomedical research and development.

ACKNOWLEDGMENTS
The helpful input of Myrlene Staten and the Biomarkers Consortium staff is gratefully acknowledged.

The findings and conclusions in this article have not been formally disseminated by the US Food and Drug Administration and should not be construed to represent any agency determination or policy.

CONFLICT OF INTEREST
J.A.W., Y.C., and B.J.M. are employees of Merck and may own stock and/or hold stock options in the company. M.P. and M.-D.W. are employees of Eli Lilly and may own stock and/or hold stock options in the company. J.K. is an employee of Hoffmann–La Roche. D.J.R.N. and S.G. are employees of GlaxoSmithKline and may own stock and/or hold stock options in the company. E.C.W., M.M.E., B.S., and M.T.V. declared no conflict of interest.

© 2010 ASCPT

1. Kola, I. The state of innovation in drug development. Clin. Pharmacol. Ther. 83, 227–230 (2008).
2. Kaitin, K.I. Obstacles and opportunities in new drug development. Clin. Pharmacol. Ther. 83, 210–212 (2008).
3. Weber, S. The Success of Open Source (Harvard University Press, Cambridge, MA, 2004).
4. Melese, T., Lin, S.M., Chang, J.L. & Cohen, N.H. Open innovation networks between academia and industry: an imperative for breakthrough therapies. Nat. Med. 15, 502–507 (2009).
5. Friend, S. & Schadt, E. Something wiki this way comes [interview by Bryn Nelson]. Nature 458, 13 (2009).
6. Altar, C.A. The Biomarkers Consortium: on the critical path of drug discovery. Clin. Pharmacol. Ther. 83, 361–364 (2008).
7. Wagner, J.A., Williams, S.A. & Webster, C.J. Biomarkers and surrogate end points for fit-for-purpose development and regulatory evaluation of new drugs. Clin. Pharmacol. Ther. 81, 104–107 (2007).
8. Wagner, J.A. et al. Utility of adiponectin as a biomarker predictive of glycemic efficacy is demonstrated by collaborative pooling of data from clinical trials conducted by multiple sponsors. Clin. Pharmacol. Ther. 86, 619–625 (2009).
9. American Diabetes Association. Abstracts of the 69th Scientific Session, New Orleans, LA, 5–9 June 2009.
10. Hughes, B. Novel consortium to address shortfall in innovative medicines for psychiatric disorders. Nat. Rev. Drug Discov. 8, 523–524 (2009).


Draft as of September 25, 2012

Clinical Research Data Sharing Projects

This is an initial list of data sharing projects compiled by IOM staff. It is not an exhaustive list, and inclusion does not denote endorsement. The list includes projects that share different types of data and health information (i.e., genetic information, observational data, etc.), as the public workshop would like to consider all data sharing initiatives that have best practices/lessons learned to be applied to initiatives focused on sharing data from preplanned interventional studies of human subjects. As indicated by the source line for each project, descriptions are based largely on online resources or individual presentations.

Alzheimer’s Disease Neuroimaging Initiative (ADNI)

The Alzheimer's Disease Neuroimaging Initiative is designed to validate the use of biomarkers, including blood tests, tests of cerebrospinal fluid, and MRI/PET imaging, for Alzheimer's disease (AD) clinical trials and diagnosis. An essential feature of ADNI is that the clinical, neuropsychological, imaging, and biological data and biological samples will be made available to all qualified scientific investigators as a public domain research resource. Among many goals, ADNI aims to define the rate of progression of mild cognitive impairment (MCI) and AD, to develop improved methods for clinical trials, and to provide a large database that will improve the design of clinical treatment trials. It is anticipated that ADNI will continue in the tradition of providing information and methods that ultimately lead to effective treatment and prevention of AD. With better knowledge of the earliest stages of the disease, researchers may be able to test potential therapies at earlier stages, when they may have the greatest promise for slowing progression of this devastating disease. The ADNI study is now in its third phase (ADNI, ADNI GO, and ADNI 2). ADNI 2 is studying the rate of change of cognition, function, brain structure, function and biomarkers in 200 elderly controls, 400 subjects with mild cognitive impairment, and 200 with AD. ADNI continues an unprecedented policy of open access to ADNI clinical and imaging data through this website and a parallel website at the Laboratory of Neuroimaging. All data are made public, without any embargo. Source: http://adni-info.org

Analgesic Clinical Trials Innovation, Opportunities and Networks (ACTION) Initiative

The ACTION Initiative aims to streamline the discovery and development process for new analgesic drug products for the benefit of the public health. This multi-year, multi-phased Initiative will work to:


Page 108: Sharing Clinical Research Data:  an IOM Workshop

Draft as of September 25, 2012

• Establish a scientific and administrative infrastructure to support a series of projects;
• Establish relationships with key expert stakeholders, industry, professional organizations, academia, and government agencies;
• Coordinate scientific workshops with key experts in the field of anesthesia and analgesia;
• Conduct in-depth and wide-ranging data analyses of analgesic clinical trial data to determine the effects of specific research designs and analysis methods.

Source: http://www.fda.gov/AboutFDA/PartnershipsCollaborations/PublicPrivatePartnershipProgram/ucm231130.htm

Arch2POCM [Academics, regulators, citizens, health industry, through to Proof Of Clinical Mechanism (Phase IIa)]

Arch2POCM is a new public-private partnership (PPP) whose mission is to help invigorate and accelerate the pharmaceutical industry's development of new and effective medicines by conducting discovery through Phase II clinical trials on completely novel targets.

Arch2POCM will focus on novel targets that are judged too risky to be pursued in industry but that have significant scientific potential (e.g., in cancer, immunology, autism, and schizophrenia). Safe, drug-like compounds that interact with the selected targets will be developed all the way through Phase IIb clinical trials in order to determine whether modulation of Arch2POCM's targets beneficially impacts disease in patients. Arch2POCM will produce information that empowers drug development. To do this, Arch2POCM will operate without filing any patents and will share all of its data and drug-like compounds. The result will be a common stream of knowledge about the roles a selected target plays in human biology and disease.

Source: http://sagebase.org/WP/arch

BioGrid Australia

BioGrid Australia Limited is a secure research platform and infrastructure that provides access to real-time clinical, imaging and biospecimen data across jurisdictions, institutions and diseases. The web-based platform provides ethical access while protecting both privacy and intellectual property. BioGrid Australia and the Australian Cancer Grid (ACG) allow life science researchers to link data across all states, regardless of their existing linkage and research platforms.


The BioGrid Australia/ACG is a virtual repository and will link to all participating teaching hospitals and cancer research centres in Australia as well as the integrated cancer centres in Victoria. The platform maximizes life science research and provides researchers with access to multiple data sets, to data on clinical outcomes, to quality and audit data, as well as to genomic data, images and analytical tools.
Source: http://www.biogrid.org.au/wps/portal

Biomarkers Consortium Project on Adiponectin

The consortium is helping to create a new era of personalized medicine, with more highly predictive markers that have an impact during a patient's illness or lifespan. Its goal is to combine the forces of the public and private sectors to accelerate the development of biomarker-based technologies, medicines, and therapies for the prevention, early detection, diagnosis, and treatment of disease. Adiponectin, a hormone derived from fat cells that is abundant in plasma and easy to measure through commercially available kits, was confirmed as a robust biomarker predictive of glycemic efficacy in Type 2 diabetes patients and healthy subjects after treatment with peroxisome proliferator-activated receptor (PPAR) agonists, but not after treatment with non-PPAR drugs such as metformin. The findings were the result of the first project to be completed by the Biomarkers Consortium, a public-private partnership managed by the Foundation for the National Institutes of Health. The project conducted a statistical analysis of pooled and blinded pre-existing data from Phase II clinical trials contributed by four pharmaceutical companies and analyzed under the direction of a diverse team of scientists from industry, the National Institutes of Health (NIH), the U.S. Food & Drug Administration (FDA), and academic research institutions.

Source: http://www.biomarkersconsortium.org/press_release_adiponectin_predictive_biomarker.php

BioSense (CDC)

BioSense is a program of the Centers for Disease Control and Prevention (CDC) that tracks health problems as they evolve and provides public health officials with the data, information and tools they need to better prepare for and coordinate responses to safeguard and improve the health of the American people. Mandated in the Public Health Security and Bioterrorism Preparedness and Response Act of 2002, the CDC BioSense Program was launched in 2003 to establish an integrated national public health surveillance system for early detection and rapid assessment of potential bioterrorism-related illness.

BioSense 2.0 protects the health of the American people by providing timely insight into the health of communities, regions, and the nation, and offers a variety of features to improve data collection, standardization, storage, analysis, and collaboration. Using the latest technology, BioSense 2.0 integrates current health data shared by health departments from a variety of sources to provide insight on the health of communities and the country. By getting more information faster, local, state, and federal public health partners can detect and respond to more outbreaks and health events more quickly. BioSense 2.0 is community controlled and user driven.

Source: http://www.cdc.gov/biosense/

caBIG (Cancer Biomedical Informatics Grid)

The mission of caBIG is to develop a collaborative information network that accelerates the discovery of new approaches for the detection, diagnosis, treatment, and prevention of cancer. caBIG is sponsored by the National Cancer Institute (NCI) and administered by the National Cancer Institute Center for Bioinformatics and Information Technology (NCI-CBIIT). Anyone can participate in caBIG and there is no cost to join. The community includes Cancer Centers, other NCI-supported research endeavors, and a variety of federal, academic, not-for-profit, and industry organizations. caBIG seeks to connect scientists and practitioners through a shareable and interoperable infrastructure; develop standard rules and a common language to more easily share information; and build or adapt tools for collecting, analyzing, integrating, and disseminating information associated with cancer research and care.

Since its start, caBIG has committed to the following principles:

• Federated: caBIG software and resources are widely distributed, interlinked, and available to everyone in the cancer research community, but institutions maintain local control over their own resources and data.

• Open-development: caBIG tools and infrastructure are being developed through an open, participatory process. caBIG leverages existing resources whenever possible, rather than building new tools in every case.

• Open-access: caBIG resources are freely obtainable by the cancer community to ensure broad data-sharing and collaboration.

• Open-source: The caBIG source code is available to view, alter, and redistribute.

The caBIG program facilitates the integration of diverse research and data across the "bench to bedside continuum" through an interoperable infrastructure and collaborative leadership. Appropriate data sharing is a key element of this collaboration. Benefits of data sharing include:

• The large volumes of research data created by high-throughput genomics and proteomics technologies can best be harvested by teams of individuals, rather than by single PIs.

• Collaboration across and within disciplines is required to leverage the broad knowledge and skill bases needed to realize the scientific and public health benefits of translational and personalized medicine.

• Data sharing raises the visibility of individual studies and data collections; it opens new avenues of data dissemination and validation, and points to opportunities for future collaboration.

• NIH grants seeking $500,000 or more in direct costs in any single year require a plan for data sharing.


CAMD (Coalition Against Major Diseases) and C-Path Alzheimer’s Database

A new database of more than 4,000 Alzheimer's disease patients who have participated in 11 industry-sponsored clinical trials was released in June 2010 by CAMD. This is the first database of combined clinical trials to be openly shared by pharmaceutical companies and made available to qualified researchers around the world. It is also the first effort of its kind to create a voluntary industry data standard that will help accelerate new treatment research on brain disease, as patients with other related brain diseases are expected to be added. The level of detail and scope of this database will enable researchers to more accurately predict the true course of Alzheimer's, Parkinson's, Huntington's, and other neurodegenerative diseases, thereby enabling the design of more efficient clinical trials. Patient identifiers will not be included in the database, thereby ensuring patient privacy.

Version 1.0 of an Alzheimer's data standard, used to develop the groundbreaking C-Path CAMD Alzheimer's disease data repository, was developed through the Clinical Data Interchange Standards Consortium (CDISC) process and posted to the CDISC website in October 2011. Since this first major milestone, CDISC, C-Path, and such partners as the Bill and Melinda Gates Foundation, the Michael J. Fox Foundation, the PKD Foundation, Tufts University, the University of Rochester, the National Institutes of Health (NIH), and others have continued to make advancements in a number of therapeutic areas. Tuberculosis, pain, Parkinson's disease, polycystic kidney disease, virology, oncology, and cardiovascular disease are therapeutic areas for which standards are currently in development. In June 2012, C-Path and CDISC announced the signing of a partnership agreement to establish the Coalition For Accelerating Standards and Therapies (CFAST), an initiative to accelerate clinical research and medical product development by creating and maintaining data standards, tools, and methods for conducting research in therapeutic areas that are important to public health.

CAMD is a formal consortium of pharmaceutical companies, research foundations and patient advocacy/voluntary health associations, with advisors from government research and regulatory agencies including the U.S. Food and Drug Administration (FDA), the European Medicines Agency (EMA), the National Institute of Neurological Disorders and Stroke (NINDS), and the National Institute on Aging (NIA). CAMD is led and managed by the non-profit Critical Path Institute (C-Path), which is funded by a cooperative agreement with the FDA and a matching grant from Science Foundation Arizona. Source: http://www.c-path.org/News/CDISCTAStds%20PR-24June2012.pdf

CDC’s Chronic Fatigue Syndrome Wichita Clinical Study

The data sets were collected during a 2-day in-hospital clinical assessment study conducted from December 2002 to July 2003 in Wichita, KS, USA. The study enrolled 227 people and classified them into study groups. Assessment included clinical evaluation of each subject's medical and psychiatric status, sleep characteristics and cognitive functioning. Laboratory testing evaluated neuroendocrine status, autonomic nervous system function, systemic cytokine profiles,


peripheral blood gene expression patterns and polymorphisms in genes involved in neurotransmission and immune regulation. CDC approved the sharing of the chronic fatigue syndrome Wichita Clinical dataset with 25 intramural and extramural investigators in the CFS Computational Challenge and with CAMDA (Critical Assessment of Microarray Data Analysis) in 2006 and 2007. Over the past 3 years, sharing the Wichita Clinical dataset and partnering with expert extramural scientists has resulted in more than 20 publications, and additional manuscripts based on this dataset are in preparation. CDC's CFS Research Program has published about 15 papers based on the Wichita Clinical study in the past 3 years. In addition to yielding "more value for the money," this data-sharing effort generated new perspectives on CFS and confirmed previous observations about neuroendocrine and immune dysfunction in CFS. Sources: http://www.cdc.gov/rdc/B1DataType/Dt132.htm and http://www.cfids.org/advocacy/testimony-vernon-oct2008.pdf

Cancer Commons

Modern molecular biology supports the hypothesis that cancer is actually hundreds or thousands of rare diseases, and that every patient's tumor is, to some extent, unique. Although there is a rapidly growing arsenal of targeted cancer therapies that can be highly effective in specific subpopulations, especially when used in rational combinations to block complementary pathways, the pharmaceutical industry continues to rely on large-scale randomized clinical trials that test drugs individually in heterogeneous populations. Such trials can be an extremely inefficient strategy for searching the combinational treatment space, and they capture only a small portion of the data needed to predict individual treatment responses. On the other hand, an estimated 70% of all cancer drugs are used off-label in cocktails based on each individual physician's experience, as if the nation's 30,000 oncologists were engaged in a gigantic uncontrolled and unobserved experiment involving hundreds of thousands of patients suffering from an undetermined number of diseases. These informal experiments could provide the basis for what amounts to a giant adaptive search for better treatments, if only the genomic and outcomes data could be captured and analyzed, and the findings integrated and disseminated.

Toward this end, Cancer Commons was developed: a family of web-based open-science communities in which physicians, patients, and scientists collaborate on models of cancer subtypes to more accurately predict individual responses to therapy. The goals are to: 1) give each patient the best possible outcome by individualizing treatment based on the tumor's genomic subtype; 2) learn as much as possible from each patient's response; and 3) rapidly disseminate what is learned. The key innovation is to run this adaptive search strategy in real time, continuously updating disease and treatment models so that the knowledge learned from one patient is disseminated in time to help the next.
In this way, Cancer Commons is attempting a new patient-centric paradigm for translational medicine, in which every patient receives personalized therapy based upon the best available science, and researchers continuously test and refine their models of cancer biology and therapeutics based on the resulting clinical responses.


How It Works

Model: Editorial boards comprised of leading physicians and scientists in each cancer curate molecular disease models (MDMs) that identify the most relevant tests, treatments and trials for each molecular subtype of that cancer, based on the best current knowledge.

Test and Treat: Patients and Physicians access the MDM through web based applications that transform its knowledge into personalized actionable information that inform testing and treatment decisions.

Report and Analyze: Clinicians and researchers report clinical observations and outcomes, and collaboratively analyze them to validate or refute the MDM’s knowledge and hypotheses.

Learn: The editorial boards review these results and update the MDMs in real time.
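The Model / Test-and-Treat / Report / Learn cycle described above can be sketched in a few lines of code. The subtype names, drug names, and the majority-response update rule below are invented purely for illustration; they are not Cancer Commons' actual models or software.

```python
# Hypothetical sketch of the Cancer Commons cycle: a molecular disease
# model (MDM) maps tumor subtypes to recommended treatments and is
# revised as clinical outcomes are reported.

mdm = {"subtype_A": "drug_X", "subtype_B": "drug_Y"}   # Model
outcomes = []                                           # reported responses

def recommend(subtype):
    """Test and Treat: turn the MDM into an actionable recommendation."""
    return mdm[subtype]

def report(subtype, treatment, responded):
    """Report and Analyze: record each patient's clinical response."""
    outcomes.append((subtype, treatment, responded))

def learn(alternative_by_subtype):
    """Learn: revise the MDM when reported responses refute its
    current recommendation (here, a response rate below 50%)."""
    for subtype in mdm:
        results = [r for s, t, r in outcomes
                   if s == subtype and t == mdm[subtype]]
        if results and sum(results) / len(results) < 0.5:
            mdm[subtype] = alternative_by_subtype[subtype]

report("subtype_A", recommend("subtype_A"), responded=False)
report("subtype_A", recommend("subtype_A"), responded=False)
learn({"subtype_A": "drug_Z", "subtype_B": "drug_Y"})
print(mdm["subtype_A"])  # → drug_Z
```

The point of the sketch is the feedback loop: knowledge learned from one patient's response changes the recommendation offered to the next.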

Source: http://cancercommons.org/wp-content/themes/cancer_commons/docs/cancer_commons_whitepaper.pdf and http://www.cancercommons.org/about/

ClinicalTrials.gov

ClinicalTrials.gov offers information for locating federally and privately supported clinical trials for a wide range of diseases and conditions. A clinical trial (also called clinical research) is a research study in human volunteers to answer specific health questions. Interventional trials determine whether experimental treatments or new ways of using known therapies are safe and effective in controlled environments. Observational trials address health issues in large groups of people or populations in natural settings. ClinicalTrials.gov currently contains 128,589 trials sponsored by the National Institutes of Health, other federal agencies, and private industry. Studies listed in the database are conducted in all 50 states and in 180 countries. ClinicalTrials.gov receives over 50 million page views per month and 65,000 visitors daily. NIH, through its National Library of Medicine (NLM), developed this site in collaboration with the FDA as a result of the FDA Modernization Act, which was passed into law in November 1997.

What information is in ClinicalTrials.gov?

Summaries of Clinical Study Protocols that include the following information:

• summary of the purpose of the study
• recruiting status
• disease or condition and medical product under study
• research study design
• phase of the trial
• criteria for participation


• location of the trial and contact information

Summaries of Clinical Study Results for some studies that include the following information:

• description of study participants (e.g., number enrolled, demographic data)
• overall outcomes of the study
• summary of adverse events experienced by participants

Links to Online Health Resources that help place clinical studies in the context of current medical knowledge, including MedlinePlus and PubMed.

Source: http://www.nlm.nih.gov/pubs/factsheets/clintrial.html

Clinical Trial Comparator Arm Partnership (CTCAP)

This 'Pre-Competitive' Disease Biology program has grown out of a transformative partnership between two synergistic nonprofits, Sage Bionetworks and Genetic Alliance, and many of the world's largest healthcare organizations. The program already has unprecedented commitments from leading pharmaceutical and healthcare corporations to contribute existing data from non-commercially-sensitive placebo and comparator clinical trial arms. These pre-competitive results will be made available to academic as well as industry researchers to enable the development of computational models that more accurately define the molecular basis of disease and can better guide the development of new therapies, better biomarkers, and improved standards of clinical care. The core activities of the project include:

1) Collecting, curating, and hosting clinical trial datasets from the pharmaceutical industry at the Sage Commons
2) Developing and hosting network models of disease
3) Establishing a framework for release into the public domain

Source: http://sagebase.org/partners/CTCAP.php

DataSphere Project (CEO Roundtable on Cancer)

DataSphere is a project to share oncology clinical trial data sets. The project will include comparator-arm oncology datasets: protocols, case report forms, and data descriptors. Datasets will be publicly available via a file-sharing website (i.e., a secure database through which qualified researchers can access data). The rationale for the project is to increase transparency and improve data standards, improve research, and honor patient volunteers. Sanofi will donate two recent Phase III oncology datasets by October 2012. Further dataset contributions are expected from other members of the Life Sciences Consortium of the CEO Roundtable on Cancer, as well as from other pharmaceutical companies and academics, by the end of 2012. The ultimate goal is 30 datasets by the end of Q2 2013. Source: Presentation by Charles Hugh-Jones, M.D., MRCP, Sanofi, on behalf of the DataSphere project team to IOM staff (May 10, 2012)

The database of Genotypes and Phenotypes (dbGaP)

The database of Genotypes and Phenotypes (dbGaP) was developed to archive and distribute the results of studies that have investigated the interaction of genotype and phenotype. Such studies include genome-wide association studies, medical sequencing, and molecular diagnostic assays, as well as associations between genotype and non-clinical traits. dbGaP provides two levels of access, open and controlled, in order to allow broad release of non-sensitive data while providing oversight and investigator accountability for sensitive data sets involving personal health information. Summaries of studies and the contents of measured variables, as well as original study document text, are generally available to the public, while access to individual-level data, including phenotypic data tables and genotypes, requires varying levels of authorization. The data in dbGaP will be pre-competitive and will not be protected by intellectual property patents. Investigators who agree to the terms of dbGaP data use may not restrict other investigators' use of primary dbGaP data by filing intellectual property patents on it. However, the use of primary data from dbGaP to develop commercial products and tests to meet public health needs is encouraged. Source: http://www.ncbi.nlm.nih.gov/gap
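The two-level access model might be sketched as follows. The resource names and the authorization check are illustrative assumptions only, not dbGaP's actual catalog or implementation; the sketch simply shows open resources being public while individual-level data requires an approved request.

```python
# Hypothetical sketch of dbGaP-style tiered access: study summaries are
# open, while individual-level data requires an approved authorization.

OPEN = "open"
CONTROLLED = "controlled"

CATALOG = {
    "study_summary": OPEN,
    "variable_descriptions": OPEN,
    "phenotype_tables": CONTROLLED,   # individual-level data
    "genotypes": CONTROLLED,
}

def can_access(resource, approved_requests):
    """Open resources are public; controlled resources need a data
    access request approved for that specific resource."""
    tier = CATALOG[resource]
    return tier == OPEN or resource in approved_requests

assert can_access("study_summary", approved_requests=set())
assert not can_access("genotypes", approved_requests=set())
assert can_access("genotypes", approved_requests={"genotypes"})
```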

The “ePlacebo” Database

Recruiting patients for clinical trials is increasingly difficult, not least because of the logistical and oftentimes ethical issues involved in randomizing patients to control or placebo arms. Creating a large-scale, pre-competitive, historical control (“ePlacebo”) database of patients from the placebo/control arms of multiple trials would provide a research utility with many potential uses, including improved trial design and better context for baseline adverse event levels. As opposed to the series of well-populated but disconnected databases for individual disease areas, a large-scale, integrated resource would be of value across diseases. A more general database would permit aggregation across disease areas, and would leave investigators the option to analyze data according to the needs of their particular research question.

Pfizer has completed a feasibility assessment of pooling placebo data across multiple clinical trials. We created our internal “ePlacebo” test data mart with de-identified data from over 1,000 trials comprising over 20,000 patients, and then created a segment of approximately 3,500 patients from the Pain therapeutic area. We are currently performing internal proof-of-concept studies on this data in a test environment.

Ideally, the ePlacebo database would contain pooled, de-identified data from several different sources and would serve as a research utility for all contributors. Building a large-scale placebo/controls database would require the interest and participation of several companies, a framework to ensure data is being used appropriately with appropriate informed consent, the technical infrastructure for hosting the data, the accompanying trial metadata to better characterize patients, the statistical and technical methods for pooling data, a steering committee to manage and validate the use of the data, and an outside convener that could serve as the “neutral party” among the participating organizations. Developing these agreements and infrastructure would produce a resource with broad applications that could help speed the delivery of new treatments to patients.

Source: Personal communication with Michael Cantor, Pfizer Clinical Informatics and Innovation Senior Director
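The core pooling step can be sketched as below. The record layout, field names, and pseudonymization scheme are assumptions made for illustration, not Pfizer's actual data mart design: each trial contributes only its placebo/control-arm records, patient identifiers are replaced with opaque pseudonyms, and trial metadata is retained so investigators can still stratify by study.

```python
# Hypothetical sketch: pooling de-identified placebo-arm records from
# several trials into one historical-control dataset.

import hashlib

def de_identify(trial_id, patient_id):
    """Replace a trial-local patient ID with an opaque pseudonym."""
    return hashlib.sha256(f"{trial_id}:{patient_id}".encode()).hexdigest()[:12]

def pool_placebo_arms(trials):
    """Keep only placebo/control-arm records, tagged with trial
    metadata so the pooled data can still be stratified by study."""
    pooled = []
    for trial in trials:
        for rec in trial["records"]:
            if rec["arm"] != "placebo":
                continue  # active-arm data stays out of the pool
            pooled.append({
                "trial_id": trial["trial_id"],
                "therapeutic_area": trial["therapeutic_area"],
                "subject": de_identify(trial["trial_id"], rec["patient_id"]),
                "adverse_events": rec["adverse_events"],
            })
    return pooled

trials = [
    {"trial_id": "T001", "therapeutic_area": "Pain",
     "records": [{"patient_id": "p1", "arm": "placebo", "adverse_events": 2},
                 {"patient_id": "p2", "arm": "active", "adverse_events": 1}]},
    {"trial_id": "T002", "therapeutic_area": "Pain",
     "records": [{"patient_id": "p1", "arm": "placebo", "adverse_events": 0}]},
]
pooled = pool_placebo_arms(trials)
print(len(pooled))  # → 2 (only the placebo records, pseudonymized)
```

Note that the original patient identifiers never appear in the pooled records, which is the property a shared historical-control resource would need.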

Genetic Association Information Network (GAIN)

GAIN supports a series of genome-wide association studies (GWAS) designed to identify specific points of DNA variation associated with the occurrence of a particular common disease. Investigators from existing case-control or trio (parent-offspring) studies were invited to submit samples and data on roughly 2,000 participants for assay of 300,000 to 500,000 single nucleotide polymorphisms designed to capture roughly 80 percent of the common variation in the human genome. Now that the GAIN initiative has officially concluded, the resulting data from the GAIN studies have been deposited into the database of Genotypes and Phenotypes (dbGaP) within the National Library of Medicine at the NIH for the broad use of the research community. Access is controlled by the GAIN Data Access Committee. Source: http://www.genome.gov/19518664

Informatics for Integrating Biology and the Bedside (i2b2)

i2b2 (Informatics for Integrating Biology and the Bedside) is an NIH-funded National Center for Biomedical Computing based at Partners HealthCare System. The i2b2 Center is developing a scalable informatics framework that will enable clinical researchers to use existing clinical data for discovery research and, when combined with IRB-approved genomic data, facilitate the design of targeted therapies for individual patients with diseases having genetic origins. This platform currently enjoys wide international adoption by the CTSA network, academic health centers, and industry. i2b2 is funded as a cooperative agreement with the National Institutes of Health.


Innovative Medicines Initiative (IMI) European Medical Information Framework (EMIF)

The new IMI initiative (not yet formally announced) is called EMIF (European Medical Information Framework) and is aimed at sharing individual patient-level data across a variety of diseases. The Innovative Medicines Initiative (IMI) is Europe's largest public-private partnership, aiming to improve the drug development process by supporting more efficient discovery and development of better and safer medicines for patients. With a €2 billion budget, IMI supports collaborative research projects and builds networks of industrial and academic experts in Europe in order to boost innovation in healthcare. Acting as a neutral third party in creating innovative partnerships, IMI aims to build a more collaborative ecosystem for pharmaceutical research and development. IMI plans to provide socio-economic benefits to European citizens, increase Europe's competitiveness globally, and establish Europe as the most attractive place for pharmaceutical R&D. IMI supports research projects in the areas of safety and efficacy, knowledge management, and education and training. Projects are selected through open calls for proposals. The research consortia participating in IMI projects consist of large biopharmaceutical companies that are members of EFPIA and a variety of other partners, such as small- and medium-sized enterprises, patients' organizations, universities and other research organizations, hospitals, regulatory agencies, and other industrial partners. IMI is a Joint Undertaking between the European Union and the European Federation of Pharmaceutical Industries and Associations (EFPIA).

Source: http://www.imi.europa.eu/content/home

Kaiser Permanente Research Program on Genes, Environment, and

Health (RPGEH) and UCSF Biobank

The RPGEH, a scientific research program at Kaiser Permanente in California, is one of the largest research projects in the United States to examine the genetic and environmental factors that influence common diseases such as heart disease, cancer, diabetes, high blood pressure, Alzheimer's disease, asthma and many others. The goal of the research program is to discover which genes and environmental factors—the air we breathe, the water we drink, as well as lifestyles and habits—are linked to specific diseases.

Based on the over six million-member Kaiser Permanente Medical Care Plan of Northern California (KPNC) and Southern California (KPSC), the completed resource will link together comprehensive electronic medical records, data on relevant behavioral and environmental factors, and biobank data (genetic information from saliva and blood) from 500,000 consenting health plan members.


In addition to learning more about the genetic and environmental determinants of disease, Kaiser Permanente research scientists, working in collaboration with other scientists across the nation and around the world, hope to translate research findings into improvements in health and medical care. They also hope to develop a broader understanding of the ethical, legal, and social implications of using genetic information in health care.

Source: http://www.dor.kaiser.org/external/DORExternal/rpgeh/index.aspx

Mini-Sentinel

The objective of the Mini-Sentinel pilot project is to inform and facilitate development of the Sentinel System and to carry out mandates delineated in the Food and Drug Administration Amendments Act (FDAAA). With Mini-Sentinel, the FDA seeks to support new activities intended to develop the scientific operations required to build an efficient, valid, and reliable Sentinel System. The Mini-Sentinel pilot funds development of a single Coordinating Center that: provides the FDA a "laboratory" for developing and evaluating scientific methods that might be used in a fully operational Sentinel System; affords the FDA the opportunity to assess safety issues using existing electronic healthcare data systems; and allows the FDA to learn more about the barriers and challenges to building a viable and accurate system of safety surveillance for FDA-regulated medical products.

Key features of Mini-Sentinel include:

1. Active Surveillance: Mini-Sentinel's foundational activities are intended to facilitate implementation of an active medical-product safety surveillance system for FDA-regulated medical products, both newly and previously approved.

2. Collaboration: Mini-Sentinel is a collaborative endeavor conducted by a large group of Data and Academic Partners, known collectively as Collaborating Institutions, that provide both access to healthcare data and ongoing scientific, technical, methodological, and organizational expertise necessary to meet the requirements of the pilot. Each Collaborating Institution has a Site Principal Investigator who leads participation by that organization.

3. Coordinating Center: The Mini-Sentinel Coordinating Center (MSCC) integrates the activities of the Collaborating Institutions. Within the MSCC, the Operations Center develops and maintains the scientific, technical, analytic, and administrative infrastructure required to facilitate distributed scientific assessments. The Planning Board provides a forum for active participation by representatives of the Collaborating Institutions.

4. Distributed Data Approach: Mini-Sentinel uses a distributed approach that enables healthcare data to remain in their local environments under the control of the participating Data Partners. This approach is enhanced through development of a common data model that enables creation of centralized analytic programs for distribution to Data Partners for local implementation.


5. Data Sources: Initial data sources include administrative claims environments with pharmacy dispensing data. Data from outpatient and inpatient electronic health records (EHRs) and registries will be added subsequently.

6. Rapid Response to Queries: The Mini-Sentinel system is being designed to accommodate urgent FDA requests for information about medical product exposures and outcomes.

7. Communication of Results: Results that relate exposure of a medical product to a health outcome will be posted within 30 days of completion of standard checks for accuracy.

8. Methods Development: Mini-Sentinel supports development and testing of improved statistical and epidemiological approaches to active surveillance of regulated medical products.

9. Policy Development: Mini-Sentinel supports development of policies and procedures to promote productive engagement among Collaborating Institutions consistent with the custodial responsibilities of the Data Partners.

10. Impact of FDA Actions: Mini-Sentinel is designed to facilitate assessment of the impact of FDA regulatory actions on the use of regulated medical products.

11. Engagement with Related Efforts: Under the FDA's direction, Mini-Sentinel coordinates with other national initiatives to facilitate development of comparable standards and optimal methodologies, and to disseminate lessons learned from the work of Mini-Sentinel.

Source: http://mini-sentinel.org/about_us/
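The distributed approach described above (a common data model at every Data Partner, with centrally authored analytic programs executed locally so that row-level data never leave a site) can be illustrated with a small, hypothetical sketch. The table layout, drug codes, and function names below are invented for illustration and are not Mini-Sentinel's actual schema or code:

```python
import sqlite3

# Minimal common data model: every Data Partner implements the same table
# layout locally (hypothetical schema, for illustration only).
CDM_DDL = """
CREATE TABLE dispensing (
    patient_id    INTEGER,
    drug_code     TEXT,
    dispense_date TEXT
);
"""

def make_partner(rows):
    """Simulate one Data Partner's local datastore; rows never leave this connection."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(CDM_DDL)
    conn.executemany("INSERT INTO dispensing VALUES (?, ?, ?)", rows)
    return conn

# A centrally authored analytic program: because each site implements the same
# common data model, the identical query runs unmodified at every site.
CENTRAL_QUERY = "SELECT COUNT(DISTINCT patient_id) FROM dispensing WHERE drug_code = ?"

def run_distributed(partners, drug_code):
    """Distribute the query; each site returns only an aggregate count, not row-level data."""
    site_counts = [p.execute(CENTRAL_QUERY, (drug_code,)).fetchone()[0] for p in partners]
    return site_counts, sum(site_counts)

partners = [
    make_partner([(1, "NDC001", "2012-01-05"), (2, "NDC001", "2012-02-10")]),
    make_partner([(7, "NDC001", "2012-03-01"), (8, "NDC999", "2012-03-02")]),
]
site_counts, total = run_distributed(partners, "NDC001")
print(site_counts, total)  # [2, 1] 3
```

Because every site implements the identical data model, the central program needs no per-site customization, and only aggregate counts cross institutional boundaries.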

NEWMEDS

Novel Methods leading to New Medications in Depression and Schizophrenia (NEWMEDS) is an international consortium of scientists that has launched one of the largest ever academic-industry research collaborations to find new methods for the development of drugs for schizophrenia and depression. NEWMEDS brings together top scientists from academic institutions with a wide range of expertise and partners them with nearly all major biopharmaceutical companies. The project will focus on developing new animal models that use brain recording and behavioural tests to identify innovative and effective drugs for schizophrenia. NEWMEDS will develop standardised paradigms and acquisition and analysis techniques to apply brain imaging, especially fMRI and PET imaging, to drug development. It will examine how new genetic findings (duplication and deletion of, or changes in, genes) influence the response to various drugs and whether this information can be used to choose the right drug for the right patient. Finally, it will try to develop new approaches for shorter and more efficient trials of new medications – trials that may require fewer patients and give faster results.

Source: http://www.newmeds-europe.com

One Mind Initiative

One Mind is a new-model non-profit organization that will take the lead role in the research, funding, marketing, and public awareness of mental illness and brain injury, by bringing together the governmental, corporate, scientific, and philanthropic communities in a concerted effort to drastically reduce the social and economic effects of mental illness and brain injury within ten years. One Mind for Research is working on several key programs and initiatives.

Knowledge Integration Network for Traumatic Brain Injury and Post Traumatic Stress is a new Public Private Partnership bringing best-in-class science, technology and expertise together to accelerate research and realize new diagnostics, treatments and cures for TBI and PTS disorders. Goals of the Knowledge Integration Network:

1. Increase the rate of large-scale collection of clinical and biomarker information in order:

• To help identify biological indicators of the causes and effects of diseases, or pathology.

• To investigate disease progression for earlier, more accurate diagnosis and treatment.

2. Create new systems and approaches that will enable:

• The sharing of data from disparate sources with academia, industry, non-profits and governments

• The removal of barriers to effective scientific and clinical research that will speed diagnosis and treatment on an unprecedented level.

3. Empower patients to take a more active role in their own care and contribute to the acceleration of research by direct engagement and expanding the use of advanced biosensors & diagnostics.

Brain Data Exchange Portal: One Mind, in partnership with the International Neuroinformatics Coordinating Facility, is creating a digital environment for multiple-source data sharing, with open analysis tools and provenance-tracking systems for working with complex data. They are developing this in concert with leading health information technology partners. Several open-source technology platforms and tools will serve as candidate components for the shared data portal ecosystem, which will be built in two stages: version 1.0 will be completed in 2012 and will serve as the prototype for a full-scale 2.0 model to be launched in 2013. This system will foster an open-science approach among academia, industry, non-profits, and governments to remove the barriers to effective scientific and clinical research.

Virtual Biorepository Program: A dynamic, online international registry of brain-related disorder biorepositories supported by the Neuroscience Information Framework (NIF), the American Brain Coalition, and the NIH.

Source: http://1mind4research.org/programs

Parkinson’s Progression Markers Initiative (PPMI)

The Parkinson’s Progression Markers Initiative is a landmark observational clinical study to comprehensively evaluate a cohort of recently diagnosed Parkinson’s disease (PD) patients and healthy subjects using advanced imaging, biologic sampling, and clinical and behavioral assessments to identify biomarkers of PD progression. PPMI will be carried out over five years at 24 clinical sites in the United States, Europe, and Australia, and requires the participation of 400 Parkinson’s patients and 200 control participants. The mission of PPMI is to identify one or more biomarkers of PD progression, a critical next step in the development of new and better treatments for PD. PPMI will establish a comprehensive, standardized, longitudinal PD data and biological sample repository that will be made available to the research community.

Specific aims to accomplish this objective include:

1. Develop a comprehensive and uniformly acquired clinical/imaging dataset with correlated biological samples that can be used in biomarker verification studies

2. Establish standardized protocols for acquisition, transfer and analysis of clinical and imaging data and biological samples that can be used by the research community

3. Investigate existing biomarkers and identify new clinical, imaging and biological markers to determine interval changes in these markers in PD patients compared with control participants.

Source: http://ppmi-info.org

PatientsLikeMe

PatientsLikeMe was co-founded in 2004 by three MIT engineers: brothers Benjamin and James Heywood and longtime friend Jeff Cole. Five years earlier, their brother and friend Stephen Heywood was diagnosed with ALS (Lou Gehrig’s disease) at the age of 29. The Heywood family soon began searching the world over for ideas that would extend and improve Stephen’s life. Inspired by Stephen’s experiences, the co-founders and team conceptualized and built a health data-sharing platform that they believe can transform the way patients manage their own conditions, change the way industry conducts research and improve patient care.

Today, PatientsLikeMe is a for-profit company, but not one with a “just for profit” mission. They follow four core values: putting patients first, promoting transparency (“no surprises”), fostering openness and creating “wow.” They’re guided by these values as they continually enhance their platform, where patients can share and learn from real-world, outcome-based health data. They’ve also centered their business around these values by aligning patient and industry interests through data-sharing partnerships. They work with trusted nonprofit, research, and industry partners who use this health data to improve products, services, and care for patients. They envision a world where information exchange between patients, doctors, pharmaceutical companies, researchers and the healthcare industry can be free and open; where, in doing so,
people do not have to fear discrimination, stigmatization or regulation; and where the free flow of information helps everyone. They envision a future where every patient benefits from the collective experience of all, and where the risk and reward of each possible choice is transparent and known.

Source: http://www.patientslikeme.com/about

Sage Bionetworks

Sage Bionetworks is a new 501(c)(3) nonprofit biomedical research organization created to revolutionize how researchers approach the complexity of human biological information and the treatment of disease. Sage’s mission has five interdependent themes: (1) research on network models of disease; (2) pilot projects trialing disruptive models of cooperation; (3) rules and reward structures that promote data sharing; (4) building the computational platform for a digital Commons; and (5) activating public engagement and access. They are driven by two recent trends in biomedical research:

1. The exponential increases in the quantity of molecular and genetic information without concomitant advances in the understanding of biology or disease; and

2. The need for development and validation of computational methods for integrating massive molecular and clinical datasets obtained across sizable populations into predictive disease models that can recapitulate complex biological systems with increasing accuracy.

They believe that these problems are compounded by current systems and practices for sharing datasets as well as the scale of data and computational capacity required to construct useful integrated genomic models of networks and systems.

Source: http://sagebase.org/info/index.php

The International Serious Adverse Events Consortium (iSAEC)

The International Serious Adverse Events Consortium (iSAEC) is a nonprofit organization founded in 2007. It comprises leading pharmaceutical companies, the Wellcome Trust, and academic institutions, with scientific and strategic input from the U.S. Food and Drug Administration (FDA) and other international regulatory bodies. The mission of the iSAEC is to identify DNA variants useful in predicting the risk of drug-related serious adverse events (SAEs).

Patients respond differently to medicines, and all medicines can have side effects in certain individuals. The iSAEC’s work is based on the hypothesis that these differences have (in part) a genetic basis, and its research studies examine the impact of genetic variation on how individuals respond to a large variety of medicines. The iSAEC’s initial studies have successfully identified genetic variants associated with drug-related liver toxicity (DILI) and serious skin rashes
(SSR). The majority of the iSAEC’s genetic findings have been specific to a given drug rather than shared across multiple drugs. However, a number of cross-drug genetic alleles are starting to emerge that may provide important insights into the underlying biology/mechanism of drug-induced SAEs (e.g., HLA*5701 or UGT1A1*28). They believe their findings clearly demonstrate an important role for the MHC genomic region (chromosome 6) in the pathology of immunologically mediated SAEs such as DILI and SSR, and that their findings emphasize the importance of immune regulation genes, in addition to a number of well-characterized drug metabolism (ADME) genes.

In the iSAEC’s second phase (2010-2013), their goal is to deepen understanding of the genetics of immunologically influenced SAEs, including SSR, DILI, drug-induced kidney injury (DIKI), and acute hypersensitivity syndrome (AHSS). They will continue to apply state-of-the-art genomic methods to fulfill these aims, including whole genome genotyping and next-generation sequencing to explore the role of “rare” genetic variants in drug-induced SAEs. In addition to supporting original genomic research on drug-related SAEs, the iSAEC is:

• Executing "open-use research practices and standards" in all its collaborations.

• Encouraging greater research efficiency and “speed to results” by pooling talent and resources, under a collaborative private sector leadership model, solely focused on research quality and productivity.

• Releasing all of its novel genetic discoveries/associations into the public domain, uninhibited by intellectual property constraints.

• Enhancing the public’s understanding of how industry, academia, and government are partnering to address drug-related adverse events.

Structural Genomics Consortium (SGC)

The SGC (Structural Genomics Consortium) is a not-for-profit, public-private partnership with the directive to carry out basic science of relevance to drug discovery. The core mandate of the SGC is to determine 3D structures on a large scale and cost-effectively, targeting human proteins of biomedical importance and proteins from human parasites that represent potential drug targets. In these two areas respectively, the SGC is now responsible for >25% and >50% of all structures deposited into the Protein Data Bank each year; to date (September 2011) the SGC has released the structures of over 1,200 proteins with implications for the development of new therapies for cancer, diabetes, obesity, and psychiatric disorders.

Since its inception, the SGC has been engaged in pre-competitive research to facilitate the discovery of new medicines. As part of its mission the SGC generates reagents and knowledge related to human proteins and proteins from human parasites. The SGC believes that its output will have maximal benefit if released into the public domain without restriction on use, and has adopted an Open Access policy.

Source: http://www.thesgc.org

tranSMART

Developed by J&J and launched in 2009, tranSMART is a tool for translational R&D. The platform is heavily reliant on open source data and is designed to facilitate pre-competitive sharing.

tranSMART is a knowledge management platform that they believe enables scientists to develop and refine research hypotheses by investigating correlations between genetic and phenotypic data, and assessing their analytical results in the context of published literature and other work.

The integration, normalization, and alignment of data in tranSMART are designed to permit users to explore data very efficiently and to formulate new research strategies. Some of tranSMART's specific applications include:

• Revalidating previous hypotheses

• Testing and refining novel hypotheses

• Conducting cross-study meta-analysis

• Searching across multiple data sources to find associations of concepts, such as a gene's involvement in biological processes or experimental results

• Comparing biological processes and pathways among multiple data sets from related diseases or even across multiple therapeutic areas

The tranSMART Data Repository combines a data warehouse with access to federated sources of open and commercial databases. tranSMART accommodates:

• Phenotypic data, such as demographics, clinical observations, clinical trial outcomes, and adverse events

• High-content biomarker data, such as gene expression, genotyping, pharmacokinetic and pharmacodynamic markers, metabolomics data, and proteomics data

• Unstructured text data, such as published journal articles, conference abstracts and proceedings, and internal studies and white papers

• Reference data from sources such as MeSH, UMLS, Entrez, GeneGo, Ingenuity, etc.

• Metadata providing context about datasets, allowing users to assess the relevance of results delivered by tranSMART

Data in tranSMART are organized to allow identification and analysis of associations between phenotypic and biomarker data, and are normalized to conform with CDISC and other standards to facilitate search and analysis across different data sources. tranSMART also allows investigators to search published literature and other text sources in order to evaluate their analyses in the context of the broader universe of reported research.
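The kind of phenotype-to-biomarker association query described above can be sketched in a toy form. The subject IDs, field names, and the simple responder/non-responder summary below are invented for illustration and do not reflect tranSMART's actual data model or API:

```python
# Hypothetical records: phenotypic data and gene-expression data keyed by the
# same subject IDs, as an integrated platform would align them.
phenotypes = {
    "S1": {"age": 61, "responder": True},
    "S2": {"age": 55, "responder": False},
    "S3": {"age": 68, "responder": True},
}
expression = {  # normalized expression values for one gene of interest
    "S1": 2.4, "S2": 0.9, "S3": 2.1,
}

def mean_expression_by_outcome(phenotypes, expression):
    """Join phenotype and biomarker data on subject ID, then summarize by outcome group."""
    groups = {True: [], False: []}
    for subject, pheno in phenotypes.items():
        if subject in expression:  # alignment step: keep subjects present in both datasets
            groups[pheno["responder"]].append(expression[subject])
    return {outcome: sum(vals) / len(vals) for outcome, vals in groups.items() if vals}

summary = mean_expression_by_outcome(phenotypes, expression)
print(summary)  # {True: 2.25, False: 0.9}
```

The alignment step (keeping only subjects present in both datasets) mirrors the integration work a platform must do before phenotype-biomarker associations can be explored.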

External data can also be integrated into the tranSMART data repository, either from open data projects such as GEO, EBI ArrayExpress, GCOD, or GO, or from commercially available data sources. They believe that making data accessible in tranSMART enables organizations to leverage investments in manual curation, development costs of automated ETL tools, and commercial subscription fees across multiple research groups.

The tranSMART system also incorporates role-based security protections to enable use of the platform across large organizations. User authentication for tranSMART can be integrated into an organization's existing infrastructure to simplify user management.

The tranSMART security model allows an organization to control data access and use in accordance with internal policies governing the use of research data, as well as HIPAA, IRB, FDA, EMEA, and other regulatory requirements.

Source: http://www.transmartproject.org/index.html

CDISC Contact: Andrea Vadakin, +1.316.558.0160, [email protected]
C-Path Contact: Dianne Janis, +1.520.382.1399, [email protected]

CDISC, C-Path and FDA Collaborate to Develop Data Standards to Streamline Path to New Therapies

Austin, TX – 21 June 2012 – The Clinical Data Interchange Standards Consortium (CDISC) and the Critical Path Institute (C-Path) announce the signing of a partnership agreement to establish the Coalition For Accelerating Standards and Therapies, or CFAST, an initiative to accelerate clinical research and medical product development by creating and maintaining data standards, tools and methods for conducting research in therapeutic areas that are important to public health.

Version 1.0 of an Alzheimer’s data standard, used to develop the C-Path Coalition Against Major Diseases’ (CAMD) groundbreaking Alzheimer’s Disease data repository, was developed through the CDISC process and posted to the CDISC website in October 2011. Since this first major milestone, CDISC and C-Path and such partners as the Bill and Melinda Gates Foundation, the Michael J. Fox Foundation, the PKD Foundation, Tufts University, Rochester University, the National Institutes of Health (NIH) and others have continued to make advancements in a number of therapeutic areas. Tuberculosis, Pain, Parkinson’s Disease, Polycystic Kidney Disease, Virology, Oncology, and Cardiovascular Disease are therapeutic areas for which standards are currently in development.

“We have enjoyed our collaboration to develop standards and research databases that are open to scientists around the world to seek new therapies,” says Dr. Carolyn Compton, President and CEO of C-Path. “We need a means to scale the process and manage the development of a very large number of therapeutic area standards. The establishment of CFAST embodies our resolve to take what we have learned from the initial projects to formulate a new process that will be significantly more efficient while retaining the rigor that is essential to this work.”

Also in 2011, the Center for Drug Evaluation and Research (CDER) at the U.S. Food and Drug Administration (FDA) released a list of priority therapeutic areas that could benefit from the standardization of key study data specific to each area. The FDA website states: “FDA, CDISC and the Critical Path Institute [C-Path] are collaborating on efforts to support development of therapeutic area standards. […] We encourage stakeholders to engage in and, where possible, support these data standardization efforts.”

Therapeutic area standards development has been identified by the CDISC Board of Directors and the CDISC community as a key strategic goal for CDISC. "Our stakeholders have expressed that CDISC provides value to them through our free and open standards and that this value will be enhanced if we can extend the foundational standards to cover therapeutic areas," stated Dr. Rebecca Kush, CDISC
President and CEO. "CFAST is a major new initiative that will allow the industry to speed the delivery of new therapies for patients, both in helping us gain more knowledge from past research and in improving the overall process for new studies, from protocol design through analysis and reporting."

The official launch of CFAST, including the roll out of the newly designed process and a Shared Health and Research Electronic Library (SHARE) environment to make the standards more accessible, will be on 24 October at the CDISC International Interchange in Baltimore, MD. To learn more about these therapeutic area standards, interested parties are invited to attend the session “Standards for Patients: Collaborations to Innovate Therapy Development,” chaired by Bron Kisler, CDISC VP of Strategic Initiatives, taking place at the DIA Annual Meeting in Philadelphia on Monday morning, 25 June 2012.

ABOUT CDISC
CDISC is a 501(c)(3) global non-profit charitable organization, with ~300 supporting member organizations from across the clinical research and healthcare arenas. Through the efforts of volunteers around the globe, CDISC catalyzes productive collaboration to develop industry-wide data standards enabling the harmonization of clinical data and streamlining research processes from protocol through analysis and reporting, including the use of electronic health records to facilitate the collection of high quality research data. The CDISC standards and innovations can decrease the time and cost of medical research and improve quality, thus contributing to the faster development of safer and more effective medical products and a learning healthcare system. The CDISC Vision is to inform patient care and safety through higher quality medical research.

ABOUT C-PATH
An independent, non-profit organization established in 2005 with public and private philanthropic support from the Arizona community, Science Foundation Arizona, and the U.S. Food and Drug Administration (FDA), C-Path’s mission is to improve human health and well-being by developing new technologies and methods to accelerate the development and review of medical products. An international leader in forming collaborations, C-Path has established global, public-private partnerships that currently include 1,000+ scientists from government regulatory agencies, academia, patient advocacy organizations, and dozens of major pharmaceutical companies. C-Path is headquartered in Tucson, AZ and has an office in Rockville, MD. Visit www.c-path.org for more information.

# # #

Embargoed for 12:01 a.m., Friday, June 11, 2010

Contacts: Rick Myers, 520-547-3440, [email protected]; Lisa Romero, 520-777-2883, [email protected]

PUBLIC RELEASE OF ALZHEIMER’S CLINICAL TRIAL DATA BY PHARMACEUTICAL RESEARCHERS

First Combined Pharmaceutical Trial Data on Neuro-degenerative Diseases; Shared Resource from Unique Public-Private Partnership Will Help Accelerate Alzheimer’s, Parkinson’s, and Other Brain Disease Research

Washington, DC – A new database of more than 4,000 Alzheimer’s disease patients who have participated in 11 industry-sponsored clinical trials will be released today by the Coalition Against Major Diseases (CAMD). This is the first database of combined clinical trials to be openly shared by pharmaceutical companies and made available to qualified researchers around the world. It is also the first effort of its kind to create a voluntary industry data standard that will help accelerate new treatment research on brain disease, as patients with other related brain diseases are expected to be added. The level of detail and scope of this database will enable researchers to more accurately predict the true course of Alzheimer’s, Parkinson’s, Huntington’s, and other neuro-degenerative diseases, thereby enabling the design of more efficient clinical trials. Patient identifiers will not be included in the database, thereby ensuring patient privacy.

CAMD is a formal consortium of pharmaceutical companies, research foundations and patient advocacy/voluntary health associations, with advisors from government research and regulatory agencies including the U.S. Food and Drug Administration (FDA), the European Medicines Agency (EMA), the National Institute of Neurological Disorders and Stroke (NINDS), and the National Institute on Aging (NIA). CAMD is led and managed by the non-profit Critical Path Institute (C-Path), which is funded by a cooperative agreement with the FDA and a matching grant from Science Foundation Arizona.

“The U.S. Food and Drug Administration has supported and actively participated in this innovative and unprecedented public-private partnership from its inception,” said Joshua Sharfstein, MD, Principal Deputy Commissioner, FDA. “The agency is strongly committed to CAMD and other regulatory science collaborations that can speed safe and effective treatments to the public.”

In addition to sharing data, the pharmaceutical members of CAMD have agreed to use the new common data standard established for Alzheimer’s disease by the standard-setting organization, CDISC, in their future submissions for drug approvals. The Clinical Data Interchange Standards Consortium (CDISC) is a global, multidisciplinary, non-profit organization that has established standards to support the acquisition, exchange, submission, and archive of clinical research data and metadata. CDISC standards are vendor-neutral, platform-independent, and freely available via the CDISC website. This will add greater efficiencies to the FDA’s review process and make it possible for new products to reach the market more quickly and with greater assurances of safety and effectiveness.

“This unprecedented data sharing is game-changing for companies that are developing new therapies for neuro-degenerative diseases,” said Raymond Woosley, MD, PhD, President and CEO of Critical Path Institute (C-Path). “Scientists around the world will be able to analyze this new combined data from pharmaceutical companies, add their own data, and consequently better understand the course of these diseases.”

Mark McClellan, MD, PhD, who launched FDA’s Critical Path Initiative during his tenure as FDA Commissioner, also noted the need for better evidence. “Too many treatments fail in the last stages of research, wasting millions of dollars and years of research time. To get to faster, more efficient development of safe and effective treatments, we must have a better understanding of diseases at the molecular level. The CAMD database is a promising step in this process for neurodegenerative diseases,” said Dr. McClellan, who is now the director of the Engelberg Center for Health Care Reform and Leonard D. Schaeffer Chair in Health Policy Studies at the Brookings Institution.

Roughly 6.5 million people in the U.S. are afflicted with Alzheimer’s and Parkinson’s diseases, with costs reaching as much as $175 billion annually. Worldwide there are already an estimated 30 million people with dementia alone. By 2050, the number will rise to over 100 million. Halting or slowing the progression of these diseases will prevent untold suffering and save tens of billions of dollars every year.

“Data sharing is the backbone of several CAMD projects designed to identify patients who might develop brain diseases, i.e., before symptoms are apparent,” said Marc Cantillon, MD, Director of C-Path’s Coalition Against Major Diseases. “Our goal is to develop tools to prevent or slow these diseases so patients can maintain independence and quality of life.”

The CAMD database will allow researchers to design more efficient clinical trials that have the maximum chance of demonstrating if a new treatment is truly safe and effective. In addition, the coalition is identifying biomarkers that identify patients in the very early stages of Alzheimer’s disease and Parkinson’s disease.

According to Maria Isaac, MASc, MD, PhD, Scientific Administrator, Scientific Advice, Human Medicines Special Areas Sector of the European Medicines Agency (EMA), “Within the context of the Innovative Medicines Initiative (IMI) in Europe, the EMA is committed to similar goals as C-Path’s consortia, i.e., to help biopharmaceutical drug development, for the benefits of patients. The Agency is especially interested in reviewing CAMD’s Alzheimer’s biomarkers and disease progression models.”

“We are proud to be a member of this coalition,” said Frank Casty, MD, VP Technical Evaluations, AstraZeneca Pharmaceuticals LP, and Co-Director of CAMD. “A healthier world must come from collaboration, in making better, deeper connections with all our stakeholders, and sharing skills and ideas to meet a common goal – improved health.”

CAMD Members: Abbott, Alliance for Aging Research, Alzheimer's Association, Alzheimer's Foundation of America, AstraZeneca Pharmaceuticals LP, Bristol-Myers Squibb Company, CHDI Foundation Inc, Eli Lilly and Company, F. Hoffmann-La Roche Ltd, Forest Research Institute, Genentech Inc., GlaxoSmithKline, Johnson & Johnson, National Health Council, Novartis Pharmaceuticals Corporation, Parkinson's Action Network, Parkinson's Disease Foundation, Pfizer, Inc., and sanofi-aventis US Inc.

About CAMD: CAMD members fully share pre-competitive data and knowledge that will more efficiently and safely speed development of new therapies and preventions for Alzheimer’s, Parkinson’s, Huntington’s, and other debilitating neuro-degenerative diseases. CAMD’s overall objective is to help scientists identify clinical and laboratory characteristics of patients who are pre-symptomatic and most likely to benefit from new therapies. For more information on CAMD and the database, visit http://www.c-path.org/CAMD.cfm.

About Critical Path Institute (C-Path): An independent, non-profit organization, C-Path’s mission is to serve as the impartial facilitator of collaborative efforts among scientists from government, academia, patient advocacy organizations, and the private sector to support the U.S. Food and Drug Administration’s regulatory science initiatives.
This involves creating faster, safer, and smarter pathways for innovative new drugs, diagnostics, and devices that will significantly improve public health. Established in 2005, C-Path is headquartered in Tucson, Arizona, with offices in Phoenix, Arizona, and Rockville, Maryland. Visit www.c-path.org for more information.

###

Get Smart

Knowledge Management

Winner: Centocor R&D nominated by Recombinant Data Corp.

Project: tranSMART

By Kevin Davies

July 29, 2010 | One of the most gratifying aspects of awarding a Best Practices Award to a major pharmaceutical operation is to see that technology or platform find applications beyond the walls of the pharma itself. That certainly seems to be the case with tranSMART, the research data management project that was implemented at the Johnson & Johnson (J&J) subsidiary Centocor R&D, with help from Recombinant Data Corporation in Boston.

As Bio•IT World reported earlier this year (see “Running tranSMART for the Drug Development Marathon,” Bio•IT World, Jan 2010), tranSMART provides a federated connectivity architecture that allows Centocor researchers to query the company’s wealth of drug data using a single tool. The tranSMART architecture minimizes the complexity and resources required to aggregate disparate research data, improving the productivity of the company’s researchers.

Leading the charge for Centocor was Eric Perakslis who, just days after accepting the Best Practices Award, took up a new role as J&J’s vice president of Pharma R&D IT (a position formerly held by John Reynders, who has taken a different role at J&J). Indeed, Perakslis says “a lot of the innovation we tried to do [with tranSMART] were things that made them seek me out for this job.”

The impetus behind tranSMART was the notion that modern drug development is expensive, tedious, and very data intensive. tranSMART has a search engine that indexes many sources of internal and external data, as well as links for further analysis by applications such as Ariadne Genomics Pathway Studio. Some text sources are hand-curated, such that the hits from these curated sources can be exported as semantic triples and visualized in network diagrams.

A Dataset Explorer allows users to easily query clinical trial data and focus on scientific questions rather than data processing, generating queries by phenotypes, genotypes, or a combination thereof. End users can view summary statistics about their queries, analyze gene expression or proteomic data through a link to the Broad Institute’s GenePattern application, or view SNP data using linkage disequilibrium plots.
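The combined phenotype-plus-genotype querying described above can be pictured with a toy example. This is a hypothetical sketch (an invented SQLite schema and invented data, not tranSMART’s actual data model or API):

```python
import sqlite3

# Toy stand-in for a translational data mart: one clinical (phenotype)
# table and one expression (genotype/molecular) table, joined on a shared
# subject identifier. Schema and values are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clinical (subject_id TEXT, trial TEXT, diagnosis TEXT, responder INTEGER);
    CREATE TABLE expression (subject_id TEXT, gene TEXT, log2_fold REAL);
    INSERT INTO clinical VALUES
        ('S1', 'TRIAL-A', 'asthma', 1),
        ('S2', 'TRIAL-A', 'asthma', 0),
        ('S3', 'TRIAL-B', 'psoriasis', 1);
    INSERT INTO expression VALUES
        ('S1', 'IL13', 2.1), ('S2', 'IL13', 0.2), ('S3', 'IL13', 1.7);
""")

# A single query combining phenotype and genotype criteria: responders in
# any trial whose IL13 expression changed more than twofold (log2 > 1).
rows = conn.execute("""
    SELECT c.subject_id, c.trial, e.log2_fold
    FROM clinical c JOIN expression e USING (subject_id)
    WHERE c.responder = 1 AND e.gene = 'IL13' AND e.log2_fold > 1.0
""").fetchall()
print(rows)  # [('S1', 'TRIAL-A', 2.1), ('S3', 'TRIAL-B', 1.7)]
```

The point of the sketch is the single join across data types that the article credits to the shared warehouse: once clinical and molecular records share subject keys, a cross-cutting question becomes one query rather than a multi-system data-assembly project.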

Both the search engine and the Dataset Explorer leverage the same underlying data warehouse—the Recombinant Data Trust. tranSMART reduces the delay between a query across disparate source systems and a result from days or weeks to a matter of minutes. Within about 12 months, Perakslis and colleagues had tranSMART deployed, including a variety of open-source components to help other institutions leverage the knowledge base. That is already happening, says Perakslis.


One example is the European Union’s Innovative Medicines Initiative (IMI), designed to eliminate hurdles in drug development. IMI has funded U-BIOPRED, involving thousands of respiratory disease patients, looking for drug targets and repositioned molecules, with multiple biopharmas, academic groups and regulators coming together. Perakslis says J&J was eager to get involved, and has granted tranSMART to the consortium. Another outreach effort is with St. Jude Children’s Research Hospital in Memphis, integrating tranSMART into the Cure for Kids pediatric oncology database for 172 countries. “It’s great to see people taking it up,” says Perakslis, particularly as it is a “slightly disruptive innovation” for a big pharma. “It’s not just free software, but you can tailor your investment to make it viral and useful, rather than just buying licenses and getting it to work.”

Another pharma has approached Perakslis about using tranSMART as a large, public repository for cancer data. For J&J, “It’s good business,” says Perakslis. “If you want a friend, be a friend.” J&J isn’t looking for anything in return, but “If some of these foundations find a drug target or an interesting compound, and they care to show it to us first, it doesn’t bother us!”

This article also appeared in the July-August 2010 issue of Bio-IT World Magazine.
http://www.bio-itworld.com/2010/issues/jul-aug/BP-KM.html


Running tranSMART for the Drug Development Marathon

By Kevin Davies

January 20, 2010 | If you sense a certain irresistible Lance Armstrong-like determination when meeting Centocor’s Eric Perakslis, it is not a coincidence. Perakslis, the VP of R&D Informatics at Centocor R&D (and newly appointed member of the Corporate Office of Science and Technology at J&J), is a cancer survivor—he was diagnosed with stage III kidney cancer at 38—and has run marathons in support of Armstrong’s LiveStrong Foundation.

Once every other month, Centocor graciously lets Perakslis travel to Jordan, where he volunteers as the CIO and head of Biomedical Engineering of the King Hussein Institute of Biotechnology and Cancer. It’s a noble gesture given that Perakslis has his hands full at Centocor R&D—the biotechnology, immunology and oncology research arm of Johnson & Johnson (J&J). Trained as a biochemical engineer, Perakslis has spent the past decade handling a variety of IT and lab responsibilities. “You’ve got to understand your skills and weaknesses and then just kind of go where the fun is,” he says.

Amazingly Advanced

The latest Perakslis project has been to spearhead the development of tranSMART (Translational Medicine Mart), a powerful new translational informatics enterprise data warehouse that went live at J&J in June 2009. TranSMART helps investigators mine drug target, gene and clinical trial data to aid in predictive biomarker discovery, chiefly in immunology and oncology. Perakslis says it is an “amazingly advanced” data warehouse that compares favorably with many such efforts he’s seen in the pharma world.

TranSMART is a full translational medicine warehouse, says Perakslis. When he was considering the project, Perakslis insisted that a different approach was needed. “If I’m going to be the head of informatics, all the data has to be mine—I don’t care where it is, what [J&J subsidiary] it’s in, I don’t care if it’s drug discovery or clinical or post-marketing—in order to do informatics well, all the data has to be in scope.” Few pharmas follow that practice. “When I started recruiting, I could find some good bioinformaticists, but none of them had ever seen clinical data—because they weren’t allowed!” It also meant changing the mindset that sees some R&D leaders more worried about their peers seeing, and potentially misunderstanding, their data than about a rival pharma executive.

Breaking down those silos is like developing an internal informatics-without-borders ecosystem. “We take all the resources and capabilities and direct it toward the largest questions facing the organization.” Instead of a traditional bioinformatician only supporting a process like target identification, Perakslis puts them all in one team to apply some scientific and medical rigor to the key strategic decisions facing the development teams and R&D leadership.

The goal is an informatics package that combines all the different ‘omics technologies and informatics disciplines to support or guide a decision. “It is still early days and, at first, we added a lot of value by ruling things out. I can’t tell [the team leader] what indications to pick, but out of five disease indications we’re considering for a novel therapeutic, I have evidence that three are a bad idea. How’s that to start? We are then able to stop debating the five and focus on evolving the thinking about the two that were left.”

Perakslis partnered with Recombinant Data to build tranSMART. He wanted a partner who knew how to assemble data warehouses. Head architect Bruce Johnson worked on clinical data warehousing at the Mayo Clinic, but before that, he worked on railroads. “He just understands data warehousing and doesn’t care what sort of data it is,” says Perakslis. The Boston firm has worked with Partners Healthcare in designing and building i2b2. “Technology-wise, tranSMART is very basic. We haven’t brought in a lot of third-party software or heavy analytics. The system is about the data and the technology is based on open stuff. I don’t owe people millions of dollars a year in contracts.” The closest commercial competition is probably InforSense, says Perakslis. In addition, the system is the first production system from J&J to be hosted externally on the Amazon cloud.

“IT isn’t always a profession that gets the respect it deserves. Sometimes, the MD on staff thinks he can do IT better than the IT staff, [but] I wanted to assemble a team that knew it down cold.” For example, Sandor Szalma, the project manager for tranSMART, brings the needed mix of R&D knowledge, vision, software and social engineering experience, and tenacity to the team.

Perakslis is currently having discussions about a Red Hat approach where a lot of the advanced analytics is made open source. “I’m not running a commercial software company,” he says. He’s already offered the software to Guna Rajagopal, a colleague at the Cancer Institute of New Jersey. In addition, St. Jude Children’s Research Hospital, UCSF and other centers are all considering the adoption of the i2b2-based platform.

Unlike another prominent and excellent J&J informatics resource, ABCD (see “ABCD: The Relentless Pursuit of Perfection,” Bio•IT World, Nov 2007), which Perakslis says is very focused on the chemistry and computational technologies, tranSMART handles large molecules and everything else. It is capable of providing access to every trial the company has ever run, and allows clinicians to test hypotheses that enable the design of better clinical trials. In an effort to reinvent as little as possible, tranSMART integrates information from numerous existing vendors at Centocor/J&J, including Jubilant, Ariadne, Ingenuity and GeneGo. There are links to public repositories such as GeneCards and PubMed, as well as direct links to many key internal systems across J&J.

Perakslis has loaded several years of clinical trial data into tranSMART, which can be searched for screening data, lab data, you name it. “We have not had a lot of success in asthma yet but we feel the disease is strategic and we must keep trying. If the PI in asthma says, ‘I just saw a new signature or a new gene in a paper. I wonder if that gene was amplified in any of the patients we’ve seen in our studies?’ That’s an instant search.”

All Kinds of Data

The data warehouse can be searched by almost anything—compound, disease, gene, gene list, pathway, gene signature. Type in a gene, and a researcher can see every relevant clinical trial, gene expression datasets where expression of that gene was significantly altered, literature results, pathway analysis, and more.

Perakslis says he is incorporating all kinds of data into the warehouse. “When we add new trials, we use data diversity as a selection criterion. If a certain trial will bring a new and necessary data modality, great. We’re growing the mart and growing the content.” While it is meaningless and too expensive to do a whole genome on everybody right now, he says most trials are underpowered with respect to sampling for molecular profiling. “So what is just right? What is enough data from a trial to meet objectives but learn something about the biology or the compound that helps you design the next one in the event this one fails?”

Since tranSMART went live last summer, the accolades have been glowing, particularly from the clinicians and scientists who have changed their study designs because of the data warehouse. For example, J&J’s Elliot Barnathan, executive director of clinical immunology research, says, “tranSMART gives the pharmaceutical physician/researcher the ability to use data already obtained internally or from the public domain at his or her desktop to ask and quickly answer questions with ease.” Hans Winkler, senior director of oncology biomarkers, says tranSMART will transform the way his team analyzes data. “Cross-trial meta-analyses and combined pre-clinical and clinical data analyses are at our fingertips,” he says. “Direct cross-referencing with the literature and multiple visualization tools downstream are all available instantly. This is clearly a quantum leap in our ability to extract knowledge from data.”

A big reason for tranSMART’s success is the social engineering—getting the datasets released, having a QA process, informed consent, etc. As a past CIO for Centocor R&D, Perakslis had written a lot of those policies, on records management, the FDA Gateway project, eCTD, and knew where a lot of that stuff was. But he admits: “The truth is, there’s nothing you could do in this warehouse you couldn’t do anyway if you had the time to do it.”

Perakslis is trying to encourage the Google approach, spending a portion of one’s time thinking creatively about something other than your job. “In pharma, when things are tough and people get busy, they tend to start acting like technicians. They run the models they know how to run… This is about efficiency and innovation. It does save time and money—if people are thinking differently and running better experiments, the quality of decision making should go up.”

Perakslis praises J&J CIO John Reynders for being supportive and “getting out of the way and removing barriers, that’s one of the best things he could do.” As good a tool as tranSMART is, Perakslis says, “it’s only good because we built all our processes around it. If you dropped that in an organization, if you don’t have central agreement about data standards, data availability, compliance and protection of patient rights, it’s not any use.”

This article also appeared in the January-February 2010 issue of Bio-IT World Magazine.
http://www.bio-itworld.com/issues/2010/jan/transmart.html


Genomic research

Consent 2.0

NEW YORK

A better way of signing up for studies of your genes

IN AN age where people promiscuously post personal data on the web and regularly click “I agree” to reams of legalese they have never read, news of yet another electronic consent form might seem like a big yawn. But for the future of genomics-related research the Portable Legal Consent, to be announced shortly by Sage Bionetworks, a non-profit research organisation based in Seattle, is anything but mundane. Indeed, by reversing the normal way consent to use personal data is acquired from patients in clinical trials, it could spell a new relationship between scientists and the human subjects of their research, with potential benefits that extend well beyond genomics.

Share and enjoy

The heart of Portable Legal Consent is the notion that anyone who signs up for a clinical trial, or simply has his genome read in order to anticipate the risk of disease, should easily be able to share his genomic and health data not just with that research group or company, but with all scientists who are prepared in turn to accept some sensible rules about how they may use the data. The main one of these is that the results of investigations which include such “open source” data must, themselves, be freely and publicly available. In much the same vein as the open-source-software movement, the purpose of this is to increase the long-term value of the data, by allowing them to be reused in ways that may not even have been conceived of at the time they were collected.

That approach contrasts with today’s system of secretive data silos for particular studies of specific diseases. Patients sometimes discover that they have even signed away their rights to see their own data.

That may serve the narrow interests of a research group or drug company intent on keeping competitors at bay. But the potential of genomic data to provide further insights, perhaps in completely novel contexts, is huge—especially when correlated with a person’s medical record. Also, teasing out correlations between particular genotypes and diseases in a statistically meaningful way requires large sets of data; the larger the set, the more believable the correlations. Portable Legal Consent brings the promise of very large data sets indeed.

Even for academia and industry, these benefits should eventually outweigh the short-term drawbacks of sharing. Indeed, according to John Wilbanks, the creator of Portable Legal Consent, representatives of several drug companies have expressed enthusiasm about Sage Bionetworks’ approach. And appealing features, such as so-called syndicating technology, which automatically informs both researchers and volunteers about new data relevant to a specific drug or disease, should reduce the resistance of individual researchers to the loss of control of what they used to think of as their own data.

Of course, sharing in this open-ended way carries risks. The data involved are “de-identified”—meaning they cannot immediately be traced to a specific individual. But as Mr Wilbanks notes, this is not a foolproof guarantee of anonymity.
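Mr Wilbanks’s caveat can be made concrete: records stripped of names can often be re-linked through “quasi-identifiers” such as postcode, birth year and sex. A minimal sketch of such a linkage attack, with entirely invented data:

```python
# De-identified study records: names removed, but ZIP code, birth year
# and sex retained. All data here are invented for illustration.
study = [
    {"zip": "02139", "birth_year": 1971, "sex": "F", "genotype": "APOE e4/e4"},
    {"zip": "85004", "birth_year": 1985, "sex": "M", "genotype": "APOE e3/e3"},
]

# A public roster (voter roll, social-network profile, etc.) carrying the
# same quasi-identifiers alongside names.
roster = [
    {"name": "A. Smith", "zip": "02139", "birth_year": 1971, "sex": "F"},
    {"name": "B. Jones", "zip": "94110", "birth_year": 1990, "sex": "M"},
]

def reidentify(study, roster):
    """Linkage attack: match records on the quasi-identifier triple."""
    key = lambda r: (r["zip"], r["birth_year"], r["sex"])
    names = {key(r): r["name"] for r in roster}
    return [(names[key(s)], s["genotype"]) for s in study if key(s) in names]

print(reidentify(study, roster))  # [('A. Smith', 'APOE e4/e4')]
```

Removing direct identifiers is thus necessary but not sufficient for anonymity, which is why the article hedges: the larger and richer the pooled data set, the more such joins become possible.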

Data shared might, for example, be traced back to their owner by sophisticated search algorithms. Or some malevolent hacker might expose them to the world. Those squeamish about sharing their personal information should probably not sign the consent form, Mr Wilbanks counsels. But those who believe the benefit of doing so—accelerating the pace of medical research—outweighs the risks can start to pool their own data next month on a special website: weconsent.us.

To make sure consent is truly informed, Mr Wilbanks and his team have gone to great lengths to explain the consequences to signatories. There is an online tutorial that cannot be bypassed. Uploaded information may be removed from the database on request, at any time, but the provider is clearly warned that it may have already found its way into places from which it cannot be erased.

Sage Bionetworks hopes 25,000 people will sign up in the first year, either because trial organisers choose to adopt the protocol or volunteers insist on it. But to be really useful, the database would need to grow to ten or 100 times this size. Mr Wilbanks has therefore started discussions with several firms that offer commercial genetic tests for a range of diseases. It is also linking up with PatientsLikeMe, an organisation that helps almost 110,000 people find those with similar illnesses, in order to share their experiences.

Are we all agreed?

So far, the Portable Legal Consent is valid only in America, although Sage Bionetworks is looking at ways of adapting it to fit the legal frameworks of China and the European Union. How quickly the idea will catch on remains to be seen. But if it does, other sorts of researchers who rely on gathering personal data—for example in sociology or in tracking energy use in homes—may find it attractive. And that would enable research of a sort that is now impossible, by opening up the field of quantifiable social science.



Bulletin of the World Health Organization | June 2008, 86 (6)

Perspectives

Reporting the findings of clinical trials: a discussion paper

D Ghersi,a M Clarke,b J Berlin,c AM Gülmezoglu,a R Kush,d P Lumbiganon,e D Moher,f F Rockhold,g I Sim h & E Wager i

a World Health Organization, Geneva, Switzerland. b UK Cochrane Centre, Oxford, England. c Johnson & Johnson, Titusville, NJ, United States of America. d Clinical Data Interchange Standards Consortium (CDISC), Austin, TX, USA. e Khon Kaen University, Khon Kaen, Thailand. f Children’s Hospital of Eastern Ontario, Ottawa, ON, Canada. g GlaxoSmithKline, Philadelphia, PA, USA. h University of California, San Francisco, CA, USA. i Sideview, Princes Risborough, Buckinghamshire, England.

Correspondence to Davina Ghersi (e-mail: [email protected]). doi:10.2471/BLT.08.053769 (Submitted: 4 April 2008 – Accepted: 8 April 2008 – Published online: 21 April 2008)

Background

When researchers embark on a clinical trial, they make a commitment to conduct the trial and to report the findings in accordance with basic ethical principles. This includes preserving the accuracy of the results and making both positive and negative results publicly available.1 However, a significant proportion of health-care research remains unpublished and, even when it is published, some researchers do not make all of their results available.2 Selective reporting, regardless of the reason for it, leads to an incomplete and potentially biased view of a trial and its results.3

The consequences of publication bias and selective reporting have gained the attention of health-care consumers, the media and politicians. All have recognized the impact that undisclosed results can have on the ability of patients, practitioners and policy-makers to make well-informed decisions about health care. Concern over the underreporting of adverse events, in particular, has increased the demand for more transparent processes for registering clinical trials and reporting their findings. In recent years, trial registration has become increasingly widely accepted and implemented. There are now well-established mechanisms to register trials, assign unique identifiers and make this information publicly available. WHO’s International Clinical Trials Registry Platform Search Portal helps bring the data from these registers together, making it much easier to search for the existence of a trial (available at: http://www.who.int/trialsearch).

The value of registration goes far beyond the administrative benefits of having a complete collection of all trials. Trial registers may facilitate recruitment into clinical trials by raising awareness of their existence among potential participants and health-care practitioners. They may also lead to more ethical and successful research by avoiding the unintentional duplication of research already under way elsewhere. From the perspective of this paper, however, the greatest benefit of trial registration is enhanced transparency; that is, making it clear which trials are being conducted so that people can anticipate their results.

The next step to informed decision-making is to make the findings of clinical trials available, since it is knowledge of the findings, rather than of the existence of a trial, that is likely to have the greatest impact on people trying to choose between alternative interventions. The arrival and growth of electronic publishing and the Internet as dissemination tools without page or length restrictions has greatly expanded the ability of people to make findings available and accessible in full. The recognition of the need for reliable evidence to improve health care and to facilitate the synthesis of the results of research into systematic reviews has fuelled the demand for access to the findings of all research, as have the needs of the numerous other stakeholders in clinical research.

A proposed position

The position proposed by the members of the WHO Registry Platform Working Group on the Reporting of Findings of Clinical Trials is that “the findings of all clinical trials must be made publicly available”. This paper discusses the principles underlying this position. Our goal is to contribute to the ongoing debate and to foster the collaboration that is necessary to ensure that the findings of clinical trials do not remain hidden from the people who need access to them.

What is a finding?

The language used to discuss the reporting of clinical trials usually focuses on “results” – often taken to mean the numerical results of an analysis for a specific outcome; for example, a summary estimate such as a relative risk. However, the reader or user of research also needs background information that will allow them to correctly interpret the results for specific outcomes. The type and amount of information required will depend on the nature of the audience and how it will use the information received, but will have four key elements:

1. Methodological context

The user needs to know what the original plans for the trial were and how it was actually conducted. Some of the original plans will be available if the trial was registered, but more complete information should be in the full trial protocol. The actual conduct of the trial might, however, differ from that planned. Such differences may be valid but should be traceable, preferably via a clinical trial registry. The methodological context for a trial might also include information on the primary question that the researchers set out to answer; the hypothesis they were seeking to test; the methods they used to allocate or select participants for the interventions; whether any blinding or masking was done; and a description of the statistical methods planned and used, including a justification of the sample size.

2. Population context

The user needs to consider to whom the results of the trial can be applied. They need to know the characteristics of the population it was initially intended to recruit, as well as the characteristics of the population who were recruited. In reporting their findings, the researchers should therefore refer to the original inclusion and exclusion criteria and provide information on the population that actually took part, describing the extent to which participants left the trial early (with reasons for doing so) and whether interventions were delivered as planned.

3. A result for each outcome

A clinical trial will usually consider several outcomes, each of which might be measured in multiple ways or at multiple time points, and there might be more than one way to analyse each of these measures. The findings should contain the results for every outcome measure and analysis specified in the register entry for the trial, and set out in the protocol or statistical analysis plan. If any of these results are not reported, a reason should be provided for their absence. Any findings from analyses that were not pre-specified should be clearly identified as such.

4. Interpretation

The most contentious aspect of reporting the findings of a clinical trial may be whether the report should include the researchers’ opinions and conclusions. It is in this section of a report where the objectivity of the research is confronted by the subjectivity of authors who “use their power as owners of their writing to emphasize one point of view more than another”.4 Alternatively, linguistic spin is considered by some to be essential to scientific communication, being the opportunity for scientists to speculate and formulate new hypotheses.5 Distinguishing opinions and conclusions from advocacy or promotion can be difficult, and it may be preferable to leave the users of the findings to draw their own conclusions and form their own opinions of what the findings of a trial mean for them and their decision-making.

Public availability

Clinical trial results should be available to everyone, regardless of where they live. If the results are not made available within a reasonable period, the reasons for delay and a date by which the findings will be available should be submitted to the relevant clinical trials registry. It should be the responsibility of researchers to produce their findings in a format that can be accessed by potential users, and it should be the responsibility of the user to seek these results. The Internet could provide the main means by which this shared responsibility can be satisfied. Access also needs to be considered in terms of the end user’s ability to both view and understand the information obtained.

Historically, access to the results of a trial has usually been achieved through publication in a peer-reviewed journal. This traditional publication model has its limitations, particularly in an environment where the end users of research information now include health-care policy-makers, consumers, regulators and legislators who want rapid access to high-quality information in a “user-friendly” format. In future, researchers may be legally required to make their findings publicly available within a specific timeframe (assuming any legislation created does not have escape clauses built in). In the United States of America, such legislation is already in place (available at: http://www.fda.gov/oc/initiatives/HR3580.pdf). This may compromise the ability of researchers to publish trial findings in a peer-reviewed journal. Although some journal editors have acknowledged the changing climate around results registration and reporting (available at: http://www.icmje.org/clin_trial07.pdf), they may have a conflict of interest in that they will probably want the key (and potentially most exciting) messages from a trial to appear first, and perhaps exclusively, in their publication.

Conclusion

People making decisions about health care need access to knowledge derived from the findings of clinical trial research. Following on from WHO’s position that all clinical trials must be registered,6 it is proposed that the findings of all clinical trials must be made publicly available. This report is the start of a consultation process on how this goal of transparency can be achieved, with the intention that greater accessibility to the findings of all clinical trials will lead to improvements in health and health care. To contribute to the first phase of the consultation, please visit: http://www.who.int/ictrp/results/consultation ■

Competing interests: None declared.

References

1. Declaration of Helsinki: ethical principles for medical research involving humans. Ferney-Voltaire, France: World Medical Association; 2000. Available from: http://www.wma.net/e/policy/b3.htm [accessed 9 April 2008].
2. Song F, Eastwood AJ, Gilbody S, Duley L, Sutton AJ. Publication and related biases. Health Technol Assess 2000;4(10).
3. Chan AW, Hrobjartsson A, Haahr MT, Gotzsche PC, Altman DG. Empirical evidence for selective reporting of outcomes in randomized trials: comparison of protocols to published articles. JAMA 2004;291:2457-65. PMID:15161896 doi:10.1001/jama.291.20.2457
4. Horton R. The rhetoric of research. BMJ 1995;310:985-7. PMID:7728037
5. Greenhalgh T. Commentary: scientific heads are not turned by rhetoric. BMJ 1995;310:987-8. PMID:7728038
6. Sim I, Chan AW, Gulmezoglu M, Evans T, Pang T. Clinical trial registration: transparency is the watchword. Lancet 2006;367:1631-3. PMID:16714166 doi:10.1016/S0140-6736(06)68708-4
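The four key elements the authors describe can be read as the outline of a structured findings record. A hypothetical sketch (illustrative field names only, not a WHO or CDISC schema):

```python
from dataclasses import dataclass, asdict

@dataclass
class OutcomeResult:
    outcome: str           # outcome measure named in the register entry or protocol
    result: str            # e.g. a summary estimate such as a relative risk
    pre_specified: bool    # analyses that were not pre-specified must be flagged
    reason_if_missing: str = ""  # a reason is required when a result is absent

@dataclass
class TrialFindings:
    registry_id: str             # links the report back to the trial register
    methodological_context: str  # planned vs. actual conduct, design, analysis plan
    population_context: str      # intended vs. recruited population, withdrawals
    results: list                # one OutcomeResult per outcome measure and analysis
    interpretation: str = ""     # optional, given the paper's caveats about spin

# Invented example values for illustration only.
report = TrialFindings(
    registry_id="ISRCTN00000000",
    methodological_context="Randomized, double-blind; sample size justified in protocol.",
    population_context="Adults 18-65 per original inclusion criteria; 12% withdrew.",
    results=[OutcomeResult("all-cause mortality", "RR 0.91 (95% CI 0.80-1.03)", True)],
)
print(asdict(report)["registry_id"])  # ISRCTN00000000
```

The design point matches the paper’s argument: a finding is not just a number but a result plus the methodological and population context needed to interpret it, keyed to the registry entry so that planned and reported analyses can be compared.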


Late in May, the direct-to-consumer gene-testing company 23andMe proudly announced the impending award of its first patent. The firm’s research on Parkinson’s disease, which used data from several thousand customers, had led to a patent on gene sequences that contribute to risk for the disease and might be used to predict its course. Anne Wojcicki, co-founder of the company, which is based in Mountain View, California, wrote in a blog post that the patent would help to move the work “from the realm of academic publishing to the world of impacting lives by preventing, treating or curing disease”.

Some customers were less than enthusiastic. Holly Dunsworth, for example, posted a comment two days later, asking: “When we agreed to the terms of service and then when some of us consented to participate in research, were we consenting to that research being used to patent genes? What’s the language that covers that use of our data? I can’t find it.”

The language is there, in both places. To be fair, the terms of service is a bear of a document — the kind one might quickly click past while installing software. But the consent form is compact and carefully worded, and approved by an independent review board to lay out clearly the risks and benefits of participating in research. “If 23andMe develops intellectual property and/or commercializes products or services, directly or indirectly, based on the results of this study, you will not receive any compensation,” the document reads.

The example points to a broad problem in research on humans — that informed consent is often not very well informed (see ‘Reading between the lines’). Protections for participants have been cobbled together in the wake of past controversies and have always been difficult to uphold. But they are proving even more problematic in the ‘big data’ era, in which biomedical scientists are gathering more information about more individuals than ever before. Many studies now include the collection of genetic data, and researchers can interrogate those data in a growing number of ways. Several US states, including California, are considering laws that would curtail the way in which researchers, law-enforcement officials and private companies can use a person’s DNA.

The research coordinators who develop consent forms cannot predict how such data might be used in the future, nor can they guarantee that the data will remain protected. Many people argue that participants should have more control over how their data are used, and efforts are afoot to give them that control. Researchers, meanwhile, often bristle at the added layers of bureaucracy wrought by the protections, which sometimes provide no real benefits to the participants. The result is a mess of opinions and procedures that sow confusion and risk deterring people from participating in research.

“A lot of times researchers will say, ‘Why can’t we just go back to the way it was?’, which was basically that we take these samples and people do it for altruistic reasons and everything’s lovely,” says Sharon Terry, president of the patient-advocacy group Genetic Alliance in Washington DC. “That worked in a prior age. I don’t think it works today.”

The concept of informed consent was first set out in the Nuremberg Code, a set of research-ethics principles adopted in the wake of revelations of torture by Nazi doctors during the Second World War. But in recent years, a series of mishaps over consent have undermined support for research. In 2004, for example, scandal erupted in the United Kingdom after parents found out that from the late 1980s to the mid-1990s doctors and researchers had removed and stored organs and tissues from patients — including infants and children — without parental consent. New laws were passed that required explicit consent for such collections.

Then, in 2010, the Havasupai tribe of Arizona won a US$700,000 settlement against Arizona State University in Phoenix. Individuals believed that they had provided blood for a study on the tribe’s high rate of diabetes, but the samples had also been used in mental-illness research and population-genetics studies that called into question the tribe’s beliefs about its origins. In the settlement, the university’s board of regents said that it wanted to “remedy the wrong that was done”.

The cases illustrate the divide between researchers and the public over what people need to know before agreeing to participate in research.

Many of the recent concerns over consent are driven by the rapid growth of genome analysis. Decades ago, researchers weren’t able to glean much information from stored tissue; now, they can identify the donor, as well as his or her susceptibilities to many diseases. Researchers try to protect the genetic data through technological and legal mechanisms, but both approaches have weaknesses.

A BROKEN CONTRACT

As researchers find more uses for data, informed consent has become a source of confusion. Something has to change.

By Erika Check Hayden

312 | NATURE | VOL 486 | 21 JUNE 2012


© 2012 Macmillan Publishers Limited. All rights reserved


It is not enough to strip out any information that would identify the donor, such as names and full health records, before the data are stored. In 2008, geneticists showed that they could easily identify individuals within pooled, anonymized data sets if they had a small amount of identified genetic information for reference (N. Homer et al. PLoS Genet. 4, e1000167; 2008). And it may become possible to identify a person in a public database from other information collected during a study, such as data on ethnic background, location and medical factors unique to the study participants, or to predict a person’s appearance from his or her DNA.
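The style of inference Homer and colleagues demonstrated can be sketched in miniature: compare a person’s genotype against the pooled allele frequencies of a study cohort and against a reference population, and see which it sits closer to. The simulation below is a toy illustration, not the authors’ exact statistic; the marker count, pool size and scoring rule are all invented for the example.

```python
import random

random.seed(0)
N_SNPS = 4000   # number of genetic markers compared
POOL_SIZE = 25  # individuals whose genotypes went into the "anonymized" pool

# Reference allele frequencies, assumed available from a public database.
ref_freq = [random.uniform(0.1, 0.9) for _ in range(N_SNPS)]

def genotype(freqs):
    # A diploid genotype: allele dosage 0, 1 or 2 at each marker.
    return [(random.random() < f) + (random.random() < f) for f in freqs]

# The published data set exposes only per-marker allele frequencies.
pool = [genotype(ref_freq) for _ in range(POOL_SIZE)]
pool_freq = [sum(g[i] for g in pool) / (2 * POOL_SIZE) for i in range(N_SNPS)]

def membership_score(person):
    # Positive values mean the genotype sits closer to the pool's frequencies
    # than to the reference population's, hinting the person is in the pool.
    return sum(
        abs(person[i] / 2 - ref_freq[i]) - abs(person[i] / 2 - pool_freq[i])
        for i in range(N_SNPS)
    )

member = pool[0]               # DNA that contributed to the pooled data
outsider = genotype(ref_freq)  # a random person from the same population

print(membership_score(member) > membership_score(outsider))
```

Because the member’s own alleles pull the pool frequencies slightly towards their genotype at every marker, the tiny per-marker signal accumulates across thousands of markers, which is why aggregate allele frequencies are no longer treated as anonymous.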

Even legal mechanisms have vulnerabilities. In 2004, Jane Costello, a social psychologist at Duke University in Durham, North Carolina, was forced to go to court to defend the confidentiality of patient records from the Great Smoky Mountains Study. The study, which is just going into its third decade, examines emotional and behavioural problems in a cohort of people who enrolled as adolescents. A participant in the study was testifying against her grandfather, John Trosper ‘JT’ Bradley, who had been accused of sexual abuse. JT’s lawyers subpoenaed the granddaughter’s records from the study in hope that the information would undermine her credibility as a witness.

It meant a major crisis of confidence for Costello. “I was telling 1,400-plus people every time we saw them that ‘your data are absolutely safe’, and now I was in a position where I was told, ‘No, that’s not true’,” she says. After Costello’s day in court, in August 2004, the records remained sealed, but mostly because the judge did not believe that they would exonerate JT. The result provided no clarity about patient protections.

BETTER MODELS FOR CONSENT

One solution is to keep genetic information separate from demographic data. The BioVU databank at Vanderbilt University Medical Center in Nashville, Tennessee, for instance, contains DNA samples from patients treated at the hospital — 143,939 people as of 11 June. The DNA is linked to health records in a second database, called a ‘synthetic derivative’, in which the data are anonymized and scrambled in ways that, its creators say, make it difficult for anyone to work back from the database to verify a patient’s identity. Sample-collection dates are altered, for example, and some records are discarded at random, so that it is not possible to know that someone is in the database just because he or she was treated at the hospital. Even researchers who work with the data cannot determine whose data they are using. The databank expects to include as many as 200,000 individuals by 2014, making it one of the largest collections of linked genetic and health records in the world.

But when it comes to consent, BioVU takes a different approach from many other programmes. Patients don’t choose to participate; rather, they are given the chance to opt out. Patients are asked to sign a ‘consent to treat’ form every year. It includes a box that they can tick to keep their DNA out of the database. That model helps BioVU to collect many more samples, and much more cheaply, than other projects can. The opt-out model — which is used in only a few other places — troubles Misha Angrist, a genome policy analyst at Duke University, who says that it risks taking advantage of people when they are ill. “Even a routine visit to the clinic can be a vulnerable moment, and they’re saying, ‘Would you mind doing this for future generations, to help people just like you?’.”
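The scrambling described above, altered collection dates plus randomly discarded records, can be illustrated with a minimal sketch. BioVU’s actual pipeline is far more elaborate; the field names, shift range and drop rate below are invented placeholders.

```python
import random
from datetime import date, timedelta

random.seed(42)

# Toy identified records: patient ID, sample-collection date, one lab value.
records = [
    {"patient_id": i,
     "collected": date(2012, 1, 1) + timedelta(days=i % 90),
     "glucose": 80 + i % 40}
    for i in range(1000)
]

def synthetic_derivative(rows, max_shift_days=180, drop_rate=0.05):
    """De-identified copy: strip IDs, jitter each collection date by a random
    offset, and discard a random fraction of records outright, so presence in
    the source system cannot be inferred from presence in the derivative."""
    out = []
    for row in rows:
        if random.random() < drop_rate:
            continue  # this record simply vanishes from the derivative
        shift = timedelta(days=random.randint(-max_shift_days, max_shift_days))
        out.append({"collected": row["collected"] + shift,
                    "glucose": row["glucose"]})
    return out

deid = synthetic_derivative(records)
print(len(deid) < len(records))                   # some records were dropped
print(all("patient_id" not in r for r in deid))   # identifiers are gone
```

The cost of this design follows directly: once dates are jittered, any analysis that depends on true timing is compromised, which is the trade-off the article returns to below.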

And legal challenges have shown the weaknesses of opt-out policies. Health officials are now destroying millions of blood samples taken from newborn babies in Texas and Minnesota because the families were not adequately informed that the samples, collected to screen for specific inherited disorders, would also be used in research.

Vanderbilt officials and researchers counter that they have run extensive public campaigns to ensure that people in Nashville are aware of BioVU and are comfortable with the way it works. They regularly consult a community advisory board about the project. And Vanderbilt’s approach actually goes above and beyond what is required by federal law; because the synthetic derivative includes de-identified data, it doesn’t legally require informed consent at all. Last July, the US Department of Health and Human Services signalled that it might be rethinking the rules that exempt de-identified data from the consent requirement, as part of a broad overhaul of research ethics regulations.

[Illustration: Viktor Koen]

Irrespective of the outcome, obliterating patient identities has drawbacks. Researchers can’t perform some types of research on the scrambled data. Because dates are changed, studies on the timing of influenza infections, for example, are impossible. And patients can’t be told if the research has revealed that they carry individual genetic risks linked to disease.

FULL DISCLOSURE

Returning study results to research participants has been another thorny issue for consent. Doctors might learn about genetic predispositions to disease that are separate from the ailments that led a patient to participate in the research in the first place, but it is not clear what they should do with this information.

UK researchers, for example, are forbidden from sharing genetic results with participants. But US research societies, such as the American College of Medical Genetics and Genomics in Bethesda, Maryland, are moving towards adopting standards that would encourage the practice for some types of findings, such as those that are medically relevant.

Some countries, such as Germany, Austria, Switzerland and Spain, are already feeding back such information. And some clinical sequencing programmes are considering offering patients ‘tiered’ consent, in which people can decide whether to be told about their data and how much they want to learn.

This is what Han Brunner, a geneticist at the Radboud University Nijmegen Medical Centre in the Netherlands, had hoped to do.

Last year, he began a project to sequence the exomes — the protein-coding regions of the genome — of 500 children and adults, looking for the genetic causes of intellectual disabilities, blindness, deafness and other disorders. Brunner proposed allowing participants to choose from three options: they could learn everything that researchers had divined about disease susceptibility; just information relevant to the disease for which their genomes were examined; or no information at all. Ethics reviewers shot down his proposal. “They said that in practice, it would be impossible for people to draw those lines, because people giving consent cannot foresee all the possible outcomes of the study,” Brunner says. Instead, everyone participating in the studies must agree to learn all medically relevant information arising from the analysis of their genomes. As a consequence, Brunner recently had to tell the family of a child with a developmental disability that the child also has a genetic predisposition to colon cancer. Not all researchers endorse the idea of informing children about diseases that might affect them as adults. In this case, doctors recommended early screening, and Brunner says, “the family handled it very well; they said, ‘This is not what we anticipated, but it’s useful information’”.

Many of the studies done now ask patients to give consent for research linked to particular investigators or diseases. But that means that researchers cannot pool data from separate studies to tackle different research questions. Many researchers say that the obvious solution is a broad consent document that gives researchers free rein with the data. But many non-scientists think participants should be able to control how their data are used, says lawyer Tim Caulfield of the University of Alberta in Edmonton, Canada, who has surveyed patients about this idea. “There’s an emerging consensus within the research community about the need to adopt things like broad consent, but that hasn’t translated out to the legal community or to the public,” he says.

Another solution might be called ‘radical honesty’. A US project called Consent to Research, which aims to provide a large pool of user-contributed genomic and health data, has devised what it calls a ‘Portable Legal Consent’, which allows anyone to upload information about him or herself, such as direct-to-consumer genetic results and lab tests ordered through medical providers, to an interface that strips the data of identifiers. It makes the data widely available to researchers under broad guidelines, but also requires data donors to go through a much more rigorous consent process than most studies do. The Portable Legal Consent specifically informs participants that researchers might be able to determine their identities, but that they are forbidden from doing so under the project’s terms of use.

Such approaches could help scientists by giving them access to a trove of data with no restrictions on use. But the participant protections system that is in place might not be ready for such frank dialogues, says Angrist, who serves on one of Duke’s institutional review boards.

While reviewing a research proposal for a large biobank, for example, Angrist suggested that the researchers send the participants an annual e-mail explaining how their samples were being used, and thanking them for donating their time and tissue. The review board voted this suggestion down after its chair argued that e-mailing the patients would create a problem in light of the Health Insurance Portability and Accountability Act (HIPAA) — the US law that guarantees the privacy of health records. “The irony is that the HIPAA is supposed to protect people, and what I was hearing was, ‘We can’t talk to people because we’re too busy protecting them’,” Angrist says. “Institutions use informed consent to mitigate their own liability and to tell research participants about all the things they cannot have, and all the ways they can’t be involved. It borders on farcical.”

But as patient data become more precious to researchers, and as advocacy organizations become more involved in driving research agendas and in funding the work, such paternalistic attitudes will probably not survive, says Terry. She adds that technologies that allow research participants to control and track how researchers use their data will soon catch on. These approaches could benefit patients, who gain transparency and control over their data, and researchers, who gain access to richer data sets. “I think we’re going to have to get to a place where consenting people becomes customizable easily through technology, and we’re not there yet,” Terry says. ■ SEE EDITORIAL P.293

Erika Check Hayden writes for Nature from San Francisco, California.

READING BETWEEN THE LINES

Despite the work that goes into making consent forms clear and detailed, some participants say they are confused by the wording.

5. What are the benefits and risks of participating?

... If 23andMe develops intellectual property and/or commercializes products or services, directly or indirectly, based on the results of this study, you will not receive any compensation.

2. I have been informed that the purpose of the research is to study the causes of behavioral/medical disorders.

Re-identification

We are quickly learning that with powerful computers and good mathematicians, it is increasingly possible to uniquely identify people inside large data sets ... Researchers will sign a contract in which they agree that, even if they were able to identify you, they won’t do it.

23andMe received a patent in May for its work on Parkinson’s disease. Some participants did not expect it to seek intellectual-property rights.
From 23andMe’s research consent form: www.23andme.com/about/consent/

Many participants in the Medical Genetics at Havasupai study in Arizona in the 1990s were unaware of how broad the research goals were and what kind of studies would be performed.
Courtesy of Pilar Ossorio

In an attempt to be more transparent about privacy risks, the Consent to Research project’s ‘Portable Legal Consent’ essentially states that although anonymity cannot be guaranteed, researchers must pledge to uphold it.
Courtesy of John Wilbanks, Consent to Research


August 25, 2012

Genes Now Tell Doctors Secrets They Can’t Utter
By GINA KOLATA

Dr. Arul Chinnaiyan stared at a printout of gene sequences from a man with cancer, a subject in one of his studies. There, along with the man’s cancer genes, was something unexpected — genes of the virus that causes AIDS.

It could have been a sign that the man was infected with H.I.V.; the only way to tell was further testing. But Dr. Chinnaiyan, who leads the Center for Translational Pathology at the University of Michigan, was not able to suggest that to the patient, who had donated his cells on the condition that he remain anonymous.

In laboratories around the world, genetic researchers using tools that are ever more sophisticated to peer into the DNA of cells are increasingly finding things they were not looking for, including information that could make a big difference to an anonymous donor.

The question of how, when and whether to return genetic results to study subjects or their families “is one of the thorniest current challenges in clinical research,” said Dr. Francis Collins, the director of the National Institutes of Health. “We are living in an awkward interval where our ability to capture the information often exceeds our ability to know what to do with it.”

The federal government is hurrying to develop policy options. It has made the issue a priority, holding meetings and workshops and spending millions of dollars on research on how to deal with questions unique to this new genomics era.

The quandaries arise from the conditions that medical research studies typically set out. Volunteers usually sign forms saying that they agree only to provide tissue samples, and that they will not be contacted. Only now have some studies started asking the participants whether they want to be contacted, but that leads to more questions: What sort of information should they get? What if the person dies before the study is completed?

[With Rise of Gene Sequencing, Ethical Puzzles - NYTimes.com, retrieved 8/27/2012, http://www.nytimes.com/2012/08/26/health/research/with-rise-of-gene-sequencing-ethical-...]

The complications are procedural as well as ethical. Often, the research labs that make the surprise discoveries are not certified to provide clinical information to patients. The consent forms the patients signed were approved by ethics boards, which would have to approve any changes to the agreements — if the patients could even be found.

Sometimes the findings indicate that unexpected treatments might help. In a newly published federal study of 224 gene sequences of colon cancers, for example, researchers found genetic changes in 5 percent that were the same as changes in breast cancer patients whose prognosis is drastically improved with a drug, Herceptin. About 15 percent had a particular gene mutation that is common in melanoma. Once again, there is a drug, approved for melanoma, that might help. But under the rules of the study, none of the research subjects could ever know.

Other times the findings indicate that the study subjects or their relatives who might have the same genes are at risk for diseases they had not considered. For example, researchers at the Mayo Clinic in Rochester, Minn., found genes predisposing patients to melanoma in cells of people in a pancreatic cancer study — but most of those patients had died, and their consent forms did not say anything about contacting relatives.

One of the first cases came a decade ago, just as the new age of genetics was beginning. A young woman with a strong family history of breast and ovarian cancer enrolled in a study trying to find cancer genes that, when mutated, greatly increase the risk of breast cancer. But the woman, terrified by her family history, also intended to have her breasts removed prophylactically.

Her consent form said she would not be contacted by the researchers. Consent forms are typically written this way because the purpose of such studies is not to provide medical care but to gain new insights. The researchers are not the patients’ doctors.

But in this case, the researchers happened to know about the woman’s plan, and they also knew that their study indicated that she did not have her family’s breast cancer gene. They were horrified.


“We couldn’t sit back and let this woman have her healthy breasts cut off,” said Barbara B. Biesecker, the director of the genetic counseling program at the National Human Genome Research Institute, part of the National Institutes of Health. After consulting the university’s lawyer and ethics committee, the researchers decided they had to breach the consent stipulations and offer the results to the young woman and anyone else in her family who wanted to know if they were likely to have the gene mutation discovered in the study. The entire family — about a dozen people — wanted to know. One by one, they went into a room to be told their result.

“It was a heavy and intense experience,” Dr. Biesecker recalled.

Around the same time, Dr. Gail Jarvik, now a professor of medicine and genome science at the University of Washington, had a similar experience. But her story had a very different ending.

She was an investigator in a study of genes unrelated to breast cancer when the study researchers noticed that members of one family had a breast cancer gene. But because the consent form, which was not from the University of Washington, said no results would be returned, the investigators never told them, arguing that their hands were tied. The researchers said an ethics board — not they — made the rules.

Dr. Jarvik argued that they should have tried to persuade the ethics board. But, she said, “I did not hold sway.”

Such ethical quandaries grow more immediate year by year as genome sequencing gets cheaper and easier. More studies include gene sequencing and look at the entire genome instead of just one or two genes. Yet while some findings are clear-cut — a gene for colon cancer, for example, will greatly increase the disease risk in anyone who inherits it — more often the significance of a genetic change is not so clear. Or, even if it is, there is nothing to be done.

Researchers are divided on what counts as an important finding. Some say it has to suggest prevention or treatment. Others say it can suggest a clinical trial or an experimental drug. Then there is the question of what to do if the genetic findings only sometimes lead to bad outcomes and there is nothing to do to prevent them.

“If you are a Ph.D. in a lab in Oklahoma and think you made a discovery using a sample from 15


years ago from a subject in California, what exactly are you supposed to do with that?” asked Dr. Robert C. Green, an associate professor of medicine at Harvard. “Are you supposed to somehow track the sample back?”

Then there are the consent forms saying that no one would ever contact the subjects.

“If you go back to them and ask them to re-consent, you are telling them something is there,” Dr. Green said. “There is a certain kind of participant who doesn’t want to know,” he added, and if a researcher contacts study subjects, “you are kind of invalidating the contract.”

Other questions involve the lab that did the analysis. All labs providing clinical results to patients must have certification ensuring that they follow practices making it more likely that their results are accurate and reproducible. But most research labs lack this certification, and some of the latest genetic tests are so new that there are no certification standards for them.

“I find it really hard to defend the notion that we are not going to give you something back because it was not done” in a certified lab, “even though we are 99 percent certain it is correct,” Dr. Jarvik said.

Gloria M. Petersen, a genetic epidemiologist at the Mayo Clinic, and her colleagues ran into a disclosure problem in a study of genes that predispose people to pancreatic cancer. The 2,000 study patients had signed consents indicating whether they wanted to know about research findings that might be important to them. But the forms did not ask about sharing findings that might be important to their families, or about what the researchers should do if they discovered important information after the patients were dead.

Seventy-three of the study patients, almost all of whom are now dead, had one of three clinically important mutations. One predisposed them mostly to melanoma but also to pancreatic cancer. A second predisposed them primarily to breast and ovarian cancer. The third, a cystic fibrosis gene, can increase the risk of pancreatic cancer and can also be important in family planning. If a man and a woman each have this gene, they have a one-in-four chance of having a child with the disease.
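The one-in-four figure is ordinary Mendelian arithmetic for two carrier parents, and is easy to verify by enumerating the four equally likely combinations of inherited alleles:

```python
from fractions import Fraction
from itertools import product

# Each carrier parent has one normal allele (N) and one cystic fibrosis
# allele (c); a child inherits one allele from each parent, equally likely.
parent = ("N", "c")
children = list(product(parent, parent))  # the four possible outcomes

affected = Fraction(sum(child == ("c", "c") for child in children), len(children))
carrier = Fraction(sum(child.count("c") == 1 for child in children), len(children))

print(affected)  # 1/4 chance the child has the disease
print(carrier)   # 1/2 chance the child is an unaffected carrier
```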

When it comes to the family members, “I don’t know what my obligation is,” Dr. Petersen said. “There is an incredible burden to track down the relatives. Whose information is it, and who


has a right to that information?”

Dr. Petersen, along with Barbara Koenig, a professor of medical anthropology and bioethics at the University of California, San Francisco, and Susan M. Wolf, a professor of law, medicine and public policy at the University of Minnesota, got a federal grant to study the effects of offering to return the genetic results to the families of those 73 patients. The questions involved are tricky, Dr. Koenig said. Finding patients and their families can be expensive, and labs do not have money set aside for it. How would you find them? Even if they were found, whom would you tell? What if there had been a divorce, or if family members were estranged?

“My gut feeling is that there is a moral obligation to return results,” Dr. Koenig said. “But that comes at an enormous cost. If you were in a study 20 years ago, where does my obligation end?”


Empirical Study of Data Sharing by Authors Publishing in PLoS Journals
Caroline J. Savage, Andrew J. Vickers*

Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, New York, New York, United States of America

Abstract

Background: Many journals now require that authors share their data with other investigators, either by depositing the data in a public repository or making it freely available upon request. These policies are explicit, but remain largely untested. We sought to determine how well authors comply with such policies by requesting data from authors who had published in one of two journals with clear data sharing policies.

Methods and Findings: We requested data from ten investigators who had published in either PLoS Medicine or PLoS Clinical Trials. All responses were carefully documented. In the event that we were refused data, we reminded authors of the journal’s data sharing guidelines. If we did not receive a response to our initial request, a second request was made. Following the ten requests for raw data, three investigators did not respond, four authors responded and refused to share their data, two email addresses were no longer valid, and one author requested further details. A reminder of PLoS’s explicit requirement that authors share data did not change the reply from the four authors who initially refused. Only one author sent an original data set.

Conclusions: We received only one of ten raw data sets requested. This suggests that journal policies requiring data sharing do not lead to authors making their data sets available to independent investigators.

Citation: Savage CJ, Vickers AJ (2009) Empirical Study of Data Sharing by Authors Publishing in PLoS Journals. PLoS ONE 4(9): e7078. doi:10.1371/journal.pone.0007078

Editor: Chris Mavergames, The Cochrane Collaboration, Germany

Received May 8, 2009; Accepted August 5, 2009; Published September 18, 2009

Copyright: © 2009 Savage, Vickers. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: Supported in part by funds from David H. Koch provided through the Prostate Cancer Foundation, the Sidney Kimmel Center for Prostate and Urologic Cancers and P50-CA92629 SPORE grant from the National Cancer Institute to Dr. P. T. Scardino. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

* E-mail: [email protected]

Introduction

Technology has dramatically improved the ways in which data can be stored, analyzed and disseminated. The Internet facilitates almost instantaneous access to data and information, and now plays a key role in many fields of medical research. Researchers in genomics, for example, rely heavily on publicly available resources such as GenBank, a repository of annotated DNA sequences. Many journals now state that submission of original data, such as microarray results, into appropriate repositories is a requirement for publication [1].

Following the notable strides of genomics, and other fields such as the open source movement in software, there has been a surge of awareness regarding data sharing in the biomedical field. The NIH recently declared that the sharing of data is essential for translating research into knowledge and products that improve health [2]. As such, all investigators seeking more than $500,000 in grant support per year are now required to include a plan for data sharing [3].

Several journals now require authors to share their raw data sets as a condition of publication. The Public Library of Science (PLoS) Journals, a collection of open access journals, specifically states that open access applies to both the scientific literature and the supporting data: “publication is conditional upon the agreement of authors to make freely available any materials and information associated with their publication that are reasonably requested by others for the purpose of academic, non-commercial research.” [4] Several other highly regarded journals, such as Nature and Science, have also established clear data sharing requirements [5,6].

Unfortunately, the specific mechanisms for data sharing are often unspecified and the implementation of such policies largely untested. Very few journals have an explicit statement regarding how their data sharing policies are enforced, and thus it is unclear what options are available to investigators who encounter authors who are unwilling to share. To study how well authors comply with data sharing requests, we attempted to acquire original data sets from several researchers who had published in journals with explicit data sharing policies.

Methods

We chose to make our requests from authors who had published

in PLoS journals due to their exceedingly clear data sharing

policies: all data relevant to publication should be deposited in an

appropriate public repository, if appropriate, and if not, ‘‘data

should be provided as supporting information with the published

paper. If this is not practical, data should be made freely available

upon reasonable request.’’[4]

We selected ten papers from two PLoS publications, PLoS Medicine (n = 4)

and PLoS Clinical Trials (n = 6). None included raw data as part

PLoS ONE | www.plosone.org 1 September 2009 | Volume 4 | Issue 9 | e7078

Page 148: Sharing Clinical Research Data:  an IOM Workshop

of the supporting information. For all papers, the data requested

would have allowed us to test a specific pre-specified hypothesis

about prediction modeling, a scientific interest of the senior

author. Our requests encompassed a wide range of diseases

(cancer, malaria, HIV and others) and study designs (randomized

controlled trials, case-control and cohort studies). We do not think

it is plausible that our selection of papers amenable to prediction

modeling could represent a selection of authors importantly more

or less likely to share data than a random selection from PLoS

publications. Data requests were emailed by CS to the

corresponding author listed on the manuscript. Requests were

similar in all cases: the stated reason was out of personal interest in

the topic and the need for original data for master’s level

coursework.

With respect to ethical considerations, we pre-specified that any

request must not create undue work for the investigators and that

any data received as part of a successful request would be analyzed

as planned, the results of which would be sent to the original

investigators. In the event that we did obtain data, we would not

guarantee to publish, but would promise a co-authorship on

anything that was published.

If the response to our initial request was ‘‘no’’, the stated reason

was documented. A second request was then made by AV, who

identified himself as an Editorial Board member for PLoS ONE.

The email included an explicit reminder of PLoS’s editorial

policies, and stated clearly that, as the authors had published in a

PLoS journal, they were required to provide free and open access

to data sets used for analysis. The response to the second request

was also documented. If we did not receive a response to our initial

request, a second request was sent after one month.

Results

We emailed initial requests for original data to ten correspond-

ing authors. The responses are summarized in Figure 1. Two

corresponding authors’ email addresses listed on the paper were

no longer valid as they had changed institutions since publication.

After extensive investigation, we were able to track down an

updated email address for one author; however, she was away on

maternity leave and we have been unable to establish contact. An

updated email address for the second author was never found. Of

the remaining eight investigators with functioning email addresses,

four authors replied that sharing their data was not possible, three

authors did not respond, and another asked for further details

regarding our request.

Of the four authors refusing to share their data in response to our

initial request, two authors did not give a reason, one stated he was

currently too busy, and the fourth said that he no longer had

jurisdiction over the data as he had changed institutions. We sent a

follow up email to these four investigators and reminded them of

PLoS’s data sharing policies. From the two authors who initially

refused without providing a reason, one author said that we could

submit a formal and lengthy proposal to the appropriate trialists’ group;

the other apologized that he had not been aware of the journal’s data

sharing policy and, as he was forbidden to pass the data on to third

parties, he would not have published in PLoS had he been aware of this

requirement.

Figure 1. Summary of responses to the 10 initial requests for raw data. doi:10.1371/journal.pone.0007078.g001

To the investigator who said he was currently too busy,

we responded that we could wait to receive the data; his response was

that it was simply too much work. Our request to the investigator who

had changed institutions was forwarded to the appropriate researchers

still working at the institution. They claimed it would take too long to

organize and annotate the data set and would be too much work.

Three authors simply did not respond to our initial request.

After a follow up email to these authors, one author replied that he

was in favor of sharing data in general, but wished to conduct

more analyses before sharing. We did not receive a reply from the

other two authors.

One author replied to our initial request by asking for more

information about our proposed analysis. After correspondence to

discuss our analytic plan and the particular variables needed, we

received a well annotated dataset within a few hours.

Discussion

We requested raw data from ten corresponding authors and

received only one data set. Although our sample was small, our

results are clear: explicit data sharing policies in journals do not

lead authors to share data. Our initial intention was to see if the

rates of data sharing were any higher in journals with data sharing

policies compared to those without. However, our initial results

were sufficiently clear that a comparison was deemed unnecessary.

We are aware of only one prior study that described the

difficulty of obtaining original data sets from published authors.

Wicherts et al. wrote to more than one hundred authors who had

published in American Psychological Association (APA) journals to

request data for reanalysis[7]. They experienced similar difficulty

in obtaining original data, receiving only one quarter of the

requested data sets. Although our study was considerably smaller,

it is importantly different for three reasons. First, the APA’s

policies are not as explicit as those from PLoS. The APA’s

guidelines state that authors should share their data ‘‘provided that

the confidentiality of the participants can be protected and unless

legal rights concerning proprietary data preclude their release,’’ a

stipulation that may have permitted many authors to avoid sharing

under the guise of patient confidentiality or legal rights[8]. Second,

APA policies don’t provide an incentive for original authors to

share data, as data can only be used ‘‘to verify the substantive

claims through reanalysis.’’ [8] Thus original authors stand only to

lose by having their conclusions challenged. In contrast, we asked

investigators for data not to challenge their original conclusions,

but to test a new hypothesis. Third, the majority of data sets

requested by Wicherts et al. were survey research; we requested

data from medical studies and clinical trials - research that has a

direct and immediate relevance to individuals suffering ill-health.

We acknowledge that there are numerous real and perceived

impediments to sharing raw data. One of us has previously written

extensively on this issue[9]. Concerns about patient privacy are

frequently cited, although data from most studies can easily be

coded in such a way as to ensure subjects' anonymity. There are also

concerns about authorship and future publishing opportunities, and

a natural desire to retain exclusive access to data that may have

taken many years of hard work to collect. We have published some

simple guidelines to protect the rights of investigators to exploit their

data, such as an embargo period between publication and release of

raw data.[9] Other investigators may fear that future researchers

will undermine the original authors’ conclusions, either by

uncovering an error or by employing alternative analytic methods.

We believe that it is in the best interests of science to have a robust

debate on the merits of particular research findings. However, it is

only right and fair that investigators should have a say in the use of

their data, so we have previously recommended[9] that original

investigators should be included as co-authors on any publication

resulting from re-analysis of raw data or, alternatively, be offered the

opportunity to provide a response and commentary.

Some of the authors we contacted claimed that it would take

‘‘too much work’’ to provide us with raw data. This suggests that

researchers do not always develop a clean, well annotated data set

for analyses associated with a particular scientific paper. This

strikes us as a problem in and of itself. We also found that, as

time passes from the date of publication, authors sometimes lose

access to the original data or switch institutions without

maintaining the email address listed on the paper. A simple

solution to both of these problems would be to require authors to

submit de-identified data sets to journals or public repositories at

the time of publication.
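The de-identification step proposed above is mechanically simple. The sketch below illustrates one way to do it, assuming a tabular dataset held as Python dictionaries; the field names (`name`, `email`, `date_of_birth`) and the `SUBJ-` code format are invented for illustration, and any real study would follow its own data dictionary and governance rules.

```python
import secrets

def deidentify(rows, direct_identifiers=("name", "email", "date_of_birth")):
    """Replace each record's direct identifiers with a random study code.

    Returns (coded_rows, key), where `key` maps study codes back to the
    original identifiers and would be kept separately by the investigators.
    """
    coded_rows, key = [], {}
    for row in rows:
        code = "SUBJ-" + secrets.token_hex(4)
        # Store the identifying fields in the linkage key, not the shared data.
        key[code] = {f: row[f] for f in direct_identifiers if f in row}
        # The shared record keeps only non-identifying fields plus the code.
        coded = {k: v for k, v in row.items() if k not in direct_identifiers}
        coded["study_code"] = code
        coded_rows.append(coded)
    return coded_rows, key

# Example with made-up records:
rows = [{"name": "A. Jones", "email": "x@y.org", "age": 54, "outcome": 1}]
coded, key = deidentify(rows)
assert "name" not in coded[0] and "study_code" in coded[0]
```

Only the coded rows would be deposited; the linkage key stays with the original investigators in case follow-up is ever needed.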

Data was requested from articles published in only two journals,

PLoS Medicine and PLoS Clinical Trials, and it is possible that

authors who publish in other journals are more likely to share data

sets. This seems unlikely as open access journals with explicit data

sharing policies are likely to attract authors who support greater

openness to scientific data. Our method of requesting data sets was

intentionally left vague as we were interested as much in the

investigators' responses as in acquiring the actual data set; perhaps a

more detailed request would have garnered more positive responses.

Again this is unlikely, as no authors claimed that our request was

unreasonable, inappropriate, or lacking in sufficient details, and

further information was provided upon request. A final limitation

was that our sample was small. Yet the results were so striking (only

one in ten authors complied) that we can be fairly confident the true

rate of compliance is far from 100%. Indeed, it would be highly

unlikely to obtain only 1 in 10 data sets if the true rate of data

sharing was even as low as 50% (p = 0.01 from binomial test).
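The binomial calculation behind that p-value is easy to reproduce; here is a minimal sketch using only the Python standard library, with the figures (10 requests, at most 1 success, and a hypothetical 50% true sharing rate) taken from the text.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Probability of obtaining at most 1 shared data set out of 10 requests
# if the true rate of data sharing were 50%.
p_value = binom_cdf(1, 10, 0.5)
print(round(p_value, 4))  # prints 0.0107, i.e. 11/1024
```

Rounded to two decimal places this gives the p = 0.01 reported above.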

In conclusion, our findings suggest that explicit journal policies

requiring data sharing do not lead to authors making their data

sets available to independent investigators.

Author Contributions

Conceived and designed the experiments: CS AV. Performed the

experiments: CS. Analyzed the data: CS AV. Wrote the paper: CS AV.

References

1. Science Magazine. Authors and Contributors, General Policies. Available from: http://www.sciencemag.org/help/authors/policies.dtl. Accessed March 4, 2009.

2. National Institutes of Health. NIH Data Sharing Policy. Available from: http://grants.nih.gov/grants/policy/data_sharing/. Accessed March 4, 2009.

3. National Institutes of Health. NIH Data Sharing Policy Brochure. Available from: http://grants.nih.gov/grants/policy/data_sharing/data_sharing_brochure.pdf. Accessed March 4, 2009.

4. PLoS Medicine Editorial and Publishing Policies. 13. Sharing of Materials, Methods and Data. Available from: http://journals.plos.org/plosmedicine/policies.php#sharing. Accessed March 4, 2009.

5. Nature Journal. Authors & Referees, Editorial Policies, Availability of data & materials. Available from: http://www.nature.com/authors/editorial_policies/availability.html. Accessed March 4, 2009.

6. Science Magazine. General Information for Authors. Conditions of Acceptance. Available from: www.sciencemag.org/about/authors/prep/gen_info.dtl#datadep. Accessed March 4, 2009.

7. Wicherts JM, Borsboom D, Kats J, Molenaar D (2006) The poor availability of psychological research data for reanalysis. Am Psychol 61: 726–728.

8. American Psychological Association (2001) Ethical standards for the reporting and publishing of scientific information. Publication manual of the American Psychological Association. Washington, DC. pp 387–396.

9. Vickers AJ (2006) Whose data set is it anyway? Sharing raw data from randomized trials. Trials 7: 15.



Perspective

Perspective

Open Clinical Trial Data for All? A View from Regulators

Hans-Georg Eichler1*, Eric Abadie1,2, Alasdair Breckenridge3, Hubert Leufkens1,4, Guido Rasi1

1 European Medicines Agency (EMA), London, United Kingdom; 2 Agence Francaise de Securite Sanitaire des Produits de Sante (AFSSAPS), Saint-Denis, France; 3 Medicines and Healthcare products Regulatory Agency (MHRA), London, United Kingdom; 4 Medicines Evaluation Board (CBG-MEB), Den Haag, The Netherlands

In this issue of PLoS Medicine, Doshi and

colleagues argue that the full clinical trial

reports of authorized drugs should be

made publicly available to enable inde-

pendent re-analysis of drugs’ benefits and

risks [1]. We offer comments on their call

for openness from a European Union drug

regulatory perspective.

For the purpose of this discussion, we

consider ‘‘clinical study reports’’ to com-

prise not just the protocol, summary

tables, and figures of (mostly) randomized

controlled trials (RCTs), but the full ‘‘raw’’

data set, including data at the patient level

[2]. We limit discussion to data on drugs

for which the regulatory benefit-risk as-

sessment has been completed.

Why Trial Data Should Be Open for All

First and foremost, we agree with

Doshi et al. that clinical trial data should

not be considered commercial confiden-

tial information; most patients enrolling

in clinical trials do so with an assumption

of contributing to medical knowledge,

and ‘‘non-disclosure of complete trial

results undermines the philanthropy’’

[1].

The potential benefits for public health

of independent (re-)analysis of data are not

disputed and, in an open society, trial

sponsors and regulators do not have a

monopoly on analyzing and assessing drug

trial results. Yet, the different responsibil-

ities of regulators and independent ana-

lysts have to be acknowledged. Regulators,

unlike academicians, are legally obliged to

take timely decisions on the availability of

drugs for patients, even under conditions

of uncertainty.

Going beyond the merits of indepen-

dent meta-analysis, we foresee other,

potentially more important benefits from

public disclosure of raw trial data. For

example, RCT datasets enabled the de-

velopment of predictive models for patient

selection to appropriate treatments [3,4].

Taking this notion a step further, we

envisage machine learning systems that

will allow clinicians to match a patient’s

electronic health record directly to RCT

and observational study data sets for

better, individualized therapeutic decisions

(L. Perez-Breva, personal communica-

tion).
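The patient-selection modeling the authors describe can be made concrete with a toy example. This is a generic sketch, not the model of references [3,4]: it fits a one-covariate logistic model to synthetic "trial" data by stochastic gradient descent, and the covariate, threshold, and learning parameters are all invented.

```python
import math
import random

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit P(benefit) = sigmoid(w.x + b) by stochastic gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - yi  # gradient of the log-loss w.r.t. z
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def predict(w, b, xi):
    z = sum(wj * xj for wj, xj in zip(w, xi)) + b
    return 1.0 / (1.0 + math.exp(-z))

# Synthetic "trial" data: a single covariate (say, baseline risk) that
# actually drives benefit, so the model should recover the pattern.
random.seed(0)
X = [[random.uniform(0, 1)] for _ in range(200)]
y = [1 if xi[0] > 0.5 else 0 for xi in X]
w, b = fit_logistic(X, y)
assert predict(w, b, [0.9]) > 0.5 > predict(w, b, [0.1])
```

A real effort of this kind would of course use patient-level RCT data and validated methodology; the point here is only that such models are straightforward to build once the raw data are available.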

Large, information-rich datasets are

needed to support the computer science

and artificial intelligence research re-

quired to develop and test these applica-

tions. Developing such tools is usually not

a priority for, and often beyond the

capabilities and resources of, even the

largest pharmaceutical companies. These

endeavors might best thrive in an envi-

ronment that invites research from be-

yond the current stakeholders in health

[5]. Making rich datasets available for

research is a means to open health

research.

Why Trial Data Should Not Be Open for All

There are indeed many good arguments

for unrestricted and easy access to full

RCT data. Yet, simply uploading all trial

data on a website would entail its own

problems.

First among those is the issue of

personal data protection or patient confi-

dentiality, a concept that is very different

from commercial confidentiality. There is a

small risk that personal data could inad-

vertently be publicized. There is also a

small risk that an individual patient could

be identified from an anonymized dataset,

for example, from trials in ultra-rare

diseases. Achieving an adequate standard

of personal data protection is not an

insurmountable obstacle, though, and

proposals for best practice for publishing

raw data are available [2]. However,

implementation is not straightforward,

standards will need to be agreed upon up

front, and data redaction may in a few

cases be resource intensive.

Our second caveat is likely more

contentious. We do not dispute that

financial conflicts of interests (CoIs) may

render analyses and conclusions ‘‘vulner-

able to distortion’’ [1]. However, sur-

rounding the ongoing debate over spon-

sor-independent analyses is an implicit

assumption that ‘‘analysis by independent

Citation: Eichler H-G, Abadie E, Breckenridge A, Leufkens H, Rasi G (2012) Open Clinical Trial Data for All? A View from Regulators. PLoS Med 9(4): e1001202. doi:10.1371/journal.pmed.1001202

Published April 10, 2012

Copyright: © 2012 Eichler et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: No specific funding was received for writing this article.

Competing Interests: All authors work for drug regulatory agencies. The authors have declared that no other competing interests exist. The views expressed in this article are the personal views of the authors and may not be understood or quoted as being made on behalf of or reflecting the position of the regulatory agencies, or the committees or working parties of the regulatory agencies the authors work for.

Abbreviations: CoI, conflict of interest; RCT, randomized controlled trial

* E-mail: [email protected]

Provenance: Commissioned; externally peer reviewed.

Linked Policy Forum

This Perspective discusses the following new Policy Forum published in PLoS Medicine:

Doshi P, Jefferson T, Del Mar C (2012) The Imperative to Share Clinical Study Reports: Recommendations from the Tamiflu Experience. PLoS Med 9(4): e1001201. doi:10.1371/journal.pmed.1001201

Peter Doshi and colleagues describe their experience trying and failing to access clinical study reports from the manufacturer of Tamiflu and challenge industry to defend their current position of RCT data secrecy.

PLoS Medicine | www.plosmedicine.org 1 April 2012 | Volume 9 | Issue 4 | e1001202


groups’’ is somehow free from CoIs. We

beg to differ. Personal advancement in

academia, confirmation of previously de-

fended positions, or simply raising one’s

own visibility within the scientific commu-

nity may be powerful motivators. In a

publish-or-perish environment, would the

finding of an important adverse or favorable

drug effect at the p<0.05 level be

more helpful to a researcher than not

finding any new effects? Will society

always be guaranteed that a finding that

is reported as ‘‘confirmatory’’ was not the

result of multiple exploratory re-runs of a

dataset? We submit that analyses by

sponsor-independent scientists are not

generated in a CoI-free zone and, more

often than not, ego trumps money.

Independent analyses may therefore also

be ‘‘vulnerable to distortion’’. We are

concerned that unrestricted availability of

full datasets may in some cases facilitate

the publication of papers containing mis-

leading results, which in turn lead to

urgent calls for regulatory action. In a

worst case, this would give rise to un-

founded health scares with negative public

health consequences such as patients

refusing vaccinations or discontinuing

drug treatment [6,7].

Aside from CoIs, independent analysis

per se is no guarantee of high quality. The

regulatory community has been confront-

ed with meta-analyses that were later

contradicted by additional evidence [8]

or found to be flawed [9]. We argue that

independent analyses warrant a similar

level of scrutiny as sponsor-conducted

analyses do.

Finally, re-analysis of trial data could be

misused for competitive purposes.

The Way Forward?

We consider it neither desirable nor

realistic to maintain the status quo of

limited availability of regulatory trials

data. What is needed is a three-pronged

approach:

1. Develop and agree upon adequate

standards for protection of personal

data when publicizing RCT datasets.

Most stakeholders will likely agree that

adequate standards of data protection

are a sine qua non, so the issue should be

primarily of a technical and legal

nature. We emphasize adequate stan-

dards because excessive demands and

unrealistically high standards may in

effect become an ‘‘anti-commons’’ and

frustrate important public health gains.

2. Ensure general adoption of established

quality standards of meta-analyses and

other types of (confirmatory) data re-

analysis that may warrant regulatory

action.

3. Establish rules of engagement: In the

area of observational studies based on

health care databases, the European

Network of Centres for Pharmacoepi-

demiology and Pharmacovigilance

(ENCePP) has recently published guid-

ance for raw data sharing; these rules

of engagement follow the principle of

maximum transparency whilst respect-

ing the need to guarantee data privacy

and to avert the potential for misuse

[10]. Others have come up with

broadly similar proposals [11]. Con-

ceivably, analogous principles (e.g.,

data sharing only after receipt of a full

analysis plan) could be applied to

regulatory RCT data [1].

Moreover, we take it as self-evident that

the same standard of openness should

apply to all (drug) trial data, whether

sponsored by industry, investigator-initiat-

ed, or sponsored by public grant-giving

bodies. Likewise, the same standard of

third party scrutiny should be applicable

to all secondary data analyses. Regulatory

inspections of data and analyses carried

out by commercial sponsors are routine.

Would all sponsor-independent research-

ers allow the same level of inspections

applied to their analyses?

We welcome debate on these issues, and

remain confident that satisfactory solutions

can be found to make complete trial data

available in a way that will be in the best

interest of public health.

Author Contributions

Wrote the first draft of the manuscript: HGE.

Contributed to the writing of the manuscript:

HGE EA AB HL GR. ICMJE criteria for

authorship read and met: HGE EA AB HL GR.

Agree with manuscript results and conclusions:

HGE EA AB HL GR.

References

1. Doshi P, Jefferson T, Del Mar C (2012) The imperative to share clinical study reports. PLoS Med 9(4): e1001201.

2. Hrynaszkiewicz I, Altman DG (2009) Towards agreement on best practice for publishing raw clinical trial data. Trials 10: 17. Available: http://www.trialsjournal.com/content/10/1/17. Accessed 9 March 2012.

3. Selker HP, Ruthazer R, Terrin N, Griffith JL, Concannon T, et al. (2011) Random treatment assignment using mathematical equipoise for comparative effectiveness trials. Clin Transl Sci 4(1): 10–16.

4. Kent DM, Hayward RA, Griffith JL, Vijan S, Beshansky JR, et al. (2002) An independently derived and validated predictive model for selecting patients with myocardial infarction who are likely to benefit from tissue plasminogen activator compared with streptokinase. Am J Med 113(2): 104–111.

5. Nielsen M (2012) Reinventing discovery: the new era of networked science. Princeton and Oxford: Princeton University Press.

6. Poland GA, Jacobson RM (2011) The age-old struggle against the antivaccinationists. N Engl J Med 364: 97–99.

7. Lofstedt R, Bouder F, Chakraborty S (2012) Transparency and the FDA – a quantitative study. J Health Communication. In press.

8. Michele TM, Pinheiro S, Iyasu S (2010) The safety of tiotropium — the FDA's conclusions. N Engl J Med 363: 1097–1099.

9. European Medicines Agency (2011) Questions and answers on the review of angiotensin II receptor antagonists and the risk of cancer. Available: http://www.ema.europa.eu/docs/en_GB/document_library/Medicine_QA/2011/10/WC500116862.pdf. Accessed 9 March 2012.

10. European Network of Centres for Pharmacoepidemiology and Pharmacovigilance (2011) The ENCePP Code of Conduct for Scientific Independence and Transparency in the Conduct of Pharmacoepidemiological and Pharmacovigilance Studies. Available: http://www.encepp.eu/code_of_conduct/documents/CodeofConduct_Rev2.pdf. Accessed 9 March 2012.

11. Young SS, Karr A (2011) Deming, data and observational studies. Significance 8(3): 116–120.



Reproducible Research: Moving toward Research the Public Can Really Trust

Christine Laine, MD, MPH; Steven N. Goodman, MD, PhD, MHS; Michael E. Griswold, PhD; and Harold C. Sox, MD

A community of scientists arrives at the truth by independently verifying new observations. In this time-honored process, journals serve 2 principal functions: evaluative and editorial. In their evaluative function, they winnow out research that is unlikely to stand up to independent verification; this task is accomplished by peer review. In their editorial function, they try to ensure transparent (by which we mean clear, complete, and unambiguous) and objective descriptions of the research. Both the evaluative and editorial functions go largely unnoticed by the public; the former only draws public attention when a journal publishes fraudulent research. However, both play a critical role in the progress of science. This paper is about both functions. We describe the evaluative processes we use and announce a new policy to help the scientific community evaluate, and build upon, the research findings that we publish.

Ann Intern Med. 2007;146:450-453. www.annals.org

For author affiliations, see end of text.

The context for our paper is every journal's nightmare: publishing fraudulent research. Cases of scientific misconduct have surfaced with alarming frequency, tarnishing public trust in journals, researchers, and science itself. To much fanfare, the journal Science published promising developments in stem cell research only to learn months later that the findings were bogus (1). Lancet editors retracted a paper on oral cancer when the editors learned that the researchers had fabricated their data (2). The New England Journal of Medicine published a statement of concern on a drug toxicity study when evidence suggested that the authors had withheld important adverse outcomes (3). Incidents of authors who have not disclosed potential conflicts of interest have prompted JAMA to reinforce its disclosure policies (4-6). Here at Annals, we retracted an article after learning that the lead author had fabricated the data (7-9). In the midst of these accumulating scandals, a New York Times article declared, "Reporters Find Science Journals Harder to Trust, but not Easy to Verify" (10). Reporters are not alone; journal editors also find biomedical research harder to trust and not easy to verify.

Fraud gets headlines and drives editors to despair but fortunately occurs rarely. We at Annals spend much of our time evaluating the validity and generalizability of every article that we consider for publication. Like our colleagues at other biomedical journals, we strive to assure ourselves and our readers that the information we publish is accurate. Yet, because we do not personally witness every step of the research process, we can never be entirely certain of validity. We are less concerned with outright fraud, which can elude even the most astute reviewers and editors, than with the subtler but much more prevalent biases that are present in one form or another in most clinical research. Prompted by both the recent wave of scientific misconduct and our own experience in trying to minimize the impact of bias in our pages, we review here how we protect against publishing research that is misleading or outright wrong.

EXISTING STRATEGIES TO GUARD AGAINST RESEARCH IMPROPRIETY

Annals relies on a variety of processes to evaluate the validity of work that we consider for publication (Table). These processes span the life of the article from submission through postpublication.

To facilitate the detection of potential sources of bias or threats to validity, Annals requires authors to include several pieces of information with submitted manuscripts. They must disclose the contributions and potential conflicts of interest of all authors on the byline, including but not limited to any financial relationship that involves diseases, tests, or treatments discussed in the manuscript or competing products. This alerts editors, reviewers, and readers to situations in which competing interests might bias content or threaten validity (11). Authors know that this information will be public, which we hope increases their sense of public accountability for unbiased reporting.

We also require authors of clinical trials to register the trials (12) and submit copies of the trial protocol. These requirements give editors access to the detailed, prespecified research plan, which can be useful if the manuscript lacks key details or when we need to reconcile what the researchers planned with what they reported.

Annals' next layer of defense against publishing erroneous information is a multilayered review that includes scrutiny by both journal staff and external reviewers. We obtain external review on about 50% of submitted manuscripts, turning to the more than 12,000 individuals in our reviewer database to augment the range of skills and knowledge of our editors. Although the peer-review process is not perfect, these reviews provide important information


Annals of Internal Medicine | Academia and Clinic

450 © 2007 American College of Physicians


that we weigh during the decision-making process; review-ers influence our decisions on a daily basis by identifyingproblems that we missed. Many authors believe (incor-rectly) that reviewers’ opinions always dictate the editorialdecision. Sometimes, the external reviewer’s opinion is de-cisive; however, usually the reviewer’s comments are incor-porated into the insights and judgments of our own edito-rial team, which includes scientists and clinicians with arange of methodological and clinical skills. We then makea preliminary decision at an editorial conference, duringwhich we discuss all papers under consideration by theassociate and deputy editors. After the preliminary deci-sion, the journal’s processes are a mix of evaluative andeditorial functions, with the editorial function taking ongreater importance as we near final acceptance of themanuscript.

We never rely solely on the statistical expertise of theauthors. Any papers selected for possible publication mustpass through yet another filter, a “methods” conference inwhich Annals’ 6 statistical editors, 4 deputy editors, andeditor-in-chief scrutinize a study’s methodology. The stat-istician assigned to the manuscript presents the main sta-tistical issues for discussion at this conference. Often, weask for statistical code or data to clarify or validate meth-ods. We may ask authors to provide supplementary analy-ses to assess the sensitivity of results to different assump-tions or methods, work with authors to improve analyses,or on occasion, perform our own analyses of the data withthe authors’ cooperation. Usually, the revised paper reflectsthese clarifications. These discussions, or subsequent in-quiries, occasionally diminish our confidence in the workto the point that we choose not to publish it. More often,they improve the clarity and completeness of the manu-script, making the strengths and weaknesses of the studymore evident to the reader and ensuring that the tone of

the conclusions matches the strength of the evidence. These outcomes of editing and peer review have been documented in empirical studies at Annals (13) and, more recently, in a preliminary report of a study that included several journals (14).

Arguably, the most important function of prepublication peer review and editing is to make effective postpublication peer review possible. The most stringent internal processes cannot guarantee truth in reporting or reliably detect fraud; however, they can provide the scientific community with the tools to assess the strength and validity of published evidence. We strive to publish methods sections that provide sufficient detail so that readers can assess all the potential threats to validity. Reporting standards like CONSORT for randomized, controlled trials (15, 16) also facilitate thorough reporting of study methodology. When the methods are transparent, fraud, bias, and weak research methods are easier to spot. In this age of instant electronic dissemination and reader feedback, features like rapid-response letters to the editors provide a medium for postpublication peer review and discussion of the research.

REPRODUCIBLE RESEARCH: A NEW STRATEGY TO INCREASE CONFIDENCE IN PUBLISHED RESEARCH

Even with these safeguards, experience teaches us that we must seek more innovative strategies to promote truth in publication. Independent replication by independent scientists in independent settings provides the best assurance that a scientific finding is valid; however, the resources and time required for high-quality clinical studies make literal replication of published studies a slow corrective to any errors in the original publication. However, scientists and journal editors can promote “reproducible research,” described recently by Peng and colleagues (17). The term

Table. Existing and New Strategies That Annals Uses to Guard against the Publication of Biased or Invalid Research

Prepublication
• Conflict disclosures: Alerts editors and reviewers to potential sources of bias
• Peer review: Provides an opportunity for content experts to examine the work for potential threats to validity
• Protocol submission: Enables editors to reconcile what researchers planned with what they report
• Statistical review by an in-house team (may include requests for data and alternative analyses): Enables independent statisticians to critique analysis and look for threats to validity
• Trial registration: Permits public access to research plans; discourages suppression of unfavorable results

Publication
• Publication of conflicts of interest: Alerts readers to potential sources of bias
• Publication of author contributions: Establishes accountability for specific components of the research process
• Publication of detailed methods: Enables readers to critique methods and look for threats to validity

Postpublication
• Letters to the editor/rapid responses: Provides a venue for readers to air their concerns

New safeguards
• Publication of information about availability of protocol, statistical code, and data: Increases potential for reproducibility; allows greater scrutiny for potential threats to validity; permits confirmation of results by independent individuals

Academia and Clinic: Moving toward Research the Public Can Trust

www.annals.org 20 March 2007 Annals of Internal Medicine Volume 146 • Number 6 451


has its roots in highly technical forms of statistical programming and data presentation, but its requirement for sharing data and analytic code is applicable to clinical research. Reproducibility involves methods to ensure that independent scientists can reproduce published results by using the same procedures and data as the original investigators. It also requires that the primary investigators share their data and methodological details. These include, at a minimum, the original protocol, the dataset used for the analysis, and the computer code used to produce the results.

Reproducibility requires a level of openness that is rare in the current competitive environment of biomedical research. Investigators might be understandably concerned that sharing data would make them vulnerable to unwanted scrutiny or enable other researchers to make discoveries using the data that took them years to assemble. Recognizing this, Peng and colleagues describe forms of data sharing that are between unlimited access and no access (17). Their proposals, adapted in part from commercial models of intellectual property protection, involve limitations on the permitted use of the data set, as set forth in a license or written agreement between an investigator and a requestor. At one extreme, researchers could have access only to portions of the analytic data set, and solely for the purpose of reproducing published results; at the other extreme, researchers could have unrestricted access to the entire file of raw data. Programs incorporated into commercial software that create technical barriers to unauthorized use or widespread dissemination could also protect authors’ interests.

We believe that the reproducible research model is important. By strengthening the scientific process, it can also protect the public against any long-term adverse effects of research gone wrong. For this reason, Annals is launching a “reproducible research” initiative. Every original research article will include a statement that indicates whether the study protocol, data, or statistical code is available to readers and under what terms authors will share this information. Sharing will not be mandatory, but we will require authors to state whether they are willing to share the protocol, data, or statistical code. This new policy will apply to all articles containing original research (articles and brief communications) submitted on or after 1 April 2007. We hope that willingness to share these elements will promote trust in the authors’ findings, similar to what has happened with our publishing of conflict disclosures, funding sources, and author contributions.

Whether authors make these materials available to the public will not influence Annals’ editorial decisions. To assure authors of our good faith, we will not inquire about an author’s willingness to make this information available until after we have accepted the manuscript for publication. Per our current practice, we will continue to ask authors to make these items available to the editors during the review process whenever necessary. We emphasize that the main purpose of this new initiative is to promote the

scientific process, not to deter fraud. A recently published example at Annals documents how the sharing of data resulted in valuable insights about the findings of a highly visible study published several years ago (18).

The inclusion of statistical code in this initiative may also serve to remind investigators to keep an exact record of the statistical procedures performed and the portions of the data set that the authors analyzed with statistical software. Most statistical software packages have some method for storing the analysis code, which facilitates examination by statistical editors or other researchers and modification of the analysis for subsequent projects. Universal data archiving is an essential basic component of research accountability that researchers often neglect. We urge all authors, even those who are reluctant to share their data under any terms, to archive their data. Even those who never intend to share their data may need to address new findings or concerns that surface long after publication. Too often, researchers say that they are unable to respond to postpublication inquiries because the data are no longer available. This response weakens confidence in the original work, serves the needs of medical science poorly, and betrays the trust of the patients who agreed to participate in the research.

Annals is not taking this initiative in a vacuum. Other developments show that scientists recognize the value of transparency and reproducibility. Our colleagues at JAMA have established a policy requiring an independent academic statistician to corroborate analyses of industry-funded studies (19). In their recommendations to the editors of Science, a committee that the journal assembled to examine the process involved in the publication of the fraudulent stem cell papers by Hwang and colleagues states, “Primary data are essential and should be available to reviewers and readers. The General Information for authors should be modified to make it clear that, for example, requests for materials, methods, or data necessary to verify the conclusions may be required prior to acceptance” (20). In the basic sciences, the journal Nature requires protocol and data sharing (21), and most journals publishing bioinformatics information (for example, genetic sequences) require authors to deposit their data in public databanks when possible (22). In addition, the National Institutes of Health (NIH) requires data sharing plans for all large projects (23), and the Shelby Amendment requires sharing health research data used as the basis for law or policy. Some NIH institutes have established their own data sharing policies (24). The recognition that reproducibility of research can be a defense against bias and scientific misconduct and a check on the robustness of conclusions is clearly growing among the institutions on which the public relies to conduct and disseminate medical research.

Major cultural shifts in research must occur before a world of completely reproducible research can exist. These shifts include increasing the technical capacity of many research teams, further developing acceptable data sharing




mechanisms, and supporting, both professionally and financially, the publishing of reproducible research. We believe that small demonstrations of the power of open sharing of methodological details and data may motivate such shifts in the clinical research culture. These changes are most likely to occur if other major biomedical journals join us in this effort. We hope that shining a spotlight on the availability of the study protocol, data, and statistical code for every Annals research report will be seen as a small but important step toward biomedical research that the public can really trust. At the same time, it will enhance what is perhaps the main function of a journal: to provide a transparent medium for a conversation about science.

Acknowledgments: The authors thank Drs. Michael Berkwits, Paul Epstein, and Cynthia D. Mulrow for their review of earlier drafts of the manuscript.

Potential Financial Conflicts of Interest: None disclosed.

Requests for Single Reprints: Customer Service, American College of Physicians, 190 N. Independence Mall West, Philadelphia, PA 19106.

Current author addresses are available at www.annals.org.

References

1. Kennedy D. Editorial expression of concern [Letter]. Science. 2006;311:36. [PMID: 16373531]
2. Horton R. Retraction—Non-steroidal anti-inflammatory drugs and the risk of oral cancer: a nested case-control study. Lancet. 2006;367:382. [PMID: 16458751]
3. Curfman GD, Morrissey S, Drazen JM. Expression of concern reaffirmed [Editorial]. N Engl J Med. 2006;354:1193. [PMID: 16495386]
4. Kurth T, Gaziano JM, Cook NR, Logroscino G, Diener HC, Buring JE. Unreported financial disclosures in a study of migraine and cardiovascular disease [Letter]. JAMA. 2006;296:653-4. [PMID: 16896106]
5. DeAngelis CD. Unreported financial disclosures in a study of migraine and cardiovascular disease [Letter; author reply]. JAMA. 2006;296:654. [PMID: 16896106]
6. Flanagin A, Fontanarosa PB, DeAngelis CD. Update on JAMA’s conflict of interest policy [Editorial]. JAMA. 2006;296:220-1. [PMID: 16835430]
7. Garrel DR, Rabasa-Lhoret R, Boyle P. Preventing scientific fraud [Letter]. Ann Intern Med. 2006;145:473; author reply 473-4. [PMID: 16983141]
8. Poehlman ET. Notice of retraction: final resolution [Letter]. Ann Intern Med. 2005;142:798. [PMID: 15845951]
9. Sox HC. Notice of retraction: final resolution [Letter]. Ann Intern Med. 2005;142:798. [PMID: 15845952]
10. Bosman J. Reporters find science journals harder to trust, but not easy to verify. New York Times. 13 February 2006.
11. Davidoff F, DeAngelis CD, Drazen JM, Hoey J, Højgaard L, Horton R, et al. Sponsorship, authorship, and accountability [Editorial]. Ann Intern Med. 2001;135:463-6. [PMID: 11560460]
12. De Angelis CD, Drazen JM, Frizelle FA, Haug C, Hoey J, Horton R, et al. Is this clinical trial fully registered? A statement from the International Committee of Medical Journal Editors [Editorial]. Ann Intern Med. 2005;143:146-8. [PMID: 16027458]
13. Goodman SN, Berlin J, Fletcher SW, Fletcher RH. Manuscript quality before and after peer review and editing at Annals of Internal Medicine. Ann Intern Med. 1994;121:11-21. [PMID: 8198342]
14. Lee KP, Boyd EA, Bero LA. Editorial changes to manuscripts published in major biomedical journals. Accessed at www.ama-assn.org/public/peer/abstracts.html#peer on 25 January 2007.
15. CONSORT Group (Consolidated Standards of Reporting Trials). The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomized trials. Ann Intern Med. 2001;134:657-62. [PMID: 11304106]
16. CONSORT Group (Consolidated Standards of Reporting Trials). The revised CONSORT statement for reporting randomized trials: explanation and elaboration. Ann Intern Med. 2001;134:663-94. [PMID: 11304107]
17. Peng RD, Dominici F, Zeger SL. Reproducible epidemiologic research. Am J Epidemiol. 2006;163:783-9. [PMID: 16510544]
18. Sylvestre MP, Huszti E, Hanley JA. Do OSCAR winners live longer than less successful peers? A reanalysis of the evidence. Ann Intern Med. 2006;145:361-3; discussion 392. [PMID: 16954361]
19. Fontanarosa PB, Flanagin A, DeAngelis CD. Reporting conflicts of interest, financial aspects of research, and role of sponsors in funded studies [Editorial]. JAMA. 2005;294:110-1. [PMID: 15998899]
20. Kennedy D. Responding to fraud [Editorial]. Science. 2006;314:1353. [PMID: 17138870]
21. Nature editorial policies. London and New York: Nature. Accessed at www.nature.com/nature/authors/editorial_policies/index.html on 25 January 2007.
22. Journal of the National Cancer Institute instructions to authors. Bethesda, MD: Journal of the National Cancer Institute. Accessed at www.oxfordjournals.org/our_journals/jnci/for_authors/ on 25 January 2007.
23. NIH data sharing policy. Bethesda, MD: National Institutes of Health Office of Extramural Research. Accessed at http://grants.nih.gov/grants/policy/data_sharing/ on 25 January 2007.
24. National Cancer Institute data sharing policy. Bethesda, MD: National Cancer Institute. Accessed at ctep.cancer.gov/guidelines/data_sharing_policy.pdf on 25 January 2007.




Current Author Addresses: Drs. Laine and Sox: Customer Service, American College of Physicians, 190 N. Independence Mall West, Philadelphia, PA 19106. Dr. Goodman: Department of Oncology, Division of Biostatistics, Johns Hopkins University, 550 North Broadway, Suite 1103, Baltimore, MD 21209.

Dr. Griswold: Welch Center, Johns Hopkins University, 2024 E. Monument Street, Baltimore, MD 21205.




By Michael R. Law, Yuko Kawasumi, and Steven G. Morgan

Despite Law, Fewer Than One In Eight Completed Studies Of Drugs And Biologics Are Reported On Time On ClinicalTrials.gov

ABSTRACT Clinical trial registries are public databases created to prospectively document the methods and measures of prescription drug studies and retrospectively collect a summary of results. In 2007 the US government began requiring that researchers register certain studies and report the results on ClinicalTrials.gov, a public database of federally and privately supported trials conducted in the United States and abroad. We found that although the mandate briefly increased trial registrations, 39 percent of trials were still registered late after the mandate’s deadline, and only 12 percent of completed studies reported results within a year, as required by the mandate. This result is important because there is evidence of selective reporting even among registered trials. Furthermore, we found that trials funded by industry were more than three times as likely to report results as were trials funded by the National Institutes of Health. Thus, additional enforcement may be required to ensure disclosure of all trial results, leading to a better understanding of drug safety and efficacy. Congress should also reconsider the three-year delay in reporting results for products that have been approved by the Food and Drug Administration and are in use by patients.

Clinical trials of prescription drugs are an important foundation for the evidence-based regulation, coverage, and prescribing of prescription drugs. However, the selective

reporting of results by sponsors and some journals’ publication bias toward positive results simultaneously limit and skew the information that trials provide.1

Clinical trial registries, public repositories for prospectively documenting study characteristics and retrospectively reporting results, have been created to help address such problems.2 The registries can improve the balance and completeness of information for clinical and policy decisionmaking, provided that participation in them is not subject to the same biases as the publication of trials in journals. One way to achieve this

is to mandate and enforce registration and reporting of results. Transparency about trial design and results

has clear benefits. Published studies, for example, often report different primary outcomes than those stated in the original protocols.3 Trial registration and results reporting ensure that the original study outcomes can be verified and that all outcome and safety data are released. Furthermore, many studies remain unpublished,4 even though data from them have the potential to elucidate drug efficacy and safety problems.2

For example, a meta-analysis that examined unpublished efficacy data on antidepressants found that published studies substantially overstated efficacy.5

A complete registry of trials and results would make unpublished studies available in a public

doi: 10.1377/hlthaff.2011.0172 HEALTH AFFAIRS 30, NO. 12 (2011): 2338–2345 ©2011 Project HOPE—The People-to-People Health Foundation, Inc.

Michael R. Law ([email protected]) is an assistant professor in the Centre for Health Services and Policy Research at the University of British Columbia, in Vancouver.

Yuko Kawasumi is a research associate in the Department of Anesthesiology, Pharmacology, and Therapeutics at the University of British Columbia.

Steven G. Morgan is an associate professor and associate director of the Centre for Health Services and Policy Research at the University of British Columbia.

2338 Health Affairs December 2011 30: 12

Pharmaceuticals



forum, which would benefit patients, clinicians, and policymakers by providing valuable insights into the safety and efficacy of medicines. Currently the largest trial registry is ClinicalTrials.gov, which is run by the US National Institutes of Health (NIH) and contains information on more than 100,000 different trials. The database was originally made public in

February 2000 to improve patient access to information on clinical trials for rare and serious conditions. Following its opening, two entities, the International Committee of Medical Journal Editors and the federal government, enacted policies mandating that drug trials be registered with a public database such as ClinicalTrials.gov. These policies affected Phase II (efficacy trials), Phase III (randomized controlled trials of a drug’s effectiveness versus that of a comparator), and Phase IV trials (postmarketing studies done on approved drugs), but not Phase I trials (safety and tolerability studies in healthy volunteers).

First, in July 2005 the International Committee of Medical Journal Editors made early registration (that is, registration prior to patient enrollment) a prerequisite for publication for all studies started after that date.2 Ongoing trials had to be registered by September 13, 2005. Research has documented an increase in rates of registration with ClinicalTrials.gov following the enactment of this mandate.6,7

And second, the September 2007 Food and Drug Administration Amendments Act mandated that either the study sponsor or the principal investigator register with ClinicalTrials.gov all Phase II and higher drug and biologic trials that meet either or both of two criteria: the trial has one or more study sites located in the United States; and the trial is being conducted under a US investigational new drug application. The act also mandated that trials that were

already in progress be registered by December 2007 on ClinicalTrials.gov.8 And it required that researchers post a summary of basic results, including outcome measures and adverse events, within a year of the completion of data collection or within thirty days after the Food and Drug Administration first approved the drug.8

Failure to comply can result in civil penalties of up to $10,000 per day if the violations remain uncorrected thirty days after being cited by the Food and Drug Administration. However, the act permits researchers to delay reporting results for up to three years for trials conducted before drugs are initially approved, and for studies of unapproved clinical indications for currently approved drugs.9

The overall value of clinical trial registries for

the clinical and scientific communities depends on how comprehensively the registries document completed and ongoing studies, particularly those that will never be published. However, researchers have yet to quantify the impact of the US mandate on the number of study registrations or the timeliness of registrations and results reporting. We studied the longer-term impact of the

federal mandate on the registration and reporting of results for drug and biological trials in ClinicalTrials.gov. We found that the mandate increased the number of trials registered with ClinicalTrials.gov. But we also found that despite the mandates, 39 percent of trials were registered late, and only 12 percent of completed studies reported results within one year.

Study Data And Methods

Data To analyze the impact of the federal mandate on trial registration and results reporting, we used data on studies that were registered with ClinicalTrials.gov from September 1999 to May 2010. From ClinicalTrials.gov, we downloaded the complete study registration records as of July 2, 2010, in XML format and imported these data into the statistical analysis software SAS, version 9.1, using the SAS XML Mapper. We also determined whether and when results had been posted, from HTML copies of the study records. After compilation, we checked to see if the overall number of studies imported and the overall number of studies reporting results matched the numbers reported on ClinicalTrials.gov. From the electronic records, we compiled the

registration date, start month, primary completion month, overall status (for example, completed or active), results reporting date, study phase, number of patients studied, trial funding source, and type of intervention being studied (drug or biologic). We studied Phase II or higher drug or biologic

trials because these trials were subject to both mandates. Trials listing more than one trial phase were assigned to the higher category listed. We categorized trial registration timing as “early” and “late” using two variables: the registration date and the trial’s start month. Because the federal mandate required registration within twenty-one days of patient enrollment, we classified studies as “early” if they were first received by ClinicalTrials.gov within the first twenty-one days of the month following their reported start month. We classified studies as “late” if they were received after that deadline.

Analysis To investigate the impact of the

federal mandate on trial registrations, we used




an interrupted time-series analysis to determine the number of trials registered monthly with ClinicalTrials.gov and the numbers that were registered early.10 We excluded short-term “spikes” associated with the enactment of each mandate (and of the registry launch) by not including registrations during the two months before and after those events. We used generalized-least-squares models to

assess changes in both the level and the trend in the number of trials registered between three time periods: “pre–journal mandate,” from December 1999 to April 2005; “pre–federal mandate,” from December 2005 to September 2007; and “post–federal mandate,” from March 2008 to May 2010. We tested the models for the presence of significant autocorrelation at one to four months and at one year, but we found none.10
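The twenty-one-day “early”/“late” classification described above can be sketched in a few lines. This is an illustrative reconstruction, not the authors’ code (their analysis was done in SAS); it assumes only a reported start month and a registration date, with the proxy deadline taken as day 21 of the month after the start month.

```python
from datetime import date

def classify_registration(start_month: date, registered: date) -> str:
    """Classify a trial registration as 'early' or 'late'.

    ClinicalTrials.gov reports only a start *month*, so the proxy
    deadline is the twenty-first day of the month after the start
    month (the federal mandate allows twenty-one days after the
    first patient is enrolled).
    """
    if start_month.month == 12:
        deadline = date(start_month.year + 1, 1, 21)
    else:
        deadline = date(start_month.year, start_month.month + 1, 21)
    return "early" if registered <= deadline else "late"
```

For example, under this rule a trial with a March 2008 start month registered on April 21, 2008, counts as early, while one registered on April 22, 2008, counts as late.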

To examine the rate of results reporting, we used statistical models of “time to reporting” for all trials that reported an actual (as opposed to an anticipated) completion between October 2008 and May 2010. We chose October 2008 because the NIH launched the ClinicalTrials.gov system for reporting results in September 2008. For each completed trial, we studied the length

of time between the end of the completion month and the date that results were reported. To model the factors associated with reporting results, we used a Cox proportional hazards model (a statistical model that investigates the relationship between the characteristics of trials and the time they took to report results) and included four groups of covariates. The four groups were as follows: the study phase (III and IV, versus II); enrollment (101–500 patients and more than 500 patients, versus up to 100 patients); whether or not at least one listed study location was in the United States; and indicators for different funding sources (industry, clinical research network, US federal agency except for the NIH, and other, versus the NIH).

Completed studies that had not reported results within our study period (n = 4,118) were included in our analysis, and their follow-up time was limited to the end of the study period (May 31, 2010), at which point they were considered to have not reported (they were “censored,” in terms of our survival analysis). Because studies that lacked enrollment data (n = 113) did not report results, we excluded them from our analysis.

Limitations We note several limitations to our

analysis. First, ClinicalTrials.gov contains only self-reported data, so dates and trial status might not have been accurately updated or might have changed after we extracted our data. However, any missing data on completed studies would serve only to lower our estimate of the proportion of studies having reported results.

Second, we used the “start month” (defined as the date that enrollment in the protocol begins) in ClinicalTrials.gov to determine whether studies were registered early or late. The federal mandate requires that studies register within twenty-one days after the first patient is enrolled. We may have incorrectly classified studies in which these dates differed because of delays in recruiting the first patient.

Finally, we could not assess whether researchers posted results in a selective manner, for instance, disproportionately reporting positive findings. This was because we did not have any data on whether or not unreported results reflected poorly on the drugs under study.
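Censoring, as described above, determines how cumulative reporting proportions accrue over time. A minimal product-limit (Kaplan-Meier-style) sketch of that calculation follows; it uses hypothetical data and is illustrative only, not the authors’ SAS implementation.

```python
def cumulative_reported(trials):
    """Cumulative proportion of trials that have reported results,
    with censoring handled product-limit style.

    `trials` is a list of (days_since_completion, reported) pairs;
    `reported` is False for trials censored at the end of the
    observation window.  Returns (day, cumulative proportion) steps.
    """
    at_risk = len(trials)
    not_reported = 1.0  # "survival": probability of still not reporting
    steps = []
    # Process events before censorings on tied days (the usual convention).
    for day, reported in sorted(trials, key=lambda t: (t[0], not t[1])):
        if reported:
            not_reported *= (at_risk - 1) / at_risk
            steps.append((day, 1.0 - not_reported))
        at_risk -= 1  # events and censorings both leave the risk set
    return steps
```

With four hypothetical trials, two reporting at 100 and 300 days and two censored at 200 and 400 days, the curve steps to 25 percent and then 62.5 percent.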

Study Results

Trial Characteristics Overall, 40,343 Phase II–IV drug and biologic trials were registered

Exhibit 1

Characteristics Of Drug And Biologic Trials Registered With ClinicalTrials.gov, September 1999–May 2010 (all studies, N = 40,343)

Characteristic: Number (Percent)

Phase
  II: 19,274 (47.78)
  III: 13,334 (33.05)
  IV: 7,735 (19.17)

Number of patients enrolled
  Low (≤ 100): 19,601 (52.78)
  Medium (101–500): 12,625 (33.99)
  High (> 500): 4,913 (13.23)
  Unreported: 3,204 (7.94)

Funding source
  Industry: 23,157 (57.40)
  Clinical research network: 1,866 (4.63)
  National Institutes of Health: 6,637 (16.45)
  Other federal agency: 503 (1.25)
  Other/unknown (a): 17,983 (44.58)

One or more sites in the United States
  Yes: 21,083 (52.26)
  No: 19,260 (47.74)

Registration status (b)
  Early: 18,178 (47.10)
  Late: 20,416 (52.90)

Results reported
  Yes: 1,351 (3.35)
  No: 38,992 (96.65)

SOURCE Authors’ analyses of data from ClinicalTrials.gov. NOTES Includes only Phase II and higher trials. Trials could list multiple sources of funding. Percentages might not sum to 100 because of rounding. (a) Includes sponsors not categorized by ClinicalTrials.gov. (b) Registration status is categorized in two ways: “early” means registering within twenty-one days of first enrolling patients (the period mandated by the federal government); “late” means registering more than twenty-one days after first enrolling patients. Registration status was calculated only for studies with a reported start date (n = 38,594).





with ClinicalTrials.gov. Of these, 4,516 (11.2 percent) focused on biologics. Exhibit 1 shows the phase, funding, and other characteristics of these trials. Start dates were missing for 1,749 trials (4.7 percent), so we could not include them in our analysis of early registrations.

Registration Of Studies Monthly trial registrations with ClinicalTrials.gov spiked to 3,474, from the previous monthly average of 78, when the mandate of the International Committee of Medical Journal Editors took effect in September 2005 (Exhibit 2). After that but before the federal mandate took effect, there were, on average, 372.35 more trials registered each month than before the journal mandate (95% confidence interval: 338.61, 406.09; p < 0.001). Another spike of 755 registrations occurred in

December 2007, when the federal mandate took effect (Exhibit 2). The average monthly number of registrations rose by a further 129.60 between December 2007 and March 2008, the start of the post–federal mandate period (95% confidence interval: 86.22, 172.98; p < 0.001). However, the number of registrations dropped by 5.38 trials in each month thereafter (95% confidence interval: −8.06, −2.70; p < 0.001). Thus, by January 2010 the monthly number of registrations had fallen to the level seen before the federal mandate took effect.

Early Registrations Over the entire study period, 47.1 percent of the studies were registered early (within twenty-one days of their start month) and 52.9 percent were registered late (Exhibit 1). The highest monthly share of early

registrations was 70.4 percent, in October 2009. Not surprisingly, there was a significant spike in the number of late registrations when the federal mandate took effect, in December 2007 (n = 463). Following that date, 39 percent of trials were registered late. Our interrupted time-series model indicated

that the average number of early registrations per month increased by 37.96 immediately after the federal mandate took effect (95% confidence interval: 15.26, 60.76; p < 0.01). However, the trend then decreased, with 2.17 fewer registrations in each subsequent month thereafter (95% confidence interval: −3.58, −0.76; p < 0.01). This later decrease essentially reversed the immediate increase by the end of our study period.

Posting Study Results Exhibit 3 shows the

characteristics of registered studies reported asbeing completed after September 2008, strati-fied by whether or not results had been reportedby May 31, 2010. Results had been posted onClinicalTrials.gov for 337 (7.6 percent) of the4,455 trials. Notably, 9.5 percent of industry-funded studies had reported results—a rate threetimes higher than for studies with any other typeof funding.Exhibit 4 shows the cumulative proportion of

trials that had reported results within a givennumber of days after study completion. One yearafter reporting completion, only 12.0 percent ofall trials (Exhibit 4) and only 14.1 percent ofPhase IV trials (data not shown) had reportedresults. The federal government requires that alltrials with a US study site or studies conducted

Exhibit 2

Number Of Drug And Biologic Trials Registered With ClinicalTrials.gov And Number Registered Early, September 1999–May 2010

[Figure: monthly number of trials registered, with annotations marking when the journal mandate and the federal mandate were enacted.]

SOURCE Authors’ analyses of data from ClinicalTrials.gov. NOTES Includes only Phase II and higher trials. “Early” means registering within twenty-one days of first enrolling patients (the period mandated by the federal government). Journal mandate is the mandate of the International Committee of Medical Journal Editors. Both mandates are described in the text.

December 2011 30: 12 Health Affairs 2341



under a US investigational new drug application report results within one year, unless granted an exemption. Unfortunately, we could not determine which studies fit these criteria, because ClinicalTrials.gov did not publicly report information on exemptions when we collected our data.

Several characteristics of trials were associated with a higher probability of results reporting in our multivariate Cox proportional hazards analysis. Studies funded wholly or partly by industry had a threefold higher hazard ratio for reporting than studies funded only by the NIH (hazard ratio: 3.00; 95% confidence interval: 1.42, 6.33; p = 0.004). Phase III trials (hazard ratio: 2.00; 95% confidence interval: 1.50, 2.63; p < 0.001) and Phase IV trials (hazard ratio: 2.25; 95% confidence interval: 1.66, 3.04; p < 0.001) were more likely to report results than Phase II trials. And studies that had 101–500 enrollees (hazard ratio: 1.29; 95% confidence interval: 1.00, 1.67; p = 0.047) or more than 500 enrollees (hazard ratio: 1.419; 95% confidence interval: 1.03, 1.95; p = 0.03) were more likely to report results than trials with 100 or fewer enrollees.
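As a rough illustration of how such hazard ratios arise, the sketch below fits a one-covariate Cox proportional hazards model by Newton's method on the Breslow partial likelihood. This is a didactic simplification with invented data, not the authors' multivariate model (which adjusted for funding, phase, and enrollment simultaneously); the function name and numbers are assumptions.

```python
import math

def cox_hazard_ratio(times, events, x, iters=50):
    """Hazard ratio exp(b) for one covariate x, via damped Newton's method
    on the Breslow partial log-likelihood. Didactic only."""
    n = len(times)
    b = 0.0
    for _ in range(iters):
        score, info = 0.0, 0.0
        for i in range(n):
            if not events[i]:
                continue  # censored observations contribute only via risk sets
            risk = [j for j in range(n) if times[j] >= times[i]]
            s0 = sum(math.exp(b * x[j]) for j in risk)
            s1 = sum(x[j] * math.exp(b * x[j]) for j in risk)
            s2 = sum(x[j] ** 2 * math.exp(b * x[j]) for j in risk)
            score += x[i] - s1 / s0
            info += s2 / s0 - (s1 / s0) ** 2
        if info == 0.0:
            break  # no covariate variation: hazard ratio stays at 1
        b += max(-1.0, min(1.0, score / info))  # damped Newton step
    return math.exp(b)

# Days from completion to results posting (or censoring); 1 = industry-funded.
days     = [60, 120, 150, 200, 250, 300, 330, 365]
reported = [1,   1,   0,   1,   1,   1,   0,   1]
industry = [1,   1,   1,   0,   1,   0,   0,   0]
hr = cox_hazard_ratio(days, reported, industry)
print(round(hr, 2))  # >1 here: industry trials report sooner in this toy data
```

A hazard ratio of 3.00, as reported for industry funding, means that at any moment after completion an industry-funded trial not yet reporting was estimated to post results at three times the rate of an NIH-funded trial.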

Discussion
A complete registry of all clinical trials and their results would enable more comprehensive and less biased studies and meta-analyses of the clinical efficacy, off-label use, and safety of pharmaceuticals. We found that at least in the short term, the federal mandate increased the number of clinical trials registered on ClinicalTrials.gov. We also found, however, that the current rates of results reporting mean that fewer than one in eight trials have posted results within one year. This result is important because there is evidence of selective reporting even among registered trials.11 Furthermore, we found that trials funded by industry were three times as likely to report results as were trials funded by the NIH.

We share previously expressed concerns that publication requirements are not sufficient for comprehensively enforcing the timely and complete registration of clinical trials.12 Our findings show that the increases in registration as a result of the requirements by the International Committee of Medical Journal Editors have persisted over the long term. However, the additional short-term impact of the federal mandate indicates that many studies went unregistered despite the journal editors’ requirement. Yet as trials increasingly move overseas, national mandates such as that of the Food and Drug Administration have less force.13

Unfortunately, we have little sense of how

Exhibit 3

Characteristics Of Completed Drug And Biologic Trials Registered With ClinicalTrials.gov, September 1999–May 2010

Characteristic                             Number of studies   Reported results: Number   Percent
Total                                                  4,455                        337      7.56
Phase
  II                                                   2,040                         92      4.51
  III                                                  1,509                        161     10.67
  IV                                                     906                         84      9.27
Number of patients studied
  Low (≤100)                                           2,242                        122      5.44
  Medium (101–500)                                     1,560                        138      8.85
  High (>500)                                            653                         77     11.79
Funding source
  Industry                                             3,196                        304      9.51
  Clinical research network                              118                          3      2.54
  National Institutes of Health                          408                         10      2.45
  Other federal agency                                    35                          1      2.86
  Other/unknown (a)                                    1,405                         45      3.20
One or more sites in the United States
  Yes                                                  2,283                        158      6.92
  No                                                   2,172                        179      8.24
Registration status (b)
  Early                                                2,798                        222      7.93
  Late                                                 1,657                        115      6.94

SOURCE Authors’ analyses of data from ClinicalTrials.gov. NOTES N = 4,455. Includes only Phase II and higher trials that reported being completed after September 2008. Trials with no information on number of patients enrolled were excluded. Trials could list multiple sources of funding. (a) Includes sponsors not categorized by ClinicalTrials.gov. (b) Registration status is categorized in two ways: “early” means registering within twenty-one days of first enrolling patients (the period mandated by the federal government); “late” means registering more than twenty-one days after first enrolling patients.

Exhibit 4

Cumulative Proportion Of Drug And Biologic Trials Having Reported Results In ClinicalTrials.gov

[Figure: cumulative proportion of trials reporting results, by days since study completion.]

SOURCE Authors’ analyses of data from ClinicalTrials.gov. NOTES Includes only Phase II and higher trials that reported being completed after September 2008. The dashed line represents one year after completion, the mandated date for reporting results unless researchers receive an exemption. Cumulative proportions are based on Kaplan-Meier survival estimates and account for censoring at the end of the study follow-up period.
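The Kaplan-Meier calculation behind a curve like Exhibit 4 can be sketched in a few lines. This is a generic estimator with invented example data (not the study's), treating trials that had not reported by the end of follow-up as censored:

```python
# Kaplan-Meier sketch: cumulative proportion of trials having reported
# results, with unreported trials censored at the end of follow-up.
# Example numbers are invented for illustration.

def km_reporting_curve(times, reported):
    """Return (day, cumulative proportion reported) at each reporting time.

    times: days from study completion to results posting (or to censoring)
    reported: True if results were posted, False if censored
    """
    not_reported = 1.0  # Kaplan-Meier "survival": not yet reported
    curve = []
    for t in sorted(set(times)):
        events = sum(1 for ti, e in zip(times, reported) if ti == t and e)
        at_risk = sum(1 for ti in times if ti >= t)
        if events:
            not_reported *= 1 - events / at_risk
            curve.append((t, 1 - not_reported))
    return curve

days     = [30, 90, 90, 180, 365, 365, 400, 500]
reported = [True, True, False, True, True, False, False, False]
for day, prop in km_reporting_curve(days, reported):
    print(day, round(prop, 3))
# prints: 30 0.125 | 90 0.25 | 180 0.4 | 365 0.55
```

Handling censoring this way is what lets the estimate use trials still within their one-year window without treating them as failures to report.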

Pharmaceuticals




many trials remain unregistered because it is impossible to determine the number currently underway. Without this denominator, there is no way to know whether declining registrations in recent years are the result of studies’ going unregistered. Another possible explanation for this drop is that fewer drugs are reaching the trial stage. However, industry-reported research and development spending has continued to grow in recent years, which suggests that the denominator probably continues to be high.14

Failure To Report Results The current low rate of results reporting on ClinicalTrials.gov, particularly for studies not funded by industry, is troublesome and may affect the overall usefulness of the registry in expanding clinical knowledge in a timely manner. As outlined above, results reporting of trials investigating unapproved uses can be delayed for up to three years.9

In our study, only 14 percent of Phase IV trials—postmarketing studies of drugs that have already been approved—had posted results within one year. These drugs are currently marketed and available for off-label use. Thus, the information that these trials provide could be important in determining both drug safety and efficacy. For instance, a recent meta-analysis linking rosiglitazone (Avandia) to an increased risk of myocardial infarction relied on a clinical trial registry for results from twenty-six of the forty studies the researchers examined.15 These registry data were not published or otherwise available in public documents.

Given this experience and the recent safety problems that have appeared after approval for many drugs, Congress should reconsider the three-year delay in reporting results for products that have been approved by the Food and Drug Administration and are in use by patients. Public disclosure in a registry is vital for at least two reasons. First, even if trials are registered, many studies remain unpublished—particularly those funded by industry.16 And second, there is evidence that the primary and secondary outcomes reported in journal articles are often different from those initially registered.17 Because of these concerns, public disclosure of all studies and all of their outcomes would make trial reporting both more balanced and more credible.

There are numerous possible reasons why

many studies continue to be registered late and do not report results in public registries such as ClinicalTrials.gov. Both academic and industry researchers may be unaware of existing requirements for registration and reporting.18 There may also be difficulties involved in completing the necessary data analysis within the specified time frame.

In addition, investigators from both academe and industry have expressed a range of reservations about trial registration and results reporting.19–21 These reservations include concerns about the confidentiality of proprietary information, competitiveness between companies and research groups, the extra work involved for researchers, and the possibility that disclosure could jeopardize their chances of publication.19–21

Ways To Increase Registration And Results Reporting Our results indicate that industry-sponsored studies are much more likely than others to report results. This may reflect the capacity of industry researchers to meet posting requirements, the fact that they—unlike academic researchers—are paid to post results, or the strong incentive that manufacturers have to publicize positive results about their products. In any event, the overall low rate of results reporting suggests that the incentives in place to encourage early trial registration and results reporting might not be strong enough to achieve the desired result.

Three potential ways to ensure completeness of trial registrations and results reporting are stricter enforcement of the current penalties for not reporting results; using Institutional Review Boards and informed consent documents as points of leverage to encourage registration and reporting; and making early registration and the posting of results requirements for publication in medical journals.

It is unknown whether the current penalties for not registering and reporting results are severe enough, given the sales at stake for manufacturers.22 Fines for failing to submit clinical trial information are on the order of $10,000 per day, according to the federal mandate.23 But manufacturers must weigh those penalties against possible risks to revenue streams that are often orders of magnitude larger.

However, as of February 2010 no fines had been levied for noncompliance with the federal mandate (Jarilyn Dupont, director of regulatory policy, Food and Drug Administration, personal communication, March 18, 2010). Thus, it is unclear whether the threat of higher fines is necessary, or whether enforcement of the current penalties would be sufficient. Our findings also suggest that regulatory authorities should apply stronger pressure to nonindustry researchers. This could include enforcing the current federal mandate provision that allows the authorities to withhold funds from the NIH for failure to comply with other mandate provisions.

To ensure the comprehensive registration of trials, the greatest point of leverage might be not with funders but with investigators, who could




also be required by Institutional Review Boards to register their trials early.12,24 A proposed Food and Drug Administration regulation requiring that informed-consent documents contain trial registration information would be a step in this direction. That change would force sponsors to register trials prior to receiving ethics approval and, as a consequence, prior to enrolling patients.25

Increasing results reporting is also important because many studies will never be published.26 For that reason, it is unlikely that journals’ publication requirements could achieve this goal. Instead, regulators should consider requiring more timely reporting of results from all drug trials, including those concerning unapproved uses. They should also make use of current enforcement mechanisms and possibly introduce stronger incentives in the future—such as the public disclosure of infractions and the use of larger fines—if the rate of results reporting remains low.

Finally, it is also troubling that a large portion of trials continue to register late. Medical and other journals that continue to publish the results of trials that register late should refuse to consider for publication future papers about trials that did not register early. To encourage further results reporting, journals should also insist that the results for studies they publish be entered into a clinical trial registry at the earliest opportunity. To ensure that this takes place, journals must be clear that public disclosure in a registry does not constitute prior publication, which many journals prohibit.

Overall, the federal mandate increased the registration of drug trials on ClinicalTrials.gov, at least for a short period of time. Additional mechanisms may be required to ensure the completeness of trial registration and results reporting. Achieving that goal would improve the comprehensiveness and balance of our knowledge base about drugs, leading to more informed prescribing by physicians and improved patient safety. ▪

The results of this study were presented at the AcademyHealth Annual Research Meeting in Boston, Massachusetts, June 27, 2010. This study was supported in part by funds from the University of British Columbia Centre for Health Services and Policy Research. Michael Law received salary support through a New Investigator Award from the Canadian Institutes of Health Research and an Early Career Scholar Award from the Peter Wall Institute for Advanced Studies. The authors thank Lucy Cheng for her assistance.

NOTES

1 McGauran N, Wieseler B, Kreis J, Schüler Y-B, Kölsch H, Kaiser T. Reporting bias in medical research—a narrative review. Trials. 2010;11(1):37.

2 Zarin DA, Tse T. Moving toward transparency of clinical trials. Science. 2008;319(5868):1340–2.

3 Chan A-W, Hróbjartsson A, Haahr MT, Gøtzsche PC, Altman DG. Empirical evidence for selective reporting of outcomes in randomized trials: comparison of protocols to published articles. JAMA. 2004;291(20):2457–65.

4 Dickersin K, Rennie D. Registering clinical trials. JAMA. 2003;290(4):516–23.

5 Turner EH, Matthews AM, Linardatos E, Tell RA, Rosenthal R. Selective publication of antidepressant trials and its influence on apparent efficacy. N Engl J Med. 2008;358(3):252–60.

6 Zarin DA, Tse T, Ide NC. Trial registration at ClinicalTrials.gov between May and October 2005. N Engl J Med. 2005;353(26):2779–87.

7 Zarin DA, Ide NC, Tse T, Harlan WR, West JC, Lindberg DAB. Issues in the registration of clinical trials. JAMA. 2007;297(19):2112–20.

8 Tse T, Williams RJ, Zarin DA. Reporting “basic results” in ClinicalTrials.gov. Chest. 2009;136(1):295–303.

9 Miller JD. Registering clinical trial results: the next step. JAMA. 2010;303(8):773–4.

10 Wagner AK, Soumerai SB, Zhang F, Ross-Degnan D. Segmented regression analysis of interrupted time series studies in medication use research. J Clin Pharm Ther. 2002;27(4):299–309.

11 Mathieu S, Boutron I, Moher D, Altman DG, Ravaud P. Comparison of registered and published primary outcomes in randomized controlled trials. JAMA. 2009;302(9):977–84.

12 Levin LA, Palmer JG. Institutional review boards should require clinical trial registration. Arch Intern Med. 2007;167(15):1576–80.

13 Glickman SW, McHutchison JG, Peterson ED, Cairns CB, Harrington RA, Califf RM, et al. Ethical and scientific implications of the globalization of clinical research. N Engl J Med. 2009;360(8):816–23.

14 Pharmaceutical Research and Manufacturers of America. Pharmaceutical industry profile 2011 [Internet]. Washington (DC): PhRMA; 2011 [cited 2011 Oct 26]. Available from: http://www.phrma.org/sites/default/files/159/phrma_profile_2011_final.pdf

15 Nissen SE, Wolski K. Effect of rosiglitazone on the risk of myocardial infarction and death from cardiovascular causes. N Engl J Med. 2007;356(24):2457–71.

16 Bourgeois FT, Murthy S, Mandl KD. Outcome reporting among drug trials registered in ClinicalTrials.gov. Ann Intern Med. 2010;153(3):158–66.

17 Ewart R, Lausen H, Millian N. Undisclosed changes in outcomes in randomized controlled trials: an observational study. Ann Fam Med. 2009;7(6):542–6.

18 Reveiz L, Krleža-Jerić K, Chan A-W, De Aguiar S. Do trialists endorse clinical trial registration? Survey of a PubMed sample. Trials. 2007;8(1):30.

19 Krleža-Jerić K. Clinical trial registration: the differing views of industry, the WHO, and the Ottawa Group. PLoS Med. 2005;2(11):e378.

20 Scherer M, Trelle S. Opinions on registering trial details: a survey of academic researchers. BMC Health Serv Res. 2008;8(1):18.

21 Berg JO. Clinical trial registries. JAMA. 2007;298(13):1514.

22 Majumdar SR, Almasi EA, Stafford RS. Promotion and prescribing of hormone therapy after report of harm by the Women’s Health Initiative. JAMA. 2004;292(16):1983–8.

23 Legal Information Institute. U.S. Code: title 21, chapter 9, subchapter 3, section 331: prohibited acts [Internet]. Ithaca (NY): Cornell University Law School; [cited 2011 Oct 26]. Available from: http://www.law.cornell.edu/uscode/html/uscode21/usc_sec_21_00000331----000-.html

24 Blunt J. Research ethics committees must ensure registration of research and accessibility of results. BMJ. 1998;316(7127):312.

25 Food and Drug Administration. Informed consent elements. Fed Regist. 2009;74(248):68750–6.

26 Ramsey S, Scoggins J. Commentary: practicing on the tip of an information iceberg? Evidence of underpublication of registered clinical trials in oncology. Oncologist. 2008;13(9):925–9.

ABOUT THE AUTHORS: MICHAEL R. LAW, YUKO KAWASUMI & STEVEN G. MORGAN

Michael R. Law is an assistant professor at the University of British Columbia.

In this month’s Health Affairs, Michael Law and coauthors examine the impact of a federal law requiring researchers to register certain studies and report the results on ClinicalTrials.gov, a public database of federally and privately supported clinical trials conducted in the United States and abroad. When the authors studied the mandate’s impact, they found an initial increase in trial registrations but a problem with late registration and underreporting of results. Only 12 percent of studies reported results within one year, as required by law.

The authors were motivated to conduct the research based on the episode that resulted in the withdrawal of the diabetes drug Avandia from the US market. The definitive trial of the drug, published in the New England Journal of Medicine, used a trial information registry to examine the results of a number of trials of the drug.

“[We] wondered how far we’d come in terms of information disclosure on clinical trials for drugs since then,” says Law, and that led to the authors’ inquiry into how well legal requirements for timely reporting on trial results were being met.

Law is an assistant professor in the Centre for Health Services and Policy Research at the University of British Columbia. His research focuses on pharmaceutical outcomes and policy research, including evaluating the impact of medication adherence, the value of newer drugs, and both drug coverage changes and direct-to-consumer advertising.

Law has been published in leading medical journals and has received both national and international awards, including a career award from the Canadian Institutes of Health Research. He holds a doctorate in health policy from Harvard University and completed a postdoctoral fellowship at Harvard Medical School, where he trained in research methods and statistics.

Yuko Kawasumi is a research associate at the University of British Columbia.

Yuko Kawasumi is a research associate in the Department of Anesthesiology, Pharmacology, and Therapeutics at the University of British Columbia. Her primary research interest is investigating the quality of prescription medication use through administrative claims databases.

Kawasumi holds a doctorate in epidemiology from the Department of Epidemiology, Biostatistics, and Occupational Health at McGill University. She was a postdoctoral research fellow at the University of British Columbia Centre for Health Services and Policy Research.

Steven G. Morgan is an associate professor of health policy at the University of British Columbia.

Steven Morgan is an associate professor of health policy and associate director of the Centre for Health Services and Policy Research at the University of British Columbia. An expert in pharmaceutical policy, Morgan is a recipient of career awards from the Canadian Institutes of Health Research and the Michael Smith Foundation for Health Research.

Morgan earned a master’s degree in economics from Queen’s University and a doctorate, also in economics, from the University of British Columbia. He was a 2001–02 Canadian Associate Harkness Fellow in Health Care Policy while a postdoctoral fellow in health economics at the Centre for Health Services and Policy Research.




COMMENTARY

Ten Recommendations for Closing the Credibility Gap in Reporting Industry-Sponsored Clinical Research: A Joint Journal and Pharmaceutical Industry Perspective

Bernadette A. Mansi, BA; Juli Clark, PharmD; Frank S. David, MD, PhD; Thomas M. Gesell, PharmD; Susan Glasser, PhD; John Gonzalez, PhD; Daniel G. Haller, MD; Christine Laine, MD, MPH; Charles L. Miller, MA; LaVerne A. Mooney, DrPH; and Maja Zecevic, PhD, MPH




The credibility of industry-sponsored clinical research has suffered in recent years, undercut by reports of selective or biased disclosure of research results, ghostwriting and guest authorship, and inaccurate or incomplete reporting of potential conflicts of interest.1,2 In response, many pharmaceutical companies have integrated best practices and recommendations from groups such as the International Committee of Medical Journal Editors (ICMJE), the Good Publication Practice guidelines, the Committee on Publication Ethics, the EQUATOR (Enhancing the QUAlity and Transparency Of health Research) Network, and the Medical Publishing Insights and Practices (MPIP) initiative into their internal policies and standard operating procedures.3-10 However, a credibility gap remains: some observers, including some journal editors and academic reviewers, maintain a persistent negative view of industry-sponsored studies.11 Given industry’s pivotal role in the development of new therapies, further improvements in research conduct and disclosure are needed across the industry-investigator-editor enterprise to restore confidence in industry-sponsored biomedical research.

In 2008, the MPIP was founded by members of the pharmaceutical industry and the International Society for Medical Publication Professionals to elevate trust, transparency, and integrity in publishing industry-sponsored studies through education and creation of a discussion forum among industry research sponsors and biomedical journals.12,13 In 2010, the MPIP convened a roundtable of 23 journal editors and industry representatives (see the “Acknowledgments” section for a list of MPIP participants) to characterize the persistent and perceived credibility gap in industry-sponsored research and identify approaches to resolve it. Attendees agreed that there have been important improvements in the conduct and reporting of industry-sponsored studies during the past 5 years, but several opportunities remain for additional improvement. Attendees reached consensus on a top 10 list of recommendations (Table), intended to serve as a call to action for all stakeholders—authors, journal editors, research sponsors, and others—to enhance the quality and transparency of industry-sponsored clinical research reporting. Although framed in the context of industry sponsorship, many of these recommendations would enhance the credibility of clinical research publications in general, regardless of the funding source.

Mayo Clin Proc. May 2012;87(5):424-429. doi:10.1016/j.mayocp.2012.02.009

Recommendation 1: Ensure Clinical Studies and Publications Address Clinically Important Questions
Many perceive a mismatch between the research hypotheses of some industry-sponsored studies and the needs of the public and practicing clinicians to improve patient health. The best way to elevate the credibility of industry-sponsored clinical research is to ensure that such research is designed to answer important clinical and scientific questions while respecting regulatory requirements that may influence certain aspects of study design. Credibility is compromised when clinical research is intended for marketing purposes rather than advancing scientific and medical knowledge. Sponsors could enhance transparency and credibility by better explaining to journals,14 the biomedical community, and the public the decision-making process underlying the research endeavor. For example, sponsors could be more transparent in describing how external input and involvement from the academic community were obtained to inform study design (eg, by acknowledging participants in protocol development, advisory boards, and other roles).

From GlaxoSmithKline, King of Prussia, PA (B.A.M., C.L.M.); Amgen, Thousand Oaks, CA (J.C.); Leerink Swann Consulting, LLC, Boston, MA (F.S.D.); Novartis Pharmaceuticals Corporation, East Hanover, NJ (T.M.G.); International Society for Medical Publication Professionals, Briarcliff Manor, NY (T.M.G.); Johnson & Johnson Pharmaceutical Research & Development, LLC, Raritan, NJ (S.G.); AstraZeneca, Macclesfield, United Kingdom (J.G.); Journal of Clinical Oncology, Alexandria, VA (D.G.H.); Annals of Internal Medicine, Philadelphia, PA (C.L.); Pfizer Medical, New York, NY (L.A.M.); and The Lancet, Elsevier, New York, NY (M.Z.). Dr Gesell is currently with United BioSource–Envision Group, Franklin Lakes, NJ. Dr Haller is currently with Gastrointestinal Cancer Research, Melville, NY.

© 2012 Mayo Foundation for Medical Education and Research. www.mayoclinicproceedings.org



TEN RECOMMENDATIONS FOR REPORTING INDUSTRY RESEARCH

Recommendation 2: Make Public All Results, Including Negative or Unfavorable Ones, in a Timely Fashion, While Avoiding Redundancy
Many industry sponsors have committed to disclosing the results of all clinical studies through recognized trial registries. Complete and transparent disclosure is required by law in many regions, fulfills the ethical obligation to trial participants, and is critical to the advancement of science. The ability to cross-reference trial registries, results databases, and all related publications informs the scientific community whether studies are completed or are under way and discourages selective reporting. Several recent articles have highlighted that there is a persistent need for better disclosure of clinical trial results, irrespective of study sponsorship.15-17

Whereas results from well-designed studies that are negative, confirmatory, inconclusive, or less immediately relevant to practicing clinicians can be particularly challenging to publish, these manuscripts can contribute toward the progress of science, may open the door for future research, and can prevent redundancy. Sponsors and authors should strive to disseminate them in appropriate venues, provided the data are put in context and the study limitations are clearly stated. This may require editors and publishers to explore and implement novel publication approaches well suited to these sorts of reports. Some possible approaches could include journals dedicated to these studies (perhaps in open-access format), abridged article formats more suitable to them, and/or specific reviewing mechanisms focused on scientific validity as opposed to “impact.” Many of these potential solutions are currently being explored and developed by journals and publishers.12 Finally, sponsors and authors should avoid redundant or duplicate publications. As part of this effort, sponsors could support editors’

TABLE. Top 10 Recommendations for Closing the Credibility Gap in Reporting Industry-Sponsored Clinical Research

1. Ensure clinical studies and publications address clinically important questions
2. Make public all results, including negative or unfavorable ones, in a timely fashion, while avoiding redundancy
3. Improve understanding and disclosure of authors’ potential conflicts of interest
4. Educate authors on how to develop quality manuscripts and meet journal expectations
5. Improve disclosure of authorship contributions and writing assistance and continue education on best publication practices to end ghostwriting and guest authorship
6. Report adverse event data more transparently and in a more clinically meaningful manner
7. Provide access to more complete protocol information
8. Transparently report statistical methods used in analysis
9. Ensure authors can access complete study data, know how to do so, and can attest to this
10. Support the sharing of prior reviews from other journals

educational efforts to combat plagiarism.18


Recommendation 3: Improve Understanding and Disclosure of Authors’ Potential Conflicts of Interest
Recent advances by academia, journals, and organizations such as the Institute of Medicine and ICMJE have improved processes for disclosing authors’ potential conflicts of interest. Consensus among journal editors remains elusive on some specifics, such as the time frame for the reporting period and minimum amounts for financial reporting. These debates notwithstanding, however, editors and sponsors should support the use of the updated ICMJE Conflict of Interest Reporting Form19 and continue dialogue regarding its improvement. Furthermore, the same conflict of interest reporting standards should be applied uniformly to all authors, regardless of the source of research funding.20

To further enhance disclosure transparency and reduce the administrative burden on authors, all parties should also explore the feasibility of developing a centralized, publicly accessible disclosure database.21 Specific issues that need to be addressed include responsibility for quality control and maintenance, privacy/public data access issues, data ownership, and database funding. If these issues could be resolved, a digital database would provide an efficient and effective means of promoting transparent reporting.

Recommendation 4: Educate Authors on How to Develop Quality Manuscripts and Meet Journal Expectations
Ineffective or inappropriate reporting can diminish the value and credibility of clinical research. Authors who are well versed in study design, conduct, and analysis may lack formal writing training or knowledge of reporting guidelines such as the Consolidated Standards of Reporting Trials (CONSORT) requirements.22 The experiences of editors and industry publication professionals suggest there is a large




MAYO CLINIC PROCEEDINGS

426

“information gap” in this regard, particularly relatedto authors’ knowledge of key aspects of authorshipthat directly affect the quality and credibility of sub-mitted manuscripts.

Journals and research sponsors should collaborate to educate researchers and other groups who conduct or contribute to publication development, whether they work in industry, academia, or other venues. Best practice guides should be widely disseminated to industry and academic authors. For example, the MPIP's Authors' Submission Toolkit23 can help authors navigate the manuscript development and submission process, and EQUATOR's author resource library can provide guidance on research reporting.24 On the basis of author and editor feedback, additional materials or educational programs may be needed to supplement existing guidelines; editors and industry representatives should collaborate to identify specific areas of unmet need and develop educational resources to address them. Finally, editors have an additional opportunity and responsibility to expand their academic educational efforts and drive education within and among journals to encourage the harmonization of editorial policies and reviewer standards in line with evolving best practices.

Recommendation 5: Improve Disclosure of Authorship Contributions and Writing Assistance and Continue Education on Best Publication Practices to End Ghostwriting and Guest Authorship

Research sponsors have improved their credibility significantly by incorporating into their policies and standard practices5-10 definitions of authorship and contributorship developed by ICMJE, the American Medical Writers Association, the European Medical Writers Association, and other professional bodies. Importantly, these definitions explicitly recognize the positive role that professional writers can play in manuscript development, provided they are appropriately acknowledged as authors or contributors (in accordance with ICMJE guidelines) and their names, affiliations, and potential conflicts of interest are disclosed.25

All parties must continue to work toward zero tolerance of ghostwriting (defined as failure to acknowledge individuals who helped write the paper) and guest authorship (defined as inclusion of individuals as authors who do not qualify because they do not meet ICMJE or journal criteria for authorship). Sponsors should ensure that employees' contributions are fully disclosed, using the same standards for employees and nonemployees. Such disclosure should be done without applying quotas of the maximum number of industry-employed authors or prespecified ratios of industry-employed to independent authors that would inappropriately exclude individuals who qualify for authorship or acknowledgment. For their part, journals need to eliminate any biases against manuscripts that have industry authors. Finally, sponsors, academic institutions, and editors should reinforce collaboratively the unacceptability of ghostwriting and guest authorship.

Recommendation 6: Report Adverse Event Data More Transparently and in a More Clinically Meaningful Manner

There is an unmet need for better and more uniform reporting of adverse events. Although "no clinically significant adverse events" and "no unexpected adverse events" are common shorthand in journal articles, such phrases lack clinical relevance, particularly regarding rare adverse events that may be important for agents used over a long period in large populations. Editors, sponsors, and clinicians would benefit from consensus on more uniform reporting guidelines that clearly specify the type and format of adverse event data provided in the manuscript and/or supplemental material.26 In addition, journals may need to revisit their manuscript length policies if they wish this information to be present in the main document.

Finally, editors and sponsors must educate authors on the need to balance the strength of adverse event claims vs the features and limitations of trial design appropriately. In a study that is not powered to assess rare adverse events, it is more appropriate to qualify the findings (eg, "additional severe adverse events were not detected in this short, small trial") than to make broad statements that could mislead readers (eg, "generally safe and well tolerated").

Recommendation 7: Provide Access to More Complete Protocol Information

Some journals request submission of a clinical study protocol to validate methods and end points used in submitted manuscripts, screen for analysis or reporting errors, confirm that the manuscript matches the protocol, and verify that any protocol amendments do not affect study conduct or integrity inappropriately.27 In addition, several journals post online complete protocols or excerpts together with the published manuscript to provide greater transparency and additional context to readers. Journals should describe their protocol submission requirements and publication policies in their instructions for authors and apply them irrespective of study sponsorship.

Public dissemination of protocols by journals raises several practical issues that require further discussion. To foster development of effective policies on protocol disclosure, sponsors should engage with journals on the appropriate format and organization and the extent and legitimacy of redactions to protect proprietary intellectual property or other concerns. In addition, because study protocols are frequently amended as research progresses, disclosure of different versions in multiple public venues may create confusion and redundancy. Stakeholders should explore whether a single repository is feasible and meets the objective of transparency.

Recommendation 8: Transparently Report Statistical Methods Used in Analysis

Most journals assess statistical methods as a routine part of the peer review process, often relying on established reporting guidelines such as CONSORT.22 Sponsors should ensure that authors provide adequate information about the chosen methods based on the prespecified study design and parameters, and how they were applied to the final data set. To enhance credibility, authors need to adhere to existing reporting standards, and editors have a responsibility to uniformly enforce these requirements at their journals.

The issue of statistical analysis credibility warrants further discussion among editors, authors, and research sponsors to explore how journals can develop policies that raise standards for all clinical publications, independent of the financial support or authorship. Singling out industry-sponsored trials for additional statistical validation28 unfairly implies that these studies' analyses are inherently deficient and deserve heightened scrutiny.

Recommendation 9: Ensure Authors Can Access Complete Study Data, Know How to Do So, and Can Attest to This

The credibility of industry-sponsored research is threatened when authors cannot explain or defend key details of study design and analysis or verify access to raw data. Sponsors' letters of agreement with authors should clearly define authors' data access rights and expectations and their responsibility to understand and attest to them if required by journals. Industry sponsors need internal policies and procedures that facilitate data access for investigators and authors, and journals should unambiguously state their submission and publication policies on data access. Journals and sponsors should consider that data access needs may also differ, depending on study type. For example, the level of data access expected and feasible for all authors may not be the same for a noninterventional study as for a multicenter interventional trial.

Recommendation 10: Support the Sharing of Prior Reviews From Other Journals

The MPIP's Authors' Submission Toolkit advises authors, when submitting a rejected manuscript to another journal, to explore the possibility of providing a copy of the prior manuscript version and reviewers' comments to demonstrate that suggestions have been incorporated.23 Although the practice is not uniformly accepted at this time, some journals encourage the sharing of prior reviews on the grounds that it elevates transparency, avoids duplicated efforts, and increases the quality of subsequent submissions.

Given that there is no consensus among editors on whether to accept prior reviews, this decision should be made by individual journals. Those journals that accept prior reviews must articulate their policies clearly and instruct their peer reviewers on how best to use them, and authors must be educated about how journals integrate prior reviews into their decision-making processes.

Conclusion

These top 10 recommendations outline several opportunities to enhance the transparency and credibility of industry-sponsored clinical research. Sponsors must continue to promote best practice guidelines, incorporate them into their policies and processes, and work with other biopharmaceutical and medical device companies and professional organizations to ensure uniform adoption. They should also drive author education efforts by disseminating guidelines and materials. Editors must ensure their policies are clear, transparent, well publicized, and uniformly applied irrespective of author affiliation or study sponsorship, and should increase their collaboration and overall role in promoting best practices.

The MPIP roundtable participants identified several areas of potential collaboration between journals and sponsors. Joint educational activities, such as guideline development (eg, the MPIP's Authors' Submission Toolkit and this document), exemplify collaboration toward the shared goal of reporting high-quality clinical research. Sponsors and editors can also collaborate in ongoing educational activities for authors of sponsored research and should jointly discuss areas of persistent ambiguity to exchange ideas and align on key issues for mutual benefit. Finally, all parties should take the opportunity to extend these efforts toward raising the standards for all research activities, irrespective of industry sponsorship. Such efforts are vital to closing the credibility gap in reporting industry-sponsored clinical research.

Mayo Clin Proc. May 2012;87(5):424-429. doi:10.1016/j.mayocp.2012.02.009. www.mayoclinicproceedings.org


ACKNOWLEDGMENTS

Approximately 20 invitations to the MPIP roundtable in New York, NY, on November 4, 2010, were extended to senior editors of general medical and specialty journals across various therapeutic areas, including both prior MPIP activity participants and new potential participants, of which 12 were accepted. Participants in the MPIP roundtable were as follows: Vito Brusasco, MD (editor in chief, European Respiratory Journal), Edward Campion, MD (senior deputy editor, New England Journal of Medicine), Juli Clark, PharmD (director, global medical writing, Amgen), Finbarr Cotter, MB, PhD (editor in chief, British Journal of Haematology), Frank S. David, MD, PhD (director, Leerink Swann Consulting), Cynthia Dunbar, MD (editor in chief, Blood), Robert Enck, MD (editor in chief, American Journal of Hospice and Palliative Medicine), Michelle Evangelista, BA (analyst, Leerink Swann Consulting), Lorna Fay (director/team leader, publications management, Pfizer), Rollin Gallagher, MD, MPH (editor in chief, Pain Medicine), John Gonzalez (global skills lead–publications, AstraZeneca), Daniel G. Haller, MD (editor in chief emeritus, Journal of Clinical Oncology), Brian Jenkins (executive supplements editor, Elsevier), Christine Laine, MD, MPH (editor in chief, Annals of Internal Medicine), Delong Liu, MD (editor in chief, Journal of Hematology and Oncology), Elizabeth Loder, MD, MPH (clinical editor, British Medical Journal), Bernadette Mansi (director, medical communications quality and practices, GlaxoSmithKline), Robert Matheis (director, medical communications, evidence-based medicine, sanofi-aventis; president, International Society for Medical Publication Professionals), Charles L. Miller (medical governance information director, GlaxoSmithKline), LaVerne A. Mooney (director, publications management, Pfizer), Jake Olin, BS (associate, Leerink Swann Consulting), Kraig Schulz, MS (managing director, Leerink Swann Consulting), and Maja Zecevic, PhD, MPH (North American senior editor, The Lancet). Participants did not receive honoraria or stipends, but reasonable travel expenses were reimbursed to some on request and reported by the MPIP cosponsors according to their companies' policies and applicable reporting requirements. With the exception of Dr Loder, all authors and roundtable participants affirm that the contents of this article reflect the spirit and substance of the discussion that took place at the meeting.

Tania Goel and Sarah Honig of Leerink Swann Consulting LLC provided writing and research assistance in the preparation of the outline and first draft of the manuscript.

Abbreviations and Acronyms: CONSORT = Consolidated Standards of Reporting Trials; ICMJE = International Committee of Medical Journal Editors; MPIP = Medical Publishing Insights and Practices


Disclaimer: The views expressed in this article are those of the authors alone and should not be read as representing the position or views of their employers.

Grant Support: The manuscript was developed through the MPIP initiative, which is currently funded by Amgen, AstraZeneca, Bristol-Myers Squibb, GlaxoSmithKline, Johnson & Johnson Pharmaceutical Research & Development, LLC, Merck, Pfizer, and Takeda Pharmaceuticals North America. Bristol-Myers Squibb, Johnson & Johnson Pharmaceutical Research & Development, LLC, Merck, and Takeda Pharmaceuticals North America joined the initiative after the 2010 MPIP roundtable meeting. Project management of the MPIP is provided by Leerink Swann Consulting.

Potential Competing Interests: J.C., J.G., S.G., B.A.M., C.L.M., and L.A.M. are employees of companies that fund MPIP. J.G. and T.M.G. are board members of the International Society of Medical Publication Professionals, which is a cosponsor of the MPIP. F.S.D. is an employee of Leerink Swann Consulting, LLC, a life sciences consultancy that serves a large number of biopharmaceutical, medical technology, tools, and diagnostics companies worldwide and is retained by the MPIP initiative. M.Z. is an employee of Elsevier, the publisher of Mayo Clinic Proceedings. D.G.H. and C.L. have no financial support to disclose.

Correspondence: Address to Maja Zecevic, PhD, MPH, 360 Park Ave S, New York, NY 10010 ([email protected]).

REFERENCES

1. Bruyere O, Kanis JA, Ibar-Abadie ME, et al. The need for a transparent, ethical, and successful relationship between academic scientists and the pharmaceutical industry: a view of the Group for the Respect of Ethics and Excellence in Science (GREES). Osteoporos Int. 2010;21(5):713-722.

2. Pyke S, Julious SA, Day S, et al. The potential for bias in reporting of industry-sponsored clinical trials. Pharm Stat. 2011;10(1):74-79.

3. Strahlman E, Rockhold F, Freeman A. Public disclosure of clinical research. Lancet. 2009;373(9672):1319-1320.

4. PhRMA Principles on Conduct of Clinical Trials: Communication of Clinical Trial Results. PhRMA Web site. http://www.phrma.org/sites/default/files/105/042009_clinical_trial_principles_final.pdf. Accessed January 31, 2012.

5. Amgen. Amgen Guidelines for Publications. Amgen Web site. http://www.amgen.com/about/amgen_guidelines_for_publications.html. Accessed January 31, 2012.

6. AstraZeneca. Transparency of Trial Information. AstraZeneca Web site. http://www.astrazeneca.com/Responsibility/Research-ethics/Clinical-trials/Transparency-of-trial-information. Accessed January 31, 2012.

7. GlaxoSmithKline. Reports and publications, Public policies, Disclosure of Clinical Trial Information. GlaxoSmithKline Web site. http://www.gsk.com/reportsandpublications-policies.htm. Accessed January 31, 2012.

8. Johnson & Johnson. Transparency in Our Business Activities. Johnson & Johnson Web site. http://www.jnj.com/connect/about-jnj/our-citizenship/transparency-in-our-business-activities. Accessed January 31, 2012.

9. Merck. Merck Guidelines for Publication of Clinical Trials and Related Works. Merck Web site. http://www.merck.com/research/discovery-and-development/clinical-development/Merck-Guidelines-for-Publication-of-Clinical-Trials-and-Related-Works.pdf. Accessed January 31, 2012.

10. Pfizer. Conducting Ethical Research – Registration, Disclosure, and Authorship. Pfizer Web site. http://www.pfizer.com/research/research_clinical_trials/registration_disclosure_authorship.jsp. Accessed January 31, 2012.

11. Angell M. Industry-sponsored clinical research: a broken system. JAMA. 2008;300(9):1069-1071.

12. Clark J, Gonzalez J, Mansi B, et al. Enhancing transparency and efficiency in reporting industry-sponsored clinical research: report from the Medical Publishing Insights and Practices initiative. Int J Clin Pract. 2010;64(8):1028-1033.

13. MPIP Web site. www.mpip-initiative.org. Accessed January 31, 2012.

14. Clark S, Horton R. Putting research into context – revisited. Lancet. 2010;376(9734):10-11.

15. Lehman R, Loder E. Missing clinical trial data. BMJ. 2012;344:d8158.

16. Prayle AP, Hurley MN, Smyth AR. Compliance with mandatory reporting of clinical trial results on ClinicalTrials.gov: cross sectional study. BMJ. 2011;344:d7373.

17. Ross JS, Tse T, Zarin DA, et al. Publication of NIH funded trials registered in ClinicalTrials.gov: cross sectional analysis. BMJ. 2011;344:d7292.

18. Fighting plagiarism [editorial]. Lancet. 2008;371(9631):2146.

19. ICMJE. Uniform Requirements for Manuscripts Submitted to Biomedical Journals: Ethical Considerations in the Conduct and Reporting of Research: Conflicts of Interest. ICMJE Web site. www.icmje.org/ethical_4conflicts.html. Accessed January 31, 2012.

20. Lanier WL. Bidirectional conflicts of interest involving industry and medical journals: who will champion integrity? Mayo Clin Proc. 2009;84(9):771-775.

21. Steinbrook R. Online disclosure of physician-industry relationships. N Engl J Med. 2009;360(4):325-327.

22. CONSORT. The CONSORT statement. CONSORT Web site. www.consort-statement.org/consort-statement. Accessed January 31, 2012.

23. Chipperfield L, Citrome L, Clark J, et al. Authors' Submission Toolkit: a practical guide to getting your research published. Curr Med Res Opin. 2010;26(8):1967-1982.

24. EQUATOR. Resources for authors of research reports. EQUATOR Network Web site. www.equator-network.org/resource-centre/authors-of-research-reports/authors-of-research-reports. Accessed January 31, 2012.

25. Norris R, Bowman A, Fagan JM, et al. International Society for Medical Publication Professionals (ISMPP) position statement: the role of the professional medical writer. Curr Med Res Opin. 2007;23(8):1837-1840.

26. Ioannidis JPA, Evans SJW, Gøtzsche PC, et al. Better reporting of harms in randomized trials: an extension of the CONSORT statement. Ann Intern Med. 2004;141(10):781-788.

27. Chan AW. Bias, spin and misreporting: time for full access to trial protocols and results. PLoS Med. 2008;5(11):e230.

28. Fontanarosa PB, Flanagin A, DeAngelis CD. Reporting conflicts of interest, financial aspects of research, and role of sponsors in funded studies. JAMA. 2005;294(1):110-111.


Trials (BioMed Central)

Open Access Commentary

Whose data set is it anyway? Sharing raw data from randomized trials

Andrew J Vickers*

Address: Departments of Epidemiology and Biostatistics, Medicine, Urology, Memorial Sloan-Kettering Cancer Center, NY, USA

Email: Andrew J Vickers* - [email protected]

* Corresponding author

Abstract

Background: Sharing of raw research data is common in many areas of medical research, genomics being perhaps the most well-known example. In the clinical trial community, investigators routinely refuse to share raw data from a randomized trial without giving a reason.

Discussion: Data sharing benefits numerous research-related activities: reproducing analyses; testing secondary hypotheses; developing and evaluating novel statistical methods; teaching; aiding design of future trials; meta-analysis; and, possibly, preventing error, fraud and selective reporting. Clinical trialists, however, sometimes appear overly concerned with being scooped and with misrepresentation of their work. Both possibilities can be avoided with simple measures such as inclusion of the original trialists as co-authors on any publication resulting from data sharing. Moreover, if we treat any data set as belonging to the patients who comprise it, rather than the investigators, such concerns fall away.

Conclusion: Technological developments, particularly the Internet, have made data sharing generally a trivial logistical problem. Data sharing should come to be seen as an inherent part of conducting a randomized trial, similar to the way in which we consider ethical review and publication of study results. Journals and funding bodies should insist that trialists make raw data available, for example, by publishing data on the Web. If the clinical trial community continues to fail with respect to data sharing, we will only strengthen the public perception that we do clinical trials to benefit ourselves, not our patients.

Introduction

Investigators routinely refuse to share raw data from randomized trials without giving a reason. In this paper, I will argue that attitudes towards data sharing in the clinical trial community need to be rethought, drastically. I will start by sharing some personal experiences of data sharing.

Published: 16 May 2006. Received: 27 January 2006. Accepted: 16 May 2006.
Trials 2006, 7:15 doi:10.1186/1745-6215-7-15. This article is available from: http://www.trialsjournal.com/content/7/1/15
© 2006 Vickers; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Data sharing anecdote 1: data to help plan a phase II trial

I was asked to help design an uncontrolled Phase II trial. The idea was to treat a small cohort of patients and, if the results looked promising, then design a large, randomized trial to find out for sure if the treatment was of benefit. To know whether "results looked promising" we needed some idea of how patients would do in the absence of treatment. We found a randomized, placebo-controlled trial accruing a similar group of patients to those we planned to study. We contacted the authors asking if we could obtain part of the raw data from their trial: a list of pre-treatment and follow-up scores for the control arm. Given that the principal investigator (PI) worked at the National Institutes of Health (NIH), and that the study was federally funded, we were therefore somewhat surprised when he wrote back a short note stating: "We are not prepared to release the data at this point."

Data sharing anecdote 2: data for a meta-analysis

I faced a problem when conducting a meta-analysis: some trial reports had given summary data for number of events per patient, others had given the proportion of patients who were event-free. I wrote to the PI of one study asking for raw data so that I could convert from event numbers to event incidence. To her credit she said that she "would love to share the data" – this was again a federally funded study at the NIH – however, she was unable to do so because "my biostatistician won't release it".

Data sharing anecdote 3: data to test a novel statistical method

I developed a novel statistical method with a colleague at Memorial Sloan-Kettering Cancer Center (MSKCC). We needed a data set on which to test our method and thought that one of the large randomized trials conducted by the cancer co-operative groups might be appropriate. We approached an MSKCC physician who had a senior position in one of these groups. After an extended discussion, he eventually allowed us to present our ideas to the appropriate committee. This involved a subsequent 45 minute telephone conference during which we pointed out that: a) the results of our analysis had no clinical implications and were of interest purely to statisticians; b) we would stress this point in any paper; c) that the co-operative group would be sent any paper before submission and would have full veto power. After a good deal of deliberation, the committee finally relented ... to let us present a written proposal. To this we heard no reply.

Data sharing anecdote 4: data to predict the effects of a drug

A large chemoprevention trial was published suggesting that a widely used drug could help prevent a common cancer. A colleague, who knew the investigators, pointed out that the researchers had measured a certain biomarker at baseline. We have previously shown that this biomarker could predict occurrence of the cancer. My colleague therefore proposed to the trial investigators that we collaborate on a study to determine whether the biomarker could be used to predict response to the study drug. The plan was that they would release the raw data to me, I would build a statistical model and the two research groups would jointly write a paper. Our proposal was rejected.

Data sharing anecdote 5: data to predict the effects of surgery

A randomized trial was published showing clear benefits of a certain type of cancer surgery. As it happens, the PI was a good friend of a close colleague of mine. I suggested to my colleague that we obtain raw data from the trial to see if we could identify which patients would benefit most from surgery. This was relatively straightforward: he asked his friend, his friend said yes. But the PI failed to get approval from the co-investigators and we never received the raw data.

Data sharing anecdote 6: a positive story

I have also had several positive experiences of data sharing: one example will suffice here. I proposed to Doug Altman, who has written many of the popular Statistics Notes articles for the British Medical Journal, that we write a Statistics Note on analysis of covariance (ANCOVA). He thought that this was a good idea but recommended we obtain a data set to illustrate our approach. I had become friendly with Konrad Streitberger at a conference, so I emailed him asking for raw data from a recently published study, and he sent me back an Excel spreadsheet, pretty much by return email. The Statistics Notes paper was published with Streitberger's data [1]. Interestingly, we were subsequently contacted by numerous statisticians who wanted to use the data for teaching purposes.

To share or not to share: guilty until proven innocent

I was never given an explicit reason why any of my requests to obtain raw data were rejected, so I can only speculate. It might be that the investigators were afraid that I might publish something that would misrepresent their findings. If so, sharing of raw data is hardly the issue: commentaries and review articles relying purely on a trial's published results are often mistaken – indeed, one article discussing the results of a surgery trial similar to that discussed in anecdote 5 stated, wholly inaccurately, that patients had died as a result of surgery [2] – and it seems more likely that raw data would improve subsequent commentary. Moreover, in three of the four cases where we planned to publish data, trialists were either invited to be co-authors or were offered the opportunity to read, revise and, if necessary, veto papers before submission. Alternatively, it might be that the trialists did not want to be scooped by their own data. Yet suggestions for collaborative publications were rejected and, moreover, I am not aware that any of the groups we have approached have published papers addressing the questions that interested us. It might also be argued that the disproportionate fear of being scooped suggests that investigators might be overly concerned with their careers in comparison to the value of research to patients.


Whatever the reason, it is clear that there is a burden of proof problem here: "no data sharing until there is clear and unambiguous proof that no harm will result (especially to me)", rather than, "we should share data unless we have some good reason to believe that this is not in the best interests of patients." There appeared little recognition that producers of data are part of a community with reciprocal obligations, or that all science builds on previous science (untold thousands of hours of brilliant scientific work were required just to allow me to type this paper on my laptop). Moreover, there did not seem to be any appreciation that sharing of raw data may lead to techniques or findings or further research that could help alleviate human distress (appendix 1). The general discourse was, instead, "this is my data set, why should I let you have it?"

Kirwan has undertaken a more systematic study of attitudes about data sharing by surveying pharmaceutical researchers [3]. In line with my somewhat frustrating experiences, he reports that about three out of four researchers, as well as an industry group, were opposed to making raw data available from trials after publication. What is of particular interest is that Kirwan documents the reasons given by researchers against data sharing and provides a convincing riposte to each. For example, the most common argument given against data sharing was that analytical methods are pre-specified in study protocols and it is inappropriate to use other methods; in particular, an investigator analyzing shared data may engage in "data-dredging" or choose invalid forms of analysis. Kirwan argues that it is for the scientific community as a whole, not for individual trialists, to judge the suitability of any reanalysis. Note also that in the data sharing anecdotes above, the analyses I planned were entirely incidental to the main purposes of a trial and hence could not possibly have been pre-specified in the study protocol. Yet they hardly seem like data-dredging. Kirwan reports and rejects several additional arguments such as "difficulty of extracting data relating to a particular publication" (this must have been done for the data to have been analyzed); costs of administering a database (a concern for the appropriate website, not the trialists) and "an alternative analysis may itself be commercially important". This last point is particularly interesting: Kirwan argues that if the value of an alternative analysis is recognized by the pharmaceutical company, it should be conducted before publication of the study results, if not, "then all the more reason for ensuring that others in the field are in a position to identify the need for and conduct the analysis".

Whose data set is it anyway?

In 2004, I published as PI a large study of acupuncture for the treatment of chronic headache disorders [4]. At my behest, 401 patients completed numerous questionnaires over the course of a year, sometimes completing pain scores four times a day for weeks at a time. Is this data set mine? Or does it really belong to the patients in the trial, and do I act merely as custodian? Several of the trials described in the anecdotes above were literally life and death: each of the steps on the survival curve was someone's daughter, someone's father, someone's wife. And yet details of that death suddenly became a publicly-funded researcher's private property.

In the USA at least, the data legally belong to trialists on the grounds that it requires work to create knowledge from data. But science, particularly medical science, is essentially an enterprise conducted for moral reasons. We need to do not just what is legal but what is right. As such, we must take into account the probable wishes of the patients who give us their blood, fill in our questionnaires and die on our trials. It is difficult to believe that any patient on my trial, who completed complex questionnaires so diligently over such a long period of time, would really have wanted me to keep the data for myself rather than share it with others for the benefit of medical science in general.

How to share data
In many cases, sharing raw data is very straightforward; indeed, I'll do it right now: click the link for Additional File 1 to see a full data set from the acupuncture for headache trial. It did not take me particularly long to create this file. I needed to have a clean, well-annotated data set anyway in order to conduct the statistical analyses for the trial publication, and so all that was required was to delete some extraneous variables, convert to Microsoft Excel, and write a few comments describing the data set. These comments could be very short as I could refer to the published trial for experimental details.

There are certain responsibilities I have as the producer of the data. First, I have to make sure my data set is clean, accurate and well-annotated, though this is good statistical practice anyway. Second, I have to make sure that I have protected patient confidentiality. This is not just a matter of deleting names and addresses, as identity can conceivably be constructed from other data, such as date of death, diagnosis and zip code (see published guidelines for help [5]). By the same token, however, anonymizing or "de-identifying" data is not particularly time consuming. Third, I have the responsibility of deciding the appropriate level of detail for a data set. For example, in the acupuncture trial, patients were given the SF-36 quality of life questionnaire on three occasions. This instrument consists of 36 questions that are summarized in 9 domains. In creating the data set, I decided not to give the responses to each question as this would require over 100 variables; rather, I give the domain scores at each follow-up time.

(Trials 2006, 7:15; http://www.trialsjournal.com/content/7/1/15)
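The de-identification step described above can be sketched in a few lines of code. This is an illustrative sketch only, with hypothetical field names; real trial data would need checking against published de-identification guidelines such as those cited in [5].

```python
# A minimal de-identification sketch under stated assumptions:
# records are dicts with hypothetical field names; this is NOT a
# substitute for following published de-identification guidelines.

def deidentify(record: dict) -> dict:
    """Drop direct identifiers and coarsen quasi-identifiers."""
    # Remove direct identifiers wholesale.
    out = {k: v for k, v in record.items() if k not in ("name", "address")}
    # Keep only the year of death, not the full date.
    if "date_of_death" in out:
        out["year_of_death"] = out.pop("date_of_death")[:4]
    # Truncate the ZIP code to its first three digits.
    if "zip" in out:
        out["zip3"] = out.pop("zip")[:3]
    return out

record = {
    "name": "A. Patient",
    "address": "1 Main St",
    "date_of_death": "2003-05-17",
    "zip": "10014",
    "sf36_physical": 42.0,   # domain score, not item-level responses
}
shared = deidentify(record)
print(shared)  # direct identifiers gone; quasi-identifiers coarsened
```

The same idea scales to a full table: drop direct identifiers outright, and coarsen any quasi-identifier (dates, geography, rare diagnoses) that could be combined to re-identify a patient.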

A researcher wishing to use raw data from a randomized trial also has responsibilities. The most obvious is that the source of the data must be cited. This is comparable to the use of reference lists in papers to cite sources of published data. Second, researchers have a responsibility to involve the producers of the data set. Trialists have a deep knowledge of the data, and their expertise and advice can be essential to avoid inappropriate analyses. As a general rule, trialists should be invited to be collaborators on any non-trivial novel research resulting from their data, with full co-authorship on any resulting publication.

For more thoughts on the logistics of data sharing, see the datasharing.net website [6]. Appendix 1 summarizes the benefits of sharing data from randomized trials; this builds on earlier work by Hutchon [7].

Protecting the interests of trialists
It is not implausible that a trialist's reputation or career could be harmed by data sharing. For example, what if a researcher downloaded the data set from my acupuncture trial, conducted an inappropriate analysis and then published a paper showing how the new findings "refuted" my conclusions? Similarly, if I planned a secondary analysis of my data but was beaten to it by another researcher, I would lose the opportunity to publish a paper, and papers are, of course, the currency of a scientific career. In Appendix 2, I outline some guidelines for data sharing that serve the dual function of protecting investigators and optimizing the scientific value of trials. Appendix 2 might be seen as a first attempt to answer Eysenbach and Sa's call for a "code of conduct" for publication of raw data [8].

Two features of the guidelines are of particular interest. First, they ensure that trialists are either co-authors on any subsequent analyses for publication or are entitled to a published commentary along with the new analysis. Moreover, trialists are guaranteed by the guidelines to be the first to publish an analysis: they are not required to publish or share any data which, in good faith, they plan to analyze for further papers. To give a concrete example, I am currently involved in the planning of a cancer trial in which various biomarkers will be taken at baseline. I expect that we will first publish the overall results of the trial – cancer-free survival in each group – and later submit a series of papers looking at questions such as whether the biomarkers predict either survival or response to the study intervention. I do not think there is any need to make available raw data on the biomarkers until we publish papers describing our biomarker analyses.

Nonetheless, I have little doubt that guidelines will not always be followed and, even if they are, harm may still occasionally result from publication or sharing of raw data. I think we have to accept that harm will result from any policy. The key question is whether the benefits from greater sharing of raw clinical trial data outweigh these harms (in my view, this is far from a close call). We also have to consider whether restricting the free flow of scientific information to avoid rare adverse events is really what we want as a research community.

Conclusion
The NIH currently requires that grants of $500,000 or more per year provide a "data-sharing plan" in the grant application [9]. Although this is a good start, it does not ensure that data are indeed ultimately made available. Moreover, only a fraction of randomized trials are NIH-funded to the tune of $500,000 per year. My personal solution would be to make it illegal to experiment on humans and then fail to publish raw data within an appropriate period of time. I assume there may be special circumstances in which it might be reasonable to keep data confidential – for example, during early phase development of a patented drug – and one might envisage that exceptions might be granted. Lest my proposal seem rather draconian, consider whether some of the Vioxx-related deaths might have been avoided had Merck been forced to publish raw data on individual patients.

Pending the somewhat unlikely implementation of such legislation, one starting point might be the journals: what if journals, and Trials would be a good starting place, insisted that researchers submitted raw data, along with the journal manuscript, for publication on the journal website? I am not the first to make such a suggestion [7,10]. Adding raw data submission to the host of other requirements (conflict of interest, suggested reviewers, copyright transfer, etc.) hardly seems burdensome. Indeed, trialists would merely be following the lead of genome researchers, where "all scholarly scientific journals should now require the submission of microarray data to public repositories as part of the process of publication" [11]. We could also go to the funding agencies (NIH, Medical Research Council, Wellcome) and suggest they demand publication of raw data from randomized trials, in exactly the same manner that they now require Open Access publication.

The editors of Trials have written that they "encourage authors to make available all (or some of) the raw data from the trials" [12]. A voluntary code of this nature does not seem to work. Eysenbach and Sa report that although the Journal of Medical Internet Research explicitly invites authors to attach raw data, none did so during the first two years of publication [8].


So perhaps what we need above all is a fundamental change of attitudes within the clinical trial community. Data sharing is common in other areas of science, genomic research being the obvious example. I cannot believe that it is unreasonable to ask clinical trialists to share data when, say, yeast researchers do it routinely. Let's make sharing of raw data a commonplace, natural part of the clinical trials process, in the same way that we view obtaining ethical approval or publication of the trial results. If we fail to do so, we will only strengthen the public perception that we do clinical trials to benefit ourselves, not our patients.

Postscript
If you have any personal anecdotes about sharing of data from randomized trials, please post a comment to this paper or email me at [email protected]. Perhaps more importantly, if you are a trialist and have some data that you do not want to share, think you are justified in this and believe your rationale is not covered in my guidelines, please let me know: I want to make sure that we develop a framework that meets everyone's needs.

Appendix 1. Benefits of sharing raw data from randomized trials
• Analyses can be reproduced and checked by others

• Acts as an additional incentive for checking that a data set is clean and accurate

• Teaching

• Aids development and evaluation of novel statistical methods

• Allows testing of secondary hypotheses

• Aids design of future trials

• Simplifies data acquisition for meta-analysis

• May help prevent fraud and selective reporting

Appendix 2. Suggested code of conduct for analysis of published raw data
Terminology: the "trialists" are the authors of a published report of a randomized trial; "independent investigators" are a separate group of researchers who wish to analyze the raw trial data ("new analysis")

Code of conduct for independent investigators and journals
1. Independent investigators planning to publish a new analysis should contact the trialists before undertaking any analyses

2. One or more trialists should be offered co-authorship on any resulting papers

3. If trialists disagree with the methods or conclusions of a new analysis:

a. They should not have veto power, unless this was agreed beforehand by the independent investigators

b. They should, however, be guaranteed the opportunity to write a commentary to be published alongside the new analysis

4. Journals should not publish new analyses of previously published data unless either a trialist is an author or a separate commentary from a trialist is attached

5. Published new analyses should cite the original trial

Code of conduct for trialists
1. Trialists must ensure that the data set is clean and well annotated

2. Trialists must ensure that no individual patient could be identified from the data set

3. Trialists should be required to share or publish immediately only those data associated with the main analyses of a published trial

a. There is no need to share data required for planned secondary analyses (e.g. correlative studies), although trialists should in good faith restrict data sharing only for analyses that they have concrete plans to publish

b. Regardless of any plans for future analyses, all raw data should be published or made available for sharing no later than five years after first publication of trial results

c. There is no need to update data (e.g. as deaths accrue in a cancer trial)

4. Trialists must share any and all data requested if analyses are not to be published (e.g. data required to aid trial design)

Additional material

Additional File 1: Data set from acupuncture headache trial. Click here for file.



References
1. Vickers AJ, Altman DG: Statistics notes: Analysing controlled trials with baseline and follow up measurements. BMJ 2001, 323(7321):1123-1124.
2. Vickers AJ: Interpreting data from randomized trials: the Scandinavian prostatectomy study illustrates two common errors. Nat Clin Pract Urol 2005, 2(9):404-405.
3. Kirwan JR: Making original data from clinical studies available for alternative analysis. J Rheumatol 1997, 24(5):822-825.
4. Vickers AJ, Rees RW, Zollman CE, McCarney R, Smith CM, Ellis N, Fisher P, Van Haselen R: Acupuncture for chronic headache in primary care: large, pragmatic, randomised trial. BMJ 2004, 328(7442):744.
5. http://healthcare.partners.org/phsirb/hipaafaq.htm#b5. Accessed 1/24/2006.
6. http://datasharing.net. Accessed 1/24/2006.
7. Hutchon DJR: Infopoints: Publishing raw data and real time statistical analysis on e-journals. BMJ 2001, 322(7285):530.
8. Eysenbach G, Sa ER: Code of conduct is needed for publishing raw data. BMJ 2001, 323(7305):166.
9. http://grants.nih.gov/grants/policy/data_sharing/data_sharing_faqs.htm. Accessed 1/24/2006.
10. Altman DG, Cates C: Authors should make their data available. BMJ 2001, 323(7320):1069a.
11. Ball CA, Brazma A, Causton H, Chervitz S, Edgar R, Hingamp P, Matese JC, Parkinson H, Quackenbush J, Ringwald M, Sansone SA, Sherlock G, Spellman P, Stoeckert C, Tateno Y, Taylor R, White J, Winegarden N: Submission of Microarray Data to Public Repositories. PLoS Biology 2004, 2(9):e317.
12. Altman DG, Furberg CD, Grimshaw JM, Rothwell PM: Lead editorial: Trials - using the opportunities of electronic publishing to improve the reporting of randomised trials. Trials 2006, 7(1):6.

Additional File 1: [http://www.biomedcentral.com/content/supplementary/1745-6215-7-15-S1.xls]


7 MARCH 2008 VOL 319 SCIENCE www.sciencemag.org 1340

An almost steady flow of articles has focused on the dangers or lack of efficacy of widely used drugs, along with allegations of hidden information, misinterpreted data, regulatory missteps, and corporate malfeasance. Many of these accounts involve analyses of research on human volunteers that had never been publicly disseminated (1, 2). The uproar caused by an analysis of previously unpublished studies of the diabetes drug Avandia, indicating that it may be harmful (3), is one recent example (4–8). As a result, many question whether sufficient information about the safety and efficacy of medical interventions is available to the public (9) and whether society is meeting its ethical responsibilities to human volunteers who put themselves at risk (10, 11). Although advances in all areas of science depend on free exchange of data, clinical trials warrant particular scrutiny because of their use of human volunteers and our dependence on their results to inform medical decisions.

The persistent gap between the number of trials conducted and the number for which results are publicly available has been well documented (12, 13). Results may not be publicly disseminated for many reasons, ranging from lack of interest by authors or editors to publish results that seem uninteresting to outright attempts to hide "inconvenient" results (14). A recent study suggests that over 30% of trials of 12 antidepressants submitted to the Food and Drug Administration (FDA) for review, primarily those with negative results, have not been published (1). One effect of such "positive publication bias" is a boost in the apparent efficacy of these interventions. Trial registration policies, which mandate the public listing of basic trial information, and results database policies, which mandate submission and public posting of summary results within a certain time frame (15), represent one type of response. However, improving transparency is only part of the solution to the broader set of concerns about medical interventions.

Promoting Transparency

Transparency exists along a continuum from documentation that a trial exists to full disclosure of the results data set at the end of the trial (see figure, below). "Trial registries" address one end of the spectrum by making public a summary of protocol details at trial initiation. "Results databases" provide public summaries of results for key trial end points, whether published or not. Some policies promote public access to data sets, such as the National Institutes of Health (NIH) Data Sharing Policy (16) and the policy of the Annals of Internal Medicine, which publishes author statements of willingness to share study protocols, statistical codes, and data sets (17).

Numerous clinical trial registries exist; the NIH has maintained ClinicalTrials.gov, the largest single registry of clinical trials, since 2000 (18). Although the law that led to the creation of ClinicalTrials.gov, the Food and Drug Administration Modernization Act of 1997, called for the registration of some trials of drug effectiveness for "serious or life-threatening diseases and conditions" (19), other registration policies have encouraged broader voluntary registration of trial information. A policy by the International Committee of Medical Journal Editors (ICMJE) that requires prospective trial registration as a precondition for publication, effective September 2005 (20), led to a 73% increase in trial registrations of all intervention types from around the world (21). This increased rate of trial registration has stayed constant at about 250 new trials per week, resulting in nearly 44,800 trials from 150 countries as of September 2007.

Since 2005, the U.S. Congress has considered bills calling for clinical trial registration and results reporting (22) and, on 27 September 2007, enacted the FDA Amendments Act (FDAAA) (23). Section 801 of this law ("FDAAA 801") expands the scope of required registrations at ClinicalTrials.gov and provides for the first federally funded trial results database. It mandates registration of a set of controlled clinical investigations, other than phase I trials, of drugs, biologics, and devices subject to regulation by the FDA. The law applies to research for any condition regardless of sponsor type (e.g., industry, government, or academic). These new statutory requirements, although broader than previous law, remain narrower than the transnational policies of the ICMJE and the World Health Organization (WHO), which call for the registration of all interventional studies in human beings regardless of intervention type (11, 24).

FDAAA 801 also increases the number of mandatory data elements, corresponding to the WHO and ICMJE international standard, and requires ClinicalTrials.gov to link the registry to specified, existing results information publicly available from the FDA Web site, including summary safety and effectiveness data, public health advisories, and action packages for drug approval.

FDAAA 801 also calls on the NIH to augment ClinicalTrials.gov to include a "basic results" database by September 2008. Data elements specified in the law include participant demographics and baseline characteristics; primary and secondary outcomes and statistical analyses; and disclosure of agreements between sponsors and nonemployees restricting researchers from disseminating results at scientific forums. Generally, these data will be available to the public within 12 months of trial completion or within 30 days of FDA approval (or clearance for devices) of a new drug, biologic, or device. The capacity to collect and display serious and frequent adverse event data observed during a trial is to be added to the system within 2 years.
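For readers who think in terms of records, the mandated "basic results" elements can be pictured as a structured entry per trial. The following is an illustrative sketch only: the field names are hypothetical and do not reflect the actual ClinicalTrials.gov data dictionary.

```python
# Illustrative sketch of a "basic results" record. Field names are
# hypothetical; they mirror the categories named in FDAAA 801, NOT
# the real ClinicalTrials.gov schema.
from dataclasses import dataclass, field

@dataclass
class OutcomeMeasure:
    title: str                       # end point description
    result_summary: str              # tabulated, objective result
    statistical_analysis: str = ""   # e.g., effect size, CI, method

@dataclass
class BasicResults:
    registry_id: str                           # trial registry identifier
    participant_demographics: dict             # baseline characteristics by arm
    primary_outcomes: list = field(default_factory=list)
    secondary_outcomes: list = field(default_factory=list)
    # Disclosure of sponsor/nonemployee agreements restricting dissemination:
    publication_restriction_disclosed: bool = False

record = BasicResults(
    registry_id="NCT00000000",  # placeholder identifier
    participant_demographics={"arm A": {"n": 100, "mean_age": 54.2}},
    primary_outcomes=[OutcomeMeasure("Change in HbA1c", "-0.8% vs -0.2%")],
)
print(record.registry_id)
```

The point of the sketch is simply that the law mandates structured, objective fields rather than free-form narrative; adverse event tables would be a later addition to such a record, as the law phases them in.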

POLICYFORUM: MEDICINE

Moving Toward Transparency of Clinical Trials

Deborah A. Zarin* and Tony Tse

As new policies promote transparency of clinical trials through registries and results databases, further issues arise and require examination.

National Library of Medicine, National Institutes of Health, U.S. Department of Health and Human Services, Bethesda, MD 20894, USA.
*Author for correspondence. E-mail: [email protected]

Figure. Levels of transparency: trial existence → summary of protocol details → access to full protocol → summary of results → scientific publication → access to full data set. A clinical trials registry covers the earlier levels of the continuum; a results database covers the later levels.


Will FDAAA 801 Solve Recent Problems?

The table (below) illustrates a typology of public concerns about the system of evaluating drugs and devices. We have categorized selected recent controversial issues by alleged problem: design, conduct, or analysis of the study; lack of public information about the study existence or results; and regulatory agency decision-making.

FDAAA 801 directly addresses issues stemming from a lack of transparency in clinical trials, represented by the examples in the highlighted section of the table. For instance, GlaxoSmithKline (GSK)–sponsored trial data for the heavily promoted antidepressant Paxil showing efficacy and safety concerns in children and adolescents were not available to the public (25, 26). The resulting 2004 legal settlement between GSK and the New York State Attorney General's office required GSK to develop a publicly accessible, online results database (27) for the timely, comprehensive posting of trial results for company-marketed drugs (28, 29). In the case of Vioxx, a COX-2 nonsteroidal anti-inflammatory drug (NSAID), "Several early, large clinical trials … were not published in the academic literature for years after Merck made them available to the FDA, preventing independent investigators from accurately determining its cardiovascular risk…" (p. 122) (30). With evidence that Vioxx is associated with increased cardiovascular risk from subsequent clinical trials, the drug was voluntarily withdrawn from the market in September 2004.

More recently, questions have focused on the ENHANCE trial, an industry-sponsored study of Zetia, a marketed nonstatin cholesterol-lowering drug that is also a component of the cholesterol-lowering drug Vytorin. Issues include delayed trial registration and results reporting, as well as an attempt to modify the prespecified primary outcome measure (31). Results, generally regarded as negative, were revealed in a company press release (32) after intense media attention and a congressional investigation (2).

Issues related to the design or conduct of clinical trials, including research ethics, are not covered by FDAAA 801. For example, allegations of human research protections violations in a 1996 Trovan (antibiotic) study on children in Zaire (33) and data integrity questions about Ketek (antibiotic) study 3014, in which FDA inspectors detected data fraud and other serious violations (34), would not be affected by FDAAA 801. Further, Merck studies of Arcoxia, a COX-2 NSAID, involving over 34,000 patients, were judged to be of limited scientific interest because of the use of an "inappropriate" comparator with many known side effects of its own (35). Nevertheless, it is possible that complete trial registration and results reporting might have helped institutional review boards (IRBs) assess the need for each additional Arcoxia trial.

Although current policies have focused on interventional studies, observational studies play an increasingly critical role in biomedical research, especially in the assessment of safety after an intervention comes into widespread use. Postmarket observational studies provide data about rare, unanticipated adverse effects from the exposure of large numbers of heterogeneous individuals over periods of time longer than typically studied in controlled trials (36). Despite methodological limitations, such as susceptibility to confounding factors and other sources of bias, which potentially lead to inconclusive or misleading results [e.g., data on risks of postmenopausal hormone replacement therapy from the Women's Health Initiative (37)], observational research, nevertheless, can play a useful, complementary role to interventional studies (38). Yet observational studies have received less attention in the quest for transparency.

In the case of a cholesterol-lowering drug (Baycol), a company-conducted observational study showed a higher relative rate of muscle breakdown for Baycol compared with another marketed statin. This finding was never reported publicly, but became available as a result of litigation (39). In another case, results of a Bayer-commissioned postmarket observational study of 67,000 patients that raised concerns about associations between Trasylol, a clotting drug, and cardiovascular or renal risk were not released by the company until after an FDA advisory committee meeting to evaluate the safety of Trasylol (36, 40). In November 2007, Bayer announced the voluntary suspension of Trasylol sales worldwide, pending further analysis of safety data. Some cases reflect a deficiency in monitoring serious adverse events once a drug or device is marketed. In other cases, such as Guidant's Ventak Prizm 2 DR implantable cardioverter-defibrillator, information about a problematic adverse event profile (i.e., device malfunctions) was not publicly disseminated in a timely fashion (41).

Other concerns relate to thresholds for determining what and how much data are sufficient to prompt regulatory decisions regarding the availability or labeling of a medical product. For example, questions have been raised about the timeliness and adequacy of FDA's response to data on Avandia (42), trials of Ketek (34), and on the use of some antidepressants in children (43, 44). Transparency policies alone do not address these issues, although they could help to empower members of the public who believe that regulatory action is warranted. For example, FDA mandated changes to the Avandia label in November 2007 to reflect the risk of myocardial infarction; this risk was first publicized in a meta-analysis published in June 2007 (3) that used, in part, data from the GSK database mandated by the Paxil settlement.


Table. Sample medical product safety concerns by category of perceived problems.

Category of perceived problem | Sample issues | Recent examples | Addressed by § 801?
Design, conduct, or analysis | Appropriateness of comparator | Arcoxia (35) | No
Design, conduct, or analysis | Lack of data integrity/fraud | Ketek (34) | No
Design, conduct, or analysis | Insufficient informed consent | Trovan (33) | No
Lack of public information: clinical trial | Suppression of trial existence and results | Paxil (40), Vioxx (30), Zetia (31) | Yes
Lack of public information: observational study | Suppression of study existence and results | Baycol (39), Trasylol (36) | No
Lack of public information: postmarket adverse event reports | Failure to disseminate data | ICD* (41) | Pending†
Regulatory agency decision-making | Delayed agency disclosure | Ketek (34) | No
Regulatory agency decision-making | Delayed agency action | Avandia (42) | No

*Ventak Prizm 2 DR implantable cardioverter-defibrillator. †For example, postmarket surveillance is addressed in FDAAA, § 905.


Future Challenges

FDAAA 801 is intended to greatly expand the level of transparency for clinical trials, which could have a transforming effect on our system of evaluating drugs and devices. However, FDAAA 801 still leaves areas of "opacity." For example, although certain medical device trials must be registered at trial inception, this information is withheld from the public until device approval or clearance by FDA. The resulting "lockbox" prevents disclosure of trials of devices that are never approved or cleared (e.g., where safety concerns arose or sponsors abandoned further development). For instance, Boston Scientific stopped development of an experimental stent after clinical trials revealed frequent fractures in the device (45). In addition, FDAAA 801 does not mandate public reporting for phase I drug trials and trials involving investigational interventions not regulated by the FDA, such as surgical procedures and behavioral therapies. Thus, lessons from phase I trials, such as the life-threatening adverse events in healthy volunteers caused by the superagonist TGN1412, could go unreported to the public and, potentially, result in redundant studies by future unsuspecting researchers (46). Further, results reporting is currently mandatory only for trials of FDA-approved medical products, allowing the results of unapproved products to remain hidden from public view.

Intellectual property–related issues. Current restrictions reflect a delicate balance between protection of commercial interests and promotion of public health (15). Pharmaceutical, biotech, and medical device manufacturers are concerned that disclosures may undermine competitive advantage (47). However, there are important ethical and scientific reasons for broader disclosure: Trial participation by humans is predicated on the concept that the trial will add to "medical knowledge," which requires dissemination of the results. In addition, it is not possible for a volunteer or an IRB to assess the risks and benefits of participation in a clinical trial if an unknown proportion of data on the proposed interventions is not publicly available (48). FDAAA 801 calls on the Secretary of Health and Human Services to consider whether public health interests would support expanding the results requirements to include unapproved drugs, biologics, and devices, through a 3-year rule-making process.

Validation of information. Concerns have been raised about verifying the completeness or accuracy of sponsor- and researcher-submitted results data. The volume of completed trials, the lack of access to protocols and data sets, and the subjective nature of some judgments are barriers to validation. The law reflects these concerns and mandates the reporting of objective data in tables. The NIH and FDA are directed by the law to conduct a pilot quality-control project to determine the optimal method of verification. In addition, narrative summaries could be required in the future, but only if "such types of summaries can be included without being misleading or promotional" (23). Additional research will be necessary to explore whether and how this might be accomplished.

Interpretation of information. The results

database will require an interface to assist

users in finding study results. One concern is

that members of the lay public and media

may be ill-prepared to interpret summary

results data (49). Currently, there are no stan-

dards or guidelines for providing and

explaining study results to trial participants

or to members of the public (50). FDAAA

801 calls for the development of informa-

tional materials for consumers in consulta-

tion with health and risk communication

experts. Furthermore, clinicians may be con-

cerned that the existence of a results database

will increase the patient expectations that cli-

nicians will be knowledgeable about all

results in the database, even those that were

never published or discussed in a journal. In

addition, although the results database will

facilitate the conduct of carefully performed

and comprehensive systematic reviews,

some also worry that the public will have a

difficult time assessing the quality of the

multitude of analyses that may result (51).

Conclusion

FDAAA 801 should transform the degree of

public access to critical clinical trial informa-

tion from publicly and privately funded clini-

cal research. J. M. Drazen, Editor-in-Chief,

New England Journal of Medicine, has noted

that currently, some patients are “left on the

cutting room floor to make a drug look better

than it really is” (52); FDAAA 801 should go

a long way in ensuring that all patients and all

data are publicly accounted for.

References and Notes
1. E. H. Turner, A. M. Matthews, E. Linardatos, R. A. Tell, R. Rosenthal, N. Engl. J. Med. 358, 252 (2008).
2. A. Berenson, New York Times, 15 January 2008.
3. S. E. Nissen, K. Wolski, N. Engl. J. Med. 356, 2457 (2007).
4. R. Stein, Washington Post, 22 May 2007, p. A3.
5. G. Harris, New York Times, 12 September 2007.
6. K. Dixon, Reuters, 26 June 2007.
7. Editorial, Lancet 369, 1834 (2007).
8. C. J. Rosen, N. Engl. J. Med. 357, 844 (2007).
9. "Rosiglitazone: Seeking a balanced perspective," New York Times, 6 June 2004.
10. J. M. Drazen, S. Morrissey, G. D. Curfman, N. Engl. J. Med. 357, 1756 (2007).
11. C. Laine et al., N. Engl. J. Med. 356, 2734 (2007).
12. R. J. Simes, J. Clin. Oncol. 4, 1529 (1986).
13. P. J. Easterbrook, J. A. Berlin, R. Gopalan, D. R. Matthews, Lancet 337, 867 (1991).
14. R. T. Johnson, K. Dickersin, Nat. Clin. Pract. Neurol. 3, 590 (2007).
15. C. B. Fisher, Science 311, 180 (2006).
16. U.S. National Institutes of Health, "NIH Data Sharing Policy and Implementation Guidance" (NIH, Bethesda, MD, 2003); http://grants.nih.gov/grants/policy/data_sharing/data_sharing_guidance.htm.
17. Information for Authors, Ann. Intern. Med.; www.annals.org/shared/author_info.html.
18. D. A. Zarin et al., JAMA 297, 2112 (2007).
19. Food and Drug Administration Modernization Act of 1997, Public Law No. 105-115 § 113 (1997).
20. C. D. DeAngelis et al., JAMA 292, 1363 (2004).
21. D. A. Zarin, T. Tse, N. C. Ide, N. Engl. J. Med. 353, 2779 (2005).
22. E. D. Williams, "Clinical Trials Reporting and Publication," July 2007 (Congressional Research Service, Washington, DC, 2007).
23. FDAAA, Public Law No. 110-85 § 801 (2007).
24. I. Sim et al., Lancet 367, 1631 (2006).
25. D. Rennie, JAMA 292, 1359 (2004).
26. The People of the State of New York v. GlaxoSmithKline, Complaint, filed 2 June 2004.
27. GlaxoSmithKline Clinical Trial Register; http://ctr.gsk.co.uk/welcome.asp.
28. The People of the State of New York v. GlaxoSmithKline, Consent Order & Judgment: Civil Action No. 04-CV-5304 MGC, ordered 26 August 2004.
29. O. Dyer, BMJ 329, 590 (2004).
30. H. M. Krumholz, J. S. Ross, A. H. Presler, D. S. Egilman, BMJ 334, 120 (2007).
31. A. Berenson, New York Times, 21 December 2007.
32. "Merck/Schering-Plough Pharmaceuticals provides results of the ENHANCE trial," Product News, 14 January 2008 (Merck & Co., Whitehouse Station, NJ, 2008); www.merck.com/newsroom/press_releases/product/2008_0114.html.
33. J. Stephens, Washington Post, 30 May 2007, p. A10.
34. D. B. Ross, N. Engl. J. Med. 356, 1601 (2007).
35. A. Berenson, New York Times, 24 August 2006.
36. W. R. Hiatt, N. Engl. J. Med. 355, 2171 (2006).
37. C. Laine, Ann. Intern. Med. 137, 290 (2002).
38. J. Avorn, N. Engl. J. Med. 357, 2219 (2007).
39. B. M. Psaty, C. D. Furberg, W. A. Ray, N. S. Weiss, JAMA 292, 2622 (2004).
40. J. Avorn, N. Engl. J. Med. 355, 2169 (2006).
41. R. G. Hauser, B. J. Maron, Circulation 112, 2040 (2005).
42. B. M. Psaty, C. D. Furberg, N. Engl. J. Med. 357, 67 (2007).
43. R. Waters, San Francisco Chronicle, 1 February 2004, p. A1.
44. C. Holden, Science 303, 745 (2004).
45. B. Meier, New York Times, 30 October 2007.
46. M. Goodyear, BMJ 332, 677 (2006).
47. H. I. Miller, D. R. Henderson, Nat. Rev. Drug Discov. 6, 532 (2007).
48. L. A. Levin, J. G. Palmer, Arch. Intern. Med. 167, 1576 (2007).
49. A. J. Pinching, J. R. Soc. Med. 88 (Suppl. 24), 12 (1995).
50. A. H. Partridge, E. P. Winer, JAMA 288, 363 (2002).
51. E. Wager, PLoS Clin. Trials 1, e31 (2006).
52. A. W. Mathews, A. Johnson, Wall Street Journal, 23 January 2008, p. A14.
53. Supported by the Intramural Research Program of the NIH, National Library of Medicine. We thank J. Sheehan for comments on the draft manuscript. The ideas and opinions expressed are the authors'. They do not represent any policy position of the NIH, Public Health Service, or Department of Health and Human Services.

10.1126/science.1153632



TRAVEL INFORMATION


Forum on Drug Discovery, Development, and Translation Forum on Neuroscience and Nervous System Disorders

Roundtable on Translating Genomic-Based Research for Health National Cancer Policy Forum

MEMORANDUM

TO: Workshop Attendees

FROM: Tonia E. Dickerson, Senior Program Assistant

DATE: September 24, 2012

SUBJECT: Logistics Information – Sharing Clinical Research Data: An IOM Workshop

Date/Time: Thursday, October 4, 2012 8:00 a.m. – 8:30 a.m. (Breakfast)

8:30 a.m. – 5:30 p.m. (Workshop)

Friday, October 5, 2012 7:30 a.m. – 8:00 a.m. (Breakfast)

8:00 a.m. – 12:45 p.m. (Workshop)

Location: Primary Meeting Room:

NAS Room 125

National Academy of Sciences

2101 Constitution Avenue, N.W.

Washington, D.C. 20418

Overflow Meeting Space:

NAS Board Room and NAS Room 118

National Academy of Sciences

2101 Constitution Avenue, N.W.

Washington, D.C. 20418

Agenda: This will be a 1.5-day workshop on sharing clinical research data. The workshop will be held in Room 125 of the National Academy of Sciences (NAS) building, with overflow space available in Room 118 and in the Board Room. The overflow space will broadcast the live video webcast of the meeting and allow attendees to participate in the workshop discussion by asking questions of workshop speakers and panelists.

The workshop will begin on Thursday, October 4th, with breakfast available at 8:00 am in NAS 125. The meeting itself will begin at 8:30 am and conclude at 5:30 pm. On Friday, October 5th, breakfast will be available at 7:30 am; the meeting will start at 8:00 am and conclude by 12:45 pm.


Workshop Objectives:

• Examine the benefits of sharing clinical research data, and specifically clinical trial data, from all sectors and among these sectors, including, for example:
  o Benefits to the research and development enterprise
  o Benefits to the analysis of safety and efficacy
• Identify barriers and challenges to sharing clinical research data
• Explore strategies to address these barriers and challenges, including identifying priority actions and "low-hanging fruit" opportunities
• Discuss strategies for using these potentially large data sets to facilitate scientific and public health advances.

Meals: Breakfast will be provided on both days of the meeting. Lunch will also be provided on Thursday, the first day of the workshop.

IOM staff contact information:

Rebecca English, Associate Program Officer
Phone: 202-334-2582
Email: [email protected]

Tonia Dickerson, Senior Program Assistant
Phone: 202-334-2914
Email: [email protected]

Claire Giammaria, Research Associate
Phone: 202-334-3841
Email: [email protected]


Getting to the National Academy of Sciences
2101 Constitution Avenue, N.W., Washington, DC 20418

Directions to the National Academy of Sciences Building

The National Academy of Sciences Building, located in the Foggy Bottom area of Washington, D.C., is served by Ronald Reagan National Airport (DCA), Dulles International Airport (IAD), and Baltimore/Washington International Airport (BWI). It is accessible by Metro's Orange and Blue lines.

By Car from Ronald Reagan National Airport:

1. Take Airport Exit to George Washington Memorial Parkway North.
2. Take exit to Memorial Bridge.
3. Bear left after crossing Memorial Bridge.
4. Take second left onto Henry Bacon Drive, NW. You must turn left at this point as your route will be blocked by Jersey walls.
5. Take right at light onto Constitution Avenue, NW.
6. Take left at second light onto 21st Street, NW.
7. Parking lot entrance is on left before stop sign at intersection with C Street, NW.
8. Walk north towards C Street.
9. Bear left and enter at 2100 C Street, NW.

By Car from Dulles International Airport:

1. Take Airport Exit to Airport Access Road east.
2. Follow until Access Road merges with Rte 66 East.
3. Follow Rte 66 east into DC. Rte 66 becomes Rte 50 East / Constitution Avenue, NW.
4. Follow Route 66 to intersection of Constitution Avenue, NW and 23rd Street, NW.
5. Take left at third light following the intersection onto 21st Street, NW.
6. Parking lot entrance is on left before stop sign at intersection with C Street, NW.
7. Walk north towards C Street.
8. Bear left and enter at 2100 C Street, NW.


By Metro:

1. Take the Orange or Blue Line to Foggy Bottom-GWU metro stop.

2. Turn right when you exit the metro.

3. Walk south down 23rd Street, NW for approximately 7 blocks.

4. Turn left onto C Street, NW (after the State Department).

5. Cross 22nd Street.

6. The entrance is located at 2100 C Street and has a green awning.

By Taxi:

You may take a taxi to and from the meeting. There is a taxi stand at the hotel, and taxis are available at the National Academy of Sciences (NAS) by walking to Constitution Avenue, which is at the corner of the NAS building.
