Page 1

Research Data Analytics at Thomas Jefferson University

Jack London, PhD

Thomas Jefferson University

Sidney Kimmel Cancer Center

Philadelphia PA USA

2015 i2b2 European Academic User Group meeting

October 6, 2015

Page 2

Disclaimer

In addition to my faculty position at Thomas Jefferson University in Philadelphia, I am a consultant for TriNetX Corporation.

Page 3

Thomas Jefferson University and the Sidney Kimmel Cancer Center (SKCC), Philadelphia

Located between New York City and Washington DC

Jefferson Medical College (JMC) was founded in 1824.

JMC is the second largest private medical school in the U.S.

The NCI-designated SKCC has ~400 physicians and scientists dedicated to the discovery and development of novel approaches for cancer treatment.

Page 4

SKCC’s IT infrastructure

GE Centricity inpatient EMR

Allscripts outpatient (ambulatory care) EHR

EPIC inpatient and outpatient

Cerner A/P lab system

EPIC Beaker

OpenSpecimen research biobank management

TIES clinical text extraction

i2b2 research data mart

TriNetX data analytics network

Page 5

Current Jefferson Data Resource Landscape

TJUH CLINICAL DATA WAREHOUSE

DEMOGRAPHICS (gender, race, age, vital status, ethnicity)

DIAGNOSES (ICD9)

PROCEDURES (ICD9)

CLINICAL LABS (LOINC)

MEDICATIONS

i2b2 RESEARCH DATA MART

IMPAC METRIQ: cancer registry site, stage, histology, treatment, survival (ICD-O-3)

CERNER A/P: “omic” data

FORTE ONCORE: clinical trial data

OPEN SPECIMEN: biospecimen annotation (SNOMED)
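The sources above feed a single research data mart. In i2b2, clinical facts are stored in a star schema whose central table, observation_fact, holds one row per observation keyed by patient and coded concept, so a diagnosis, a lab value, and a specimen annotation all take the same shape. The Python sketch below illustrates that flattening under simplifying assumptions: only a few observation_fact columns are modeled, the concept-code prefixes (`ICD9:`, `LOINC:`, `SNOMED:`) are hypothetical, and `facts_from_sources` is an illustrative helper, not Jefferson's actual ETL.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Fact:
    """Simplified stand-in for one row of i2b2's observation_fact table (subset of columns)."""
    patient_num: int                   # de-identified patient number
    concept_cd: str                    # coded concept; the prefixes used here are hypothetical
    start_date: date                   # when the observation was made
    nval_num: Optional[float] = None   # numeric value, used here for lab results


def facts_from_sources(
    patient_num: int,
    diagnoses: List[Tuple[str, date]],        # (ICD9 code, date) from the clinical warehouse
    labs: List[Tuple[str, date, float]],      # (LOINC code, date, value)
    specimens: List[Tuple[str, date]],        # (SNOMED anatomic-site code, date) from OpenSpecimen
) -> List[Fact]:
    """Flatten per-source records into one uniform list of facts (illustrative only)."""
    facts = [Fact(patient_num, f"ICD9:{code}", d) for code, d in diagnoses]
    facts += [Fact(patient_num, f"LOINC:{code}", d, value) for code, d, value in labs]
    facts += [Fact(patient_num, f"SNOMED:{code}", d) for code, d in specimens]
    return facts


# Example: one diagnosis and one lab value become two facts of the same shape.
example = facts_from_sources(
    1,
    diagnoses=[("174.9", date(2014, 3, 1))],
    labs=[("2160-0", date(2014, 3, 2), 0.9)],
    specimens=[],
)
for fact in example:
    print(fact)
```

Because every source ends up as the same row type, a cohort query can combine criteria from different systems without source-specific joins.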

Page 6

Jefferson’s i2b2 Research Data Mart

• Built on “informatics for integrating biology and the bedside” (i2b2) version 1.7.02

• RDM data are de-identified. Re-identification is possible via an honest broker, who has access to a re-identification application.

• Currently > 45 million observations on > 450,000 patients. Data refreshed weekly.
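De-identification with honest-broker re-identification amounts to keeping the link between medical record numbers (MRNs) and research identifiers outside the data mart. A minimal sketch, assuming an in-memory mapping and a hypothetical `HonestBroker` class; Jefferson's actual re-identification application and its approval workflow are not described here:

```python
import secrets
from typing import Dict


class HonestBroker:
    """Hypothetical honest broker that alone holds the link between MRNs and research IDs."""

    def __init__(self) -> None:
        self._mrn_to_rid: Dict[str, str] = {}
        self._rid_to_mrn: Dict[str, str] = {}  # re-identification map; never loaded into the RDM

    def deidentify(self, mrn: str) -> str:
        """Return a stable random research ID for an MRN (same MRN, same ID)."""
        if mrn not in self._mrn_to_rid:
            rid = secrets.token_hex(8)
            while rid in self._rid_to_mrn:  # guard against an (unlikely) collision
                rid = secrets.token_hex(8)
            self._mrn_to_rid[mrn] = rid
            self._rid_to_mrn[rid] = mrn
        return self._mrn_to_rid[mrn]

    def reidentify(self, rid: str, approved: bool) -> str:
        """Release the MRN only for an approved request (the approval policy is assumed)."""
        if not approved:
            raise PermissionError("re-identification requires an approved request")
        return self._rid_to_mrn[rid]


broker = HonestBroker()
rid = broker.deidentify("MRN-0001")   # the RDM sees only this opaque ID
print(rid == broker.deidentify("MRN-0001"))     # True: IDs are stable across weekly refreshes
print(broker.reidentify(rid, approved=True))    # MRN-0001
```

Using random identifiers rather than a hash of the MRN means the research ID cannot be reversed without the broker's table.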

Page 7

Patient data obtained from TJUH EMR

DEMOGRAPHICS

Age

Ethnicity

Gender

Race

Vital Status (alive/dead)

DIAGNOSES

Disease systems --> diseases (organized by ICD9 coding)

CLINICAL LAB RESULTS

Chemistry

Coagulation

Hematology

MEDICATIONS

Anti-neoplastic

INPATIENT PROCEDURES

Diagnostic and Treatment procedures (organized by ICD9 coding)

Page 8

Patient mutation data obtained from Pathology Molecular Diagnostic Testing (both outsourced and in-house)

ALK rearrangement

BRAF c.1782T>G p.D594E

BRAF c.1801A>G p.K601E

BRAF c.1799T>A p.V600E

EGFR Deletion in exon 19

EGFR Insertion in exon 20

EGFR c.2236G>A p.E746K

EGFR c.2236_2250del15 p.E746_A750delELREA

EGFR c.2156G>C p.G719A

EGFR c.2155G>T p.G719C

EGFR c.2155G>A p.G719S

EGFR c.2573T>G p.L858R

EGFR c.2582T>A p.L861Q

EGFR c.2303G>T p.S768I

JAK2 c.1849G>T p.V617F

JAK3 c.2164G>A p.V722I

KRAS c.35G>C p.G12A

KRAS c.34G>T p.G12C

KRAS c.35G>A p.G12D

KRAS c.34G>C p.G12R

KRAS c.34G>A p.G12S

KRAS c.35G>T p.G12V

KRAS c.38G>A p.G13D

NRAS c.183A>T p.Q61H

NRAS c.181C>A p.Q61K

NRAS c.182A>T p.Q61L

NRAS c.182A>G p.Q61R

PIK3CA c.1633G>A p.E545K

PIK3CA c.3140A>T p.H1047L

PIK3CA c.3140A>G p.H1047R

PTEN c.754G>T p.D252Y

PTEN c.59G>A p.G20E

RET rearrangement

ROS1 rearrangement

SMAD4 c.1157G>A p.G386D

TP53 c.843C>A p.D281E

TP53 c.811G>T p.E271*

TP53 c.857A>C p.E286A

TP53 c.400T>C p.F134L

TP53 c.734G>A p.G245D

TP53 c.388C>G p.L130V

TP53 c.524G>A p.R175H

TP53 c.817C>T p.R273C

TP53 c.818G>A p.R273H

TP53 c.318C>G p.S106R

TP53 c.659A>G p.Y220C

TP53 c.707A>G p.Y236C
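Each point-mutation label above pairs a coding-DNA change (c.) with its protein consequence (p.) in HGVS-style notation. When such labels are loaded as ontology terms, a simple sanity check is that the coding position falls inside the stated codon, since codon k spans coding positions 3k-2 through 3k. A minimal sketch, assuming labels formatted exactly as on this slide; the `parse_substitution` helper is hypothetical and handles simple substitutions only:

```python
import re

# Simple substitutions as written on this slide, e.g. "BRAF c.1799T>A p.V600E".
# Deletions, insertions, and rearrangements are deliberately not matched.
SUBSTITUTION = re.compile(
    r"^(?P<gene>[A-Z0-9]+) "
    r"c\.(?P<cpos>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT]) "
    r"p\.(?P<aa_ref>[A-Z])(?P<codon>\d+)(?P<aa_alt>[A-Z*])$"
)


def parse_substitution(label: str) -> dict:
    """Split a label into gene, cDNA change, and protein change, and check their positions agree."""
    m = SUBSTITUTION.match(label)
    if m is None:
        raise ValueError(f"not a simple substitution: {label!r}")
    fields = m.groupdict()
    cpos, codon = int(fields["cpos"]), int(fields["codon"])
    fields["cpos"], fields["codon"] = cpos, codon
    # Codon k spans coding positions 3k-2 .. 3k, so position c.N lies in codon ceil(N / 3).
    fields["position_consistent"] = (cpos + 2) // 3 == codon
    return fields


for label in ["BRAF c.1799T>A p.V600E", "KRAS c.35G>A p.G12D", "TP53 c.811G>T p.E271*"]:
    f = parse_substitution(label)
    print(f["gene"], f["codon"], f["aa_alt"], f["position_consistent"])
```

Complex events such as the EGFR exon 19 deletion or the ALK, RET, and ROS1 rearrangements would need separate handling.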

Page 9

Molecular Diagnostics ontology

Page 10

Specimen annotation from campus biobanks

Anatomic origin (SNOMED)

Class (tissue, fluid)

Type (frozen, FFPE)

Pathology (normal, malignant, diseased)

Slide images

Eight biobanks, including the TJUH paraffin block archive of ~400,000 cases since 1990.

Page 11

Specimen annotation management

TJUH clinical paraffin block archive

Pathology Department research tissue bank

Brain tumor bank (J. Evans, PI)

Pancreatic tumor bank (C. Yeo, PI)

Breast tumor bank (J. Palazzo, PI)

Thyroid tumor bank (E. Pribitkin, PI)

Brain tumor bank (D. Andrews, PI)

Liver tumor bank (V. Navarro, PI)

Jefferson integrated Research Specimen management (OpenSpecimen)

> 230,000 patients, > 650,000 specimens

> 100,000 patients via i2b2 RDM

Cancer patients having comprehensive annotation from the Tumor Registry and banked specimens

Page 12

Biospecimen ontology

Page 13

Pathology images are available via i2b2 query tool

Page 14

Patient data from Jefferson Tumor Registry

Primary Cancer Diagnosis

Age at diagnosis/date of diagnosis

Survival (months) from diagnosis

Tumor histology and behavior

Stage (AJCC/TNM, clinical and pathological)

Grade

Recurrence

local, distant

Treatment

chemotherapy, radiation, surgery, transplant, palliative

Disease-specific factors

e.g., prostate: Gleason score

Over 100,000 cases since 1990.

Page 15

Tumor Registry ontology

Page 16

Typical SKCC Investigator Queries

Example #1:

Form a cohort of “triple negative” (estrogen receptor-, progesterone receptor-, and HER2-negative) African American patients having matched normal and malignant frozen tissue specimens.

Example #2:

Form a cohort of patients with a primary diagnosis of papillary thyroid cancer and a BRAF V600E mutation.
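Both example queries are conjunctions of criteria drawn from different sources (tumor registry, molecular diagnostics, biospecimens). The i2b2 query tool expresses this with panels: concepts within a panel are combined with OR, panels are combined with AND, and a panel can be excluded. A minimal sketch of those set semantics over a hypothetical in-memory index; concept codes such as `TR:thyroid_papillary` and the patient numbers are made up and are not actual Jefferson ontology codes or data:

```python
from typing import Dict, Iterable, Sequence, Set

# Hypothetical index: concept code -> set of de-identified patient numbers having that fact.
Index = Dict[str, Set[int]]


def panel_patients(index: Index, panel: Iterable[str]) -> Set[int]:
    """Concepts inside one panel are combined with OR."""
    hits: Set[int] = set()
    for concept in panel:
        hits |= index.get(concept, set())
    return hits


def run_query(index: Index, panels: Sequence[Sequence[str]],
              excluded: Sequence[Sequence[str]] = ()) -> Set[int]:
    """Panels are combined with AND; excluded panels are subtracted (NOT)."""
    if not panels:
        return set()
    cohort = panel_patients(index, panels[0])
    for panel in panels[1:]:
        cohort &= panel_patients(index, panel)
    for panel in excluded:
        cohort -= panel_patients(index, panel)
    return cohort


# Example #2 with made-up concept codes and patient numbers.
index = {
    "TR:thyroid_papillary": {101, 102, 103, 104},
    "MOLDX:BRAF_V600E": {102, 104, 250},
}
print(sorted(run_query(index, [["TR:thyroid_papillary"], ["MOLDX:BRAF_V600E"]])))
```

Example #1 follows the same pattern with more panels: one each for ER-, PR-, and HER2-negative status, race, and the two frozen specimen types.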

Page 17

Additional data on the selected cohort can be retrieved

Page 18

Example data summaries from the i2b2 RDM

CLINICAL DIAGNOSES OF TJUH PATIENTS WITH THYROID SPECIMENS

Page 19

Jefferson – TriNetX project

In the fall of 2014, the SKCC informatics group entered into a collaboration with TriNetX, Inc., a start-up company based in Cambridge, Massachusetts.

TriNetX facilitates collaboration between pharmaceutical companies and academic healthcare providers through a global, federated data network that connects academic and industry clinical researchers in real time to the patient populations they are attempting to study.

The TriNetX application accesses a site’s i2b2 database and displays aggregate query results in an advanced, flexible manner.

Page 20

TriNetX application offers an alternative query tool with enhanced data visualization

Google-like query interface

Graphic result display

Page 21

TriNetX application offers an alternative query tool with enhanced data visualization

Interactive display capability

Page 22

Cohort definition via i2b2 can be used to predict accrual for proposed clinical trials

Page 23

Problem confronting clinical trials research: studies that fail to accrue

An Institute of Medicine report [1] on cancer cooperative group trials found that 40% were never completed because they failed to achieve minimum accrual goals:

“The ultimate inefficiency is a clinical trial that is never completed because of insufficient patient accrual, and this happens far too often.”

These non-accruing trials are often kept open for many months before closure, consuming personnel resources in their setup and operation at a significant cost to institutions, without providing any return in definitive research findings.

Furthermore, while many of these trials register zero patients, others accrue some, resulting in thousands of patients nationwide being recruited to unproductive research studies [2].

1. Nass SJ, Moses HL, Mendelsohn J, editors. A National Cancer Clinical Trials System for the 21st Century: Reinvigorating the NCI Cooperative Group Program. Committee on Cancer Clinical Trials and the NCI Cooperative Group Program; Board on Health Care Services. Washington, DC: National Academies Press; 2010.

2. Cheng S, Dietrich M, Finnigan S, Sandler A, Crites J, Ferranti L, Wu A, Dilts D. A sense of urgency: Evaluating the link between clinical trial development time and the accrual performance of CTEP-sponsored studies. J Clin Oncol. 2009 (ASCO Annual Meeting Proceedings).

Page 24

Study design

The overall objective of this study was to evaluate whether accrual for proposed cancer clinical trials could be predicted by running cohort queries, based on each trial’s eligibility criteria, against recent patient data in Jefferson’s i2b2 research data mart (RDM), which is built from de-identified, integrated hospital clinical, tumor registry, and specimen data.

To determine the ability of the i2b2 RDM to predict accrual for prospective trials, we retrospectively used the RDM to obtain the eligible patient populations for the two years preceding each of a set of recent trials and compared these cohort sizes with the actual accrual observed after each trial opened. We considered 90 interventional cancer trials opened at the Kimmel Cancer Center (KCC, now SKCC) in 2008, 2009, and 2010, since these had been open for at least two years and their accrual performance could therefore be evaluated.

Page 25

Study methodology

o We constructed RDM cohort queries corresponding to the trial eligibility criteria for the two years prior to each trial’s opening (e.g., we considered TJUH patient populations from 2007 and 2008 for trials opened in 2009).

o We computed an annual cohort size by averaging the two yearly counts (i.e., dividing the 2-year total by 2).

o We then compared our RDM annual cohort size for the 2 years preceding a trial’s opening to the annual target goal for that trial and the trial’s actual accrual performance.

• Since we initially assumed that 50% of eligible participants would enroll in a study, the RDM cohort would have to be at least twice the accrual goal for a prediction of “successful” trial accrual.

• We defined a trial’s actual accrual performance as “successful” if it accrued at least 80% of its target enrollment.
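The accrual-prediction and success rules above reduce to a few comparisons. A minimal sketch, assuming per-year cohort counts have already been obtained from the RDM queries; the function names and the example trial are hypothetical:

```python
def annual_cohort(count_year_minus_2: int, count_year_minus_1: int) -> float:
    """Average the RDM cohort counts from the two years before the trial opened."""
    return (count_year_minus_2 + count_year_minus_1) / 2


def predicted_success(cohort_per_year: float, annual_goal: float, enroll_rate: float = 0.5) -> bool:
    """Predict success if the expected enrollees reach the annual goal.

    With the assumed 50% enrollment rate this is the rule "cohort >= 2 x goal".
    """
    return cohort_per_year * enroll_rate >= annual_goal


def actual_success(accrued: float, target: float, threshold: float = 0.8) -> bool:
    """Count a trial as successful if it accrued at least 80% of its target."""
    return accrued >= threshold * target


# Hypothetical trial: 18 and 26 eligible patients in the two prior years, goal of 12 per year.
cohort = annual_cohort(18, 26)          # 22.0 per year
print(predicted_success(cohort, 12))    # False: 22 x 0.5 = 11 < 12
```

Keeping the enrollment rate as a parameter makes it easy to rerun the analysis under a different assumption than 50%.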

Page 26

Results

To assess the predictive performance of our method, a contingency table was produced for the 90 trials analyzed.

• A trial was denoted as potentially successful in meeting its annual target accrual (“PREDICTED SUCCESS” row) if the retrospective i2b2 cohort analysis indicated sufficient patients for the trial.

• A trial was denoted as actually successful in meeting its annual target accrual (“ACTUAL SUCCESS” column) if it accrued at least 80% of the protocol’s stated target annual accrual.

Contingency table comparing i2b2 accrual predictions with actual accrual success, assuming only 50% of potential participants identified by i2b2 are enrolled.

Treating failed accrual as the positive outcome, our methodology correctly predicted successful accrual for 31 of the 32 trials that succeeded (specificity 0.969, 95% C.I. (0.908, 1)) and correctly predicted failed accrual for 23 of the 58 trials that failed (sensitivity 0.397, 95% C.I. (0.271, 0.522)). The positive predictive value, or precision, is 0.958 (23 of the 24 trials predicted to fail) (95% C.I. (0.878, 1)).
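The reported figures determine the full 2×2 table, treating failed accrual as the positive outcome: 23 trials predicted to fail did fail, 1 predicted to fail succeeded, 35 of the 58 failures were predicted to succeed, and 31 of the 32 successes were predicted to succeed, for 90 trials in total. The reported 95% intervals are consistent with a normal-approximation (Wald) interval truncated to [0, 1]; that choice is an assumption of this sketch, not something the slide states.

```python
import math


def wald_ci(successes: int, n: int, z: float = 1.96):
    """Proportion with a normal-approximation (Wald) 95% interval, clipped to [0, 1]."""
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half), min(1.0, p + half)


# 2x2 table implied by the reported counts ("failure" is the positive class).
tp = 23  # predicted failure, actually failed
fp = 1   # predicted failure, actually succeeded
fn = 35  # predicted success, actually failed (58 - 23)
tn = 31  # predicted success, actually succeeded

sensitivity = wald_ci(tp, tp + fn)  # 23/58
specificity = wald_ci(tn, tn + fp)  # 31/32
ppv = wald_ci(tp, tp + fp)          # 23/24

for name, (p, lo, hi) in [("sensitivity", sensitivity), ("specificity", specificity), ("PPV", ppv)]:
    print(f"{name}: {p:.3f} (95% CI {lo:.3f}, {hi:.3f})")
```

Wald intervals are known to behave poorly for proportions near 0 or 1 (hence the clipping here); a Wilson interval would be a common alternative for the specificity and PPV estimates.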

Page 27

Results

Our results show that the methodology has an excellent positive predictive value (95.8%: 23 of the 24 trials predicted to fail actually failed) but is poor at detecting failed accrual (sensitivity 39.7%: only 23 of the 58 trials that failed were flagged).

In other words:

if the methodology predicts "failed accrual," then we should trust this prediction and should not proceed to open the trial with its current eligibility criteria;

however, a prediction of accrual success using this method is no guarantee that target goals will be met.

Page 28

How can this methodology be useful?

A benefit of analyzing potential trial accrual during the protocol design phase is that it offers an opportunity to “tweak” eligibility rules when insufficient patient cohorts are found.

A change in participation criteria that does not significantly affect the scientific objectives of the trial may yield a sufficiently large potential patient pool.

Not opening the 23 trials that were correctly predicted to fail to accrue over the 3 years studied would have saved about $200,000 in trial startup costs alone and spared 57 patients from participating in studies that did not contribute to advancing science or clinical care.

Page 29

Selected areas of research using RDM:

Hallgeir Rui, MD, PhD: Molecular Cancer Epidemiology, cancer pharmacogenetics, individualized cancer risk assessment and prognostication.

Raphael E. Bonita, MD: Jefferson Heart Institute, correlation of troponin levels and heart failure in transplant patients.

Hushan Yang, PhD: Molecular Cancer Epidemiology.

Jordan Winter, MD: Surgery, Whipple procedure survival study.

Scott Waldman, MD, PhD: Pharmacology and experimental therapeutics.

Ron Myers, PhD: Gene-environment risk assessment.

Stephen Peiper, MD: Biomarker discovery using Next Generation Sequencing.