translational data science_clean

Translational Data Science at Merck

Chris L. Waller, Ph.D.

Executive Director and Head, Scientific Modeling Platforms…

Forward-Looking Statement

This presentation includes “forward-looking statements” within the meaning of the safe harbor provisions of the United States Private Securities Litigation Reform Act of 1995. Such statements may include, but are not limited to, statements about the benefits of the merger between Merck and Schering-Plough, including future financial and operating results, the combined company’s plans, objectives, expectations and intentions and other statements that are not historical facts.

Such statements are based upon the current beliefs and expectations of Merck’s management and are subject to significant risks and uncertainties. Actual results may differ from those set forth in the forward-looking statements.

The following factors, among others, could cause actual results to differ from those set forth in the forward-looking statements: the possibility that all of the expected synergies from the merger of Merck and Schering-Plough will not be realized, or will not be realized within the expected time period; the impact of pharmaceutical industry regulation and health care legislation in the United States and internationally; Merck’s ability to accurately predict future market conditions; dependence on the effectiveness of Merck’s patents and other protections for innovative products; and the exposure to litigation and/or regulatory actions.

Merck undertakes no obligation to publicly update any forward-looking statement, whether as a result of new information, future events or otherwise. Additional factors that could cause results to differ materially from those described in the forward-looking statements can be found in Merck’s 2011 Annual Report on Form 10-K and the company’s other filings with the Securities and Exchange Commission (SEC) available at the SEC’s Internet site (www.sec.gov).

Outline

• Merck & Co. (MSD) Introduction

• Function and Form: R&D (Merck Research Labs) and R&D IT (MRL IT)

• Translational Data Science, Informatics, and Analytics: Vision and Technology

• Real World Evidence: Opportunities to Use Outcomes to Influence Research and Development

• Discussion

But first, the news…

Cost to Develop and Win Marketing Approval for a New Drug Is Increasing!

BOSTON – Nov. 18, 2014 – Developing a new prescription medicine that gains marketing approval, a process often lasting longer than a decade, is estimated to cost $2,558 million, according to a new study by the Tufts Center for the Study of Drug Development.

The $2,558 million figure per approved compound is based on estimated:

Average out-of-pocket cost of $1,395 million

Time costs (expected returns that investors forego while a drug is in development) of $1,163 million

Estimated average cost of post-approval R&D—studies to test new indications, new formulations, new dosage strengths and regimens, and to monitor safety and long-term side effects in patients required by the U.S. Food and Drug Administration as a condition of approval—of $312 million boosts the full product lifecycle cost per approved drug to $2,870 million. All figures are expressed in 2013 dollars.
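The quoted components add up to the quoted totals, which a few lines of arithmetic confirm. A minimal Python sketch using only the numbers in the release (millions of 2013 dollars); the variable names are ours.

```python
# Tufts CSDD (2014) estimates per approved compound, in millions of 2013 US dollars
out_of_pocket = 1_395   # average out-of-pocket cost
time_cost = 1_163       # expected returns forgone while the drug is in development
post_approval = 312     # average post-approval R&D cost

# Capitalized cost to reach approval = out-of-pocket cost + time cost
pre_approval_total = out_of_pocket + time_cost

# Full product-lifecycle cost adds post-approval R&D
lifecycle_total = pre_approval_total + post_approval

print(f"Pre-approval cost: ${pre_approval_total:,} million")
print(f"Full lifecycle cost: ${lifecycle_total:,} million")
```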

The new analysis, which updates similar Tufts CSDD analyses, was developed from information provided by 10 pharmaceutical companies on 106 randomly selected drugs that were first tested in human subjects anywhere in the world from 1995 to 2007.

“Drug development remains a costly undertaking despite ongoing efforts across the full spectrum of pharmaceutical and biotech companies to rein in growing R&D costs,” said Joseph A. DiMasi, director of economic analysis at Tufts CSDD and principal investigator for the study.

He added, “Because the R&D process is marked by substantial technical risks, with expenditures incurred for many development projects that fail to result in a marketed product, our estimate links the costs of unsuccessful projects to those that are successful in obtaining marketing approval from regulatory authorities.”

In a study published in 2003, Tufts CSDD estimated the cost per approved new drug to be $802 million (in 2000 dollars) for drugs first tested in human subjects from 1983 to 1994, based on average out-of-pocket costs of $403 million and capital costs of $401 million.

The $802 million, equal to $1,044 million in 2013 dollars, indicates that the cost to develop and win marketing approval for a new drug has increased by 145% between the two study periods, or at a compound annual growth rate of 8.5%.
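The 145% figure follows directly from the two inflation-adjusted estimates. The release does not state how many years the 8.5% compound annual growth rate covers, so rather than assume a span, this sketch back-solves for the span that the quoted rate implies.

```python
import math

# Both estimates expressed in millions of 2013 dollars
cost_2003_study = 1_044  # the $802M (2000-dollar) estimate, restated
cost_2014_study = 2_558

ratio = cost_2014_study / cost_2003_study
pct_increase = (ratio - 1) * 100

# Solve (1 + cagr) ** years == ratio for years, using the quoted 8.5% CAGR
cagr = 0.085
implied_years = math.log(ratio) / math.log(1 + cagr)

print(f"Increase between studies: {pct_increase:.0f}%")
print(f"Years of compounding implied by {cagr:.1%} CAGR: {implied_years:.1f}")
```

The implied span is roughly 11 years; that is a back-calculation from the quoted numbers, not a figure stated by Tufts CSDD.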

According to DiMasi, rising drug development costs have been driven mainly by increases in out-of-pocket costs for individual drugs and higher failure rates for drugs tested in human subjects.

Factors that likely have boosted out-of-pocket clinical costs include increased clinical trial complexity, larger clinical trial sizes, higher cost of inputs from the medical sector used for development, greater focus on targeting chronic and degenerative diseases, changes in protocol design to include efforts to gather health technology assessment information, and testing on comparator drugs to accommodate payer demands for comparative effectiveness data.

Lengthening development and approval times were not responsible for driving up development costs, according to DiMasi.

“In fact,” DiMasi said, “changes in the overall time profile for development and regulatory approval phases had a modest moderating effect on the increase in R&D costs. As a result, the time cost share of total cost declined from approximately 50% in previous studies to 45% for this study.”
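The time-cost shares can be recomputed from the figures quoted in the release: $401 million of $802 million for the 2003 study, and $1,163 million of $2,558 million for the new study. A minimal sketch:

```python
# (time cost, total capitalized cost) per approved drug, as quoted in the release
studies = {
    "2003 study (2000 dollars)": (401, 802),
    "2014 study (2013 dollars)": (1_163, 2_558),
}

# Time-cost share = time (capital) cost / total capitalized cost
shares = {name: time / total for name, (time, total) in studies.items()}

for name, share in shares.items():
    print(f"{name}: time-cost share {share:.0%}")
```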

The study was authored by DiMasi, Henry G. Grabowski of the Duke University Department of Economics, and Ronald W. Hansen at the Simon Business School at the University of Rochester.

Progressive, Unsustainable Decline in Productivity

Reported by Matthew Herper, Forbes 5/22/2014 “Who’s the best in drug research…”

http://www.forbes.com/sites/matthewherper/2014/05/22/new-report-ranks-22-drug-companies-based-on-rd/

2014 New Drug Approvals Hit 18-Year High

2014 was a good year for pharmaceutical innovation – the best, in fact, since the industry’s all-time record of 1996. FDA approved a total of 44 drugs –

http://www.forbes.com/sites/bernardmunos/2015/01/02/the-fda-approvals-of-2014/

The productivity crisis in pharmaceutical R&D

Fabio Pammolli, Laura Magazzini & Massimo Riccaboni

Nature Reviews Drug Discovery 10, 428-438 (June 2011)

28,000 compounds from Pharmaceutical Industry Database

We are unable to predict success.

Failure Rates Increasing at all Stages of R&D

http://www.nature.com/nrd/journal/v10/n6/full/nrd3405.html

Merck & Co. (MSD)

Key Company Facts

• 2014 R&D EXPENSE: $6.5 billion; 25 drug candidates in late-stage development; key areas: oncology, CV, diabetes, respiratory & immunology, neurology, infectious disease and vaccines
• 2014 REVENUES: $42.2 billion; 61% of sales come from outside the United States
• BUSINESSES: Pharmaceuticals, Vaccines, Biologics and Animal Health
• HEADQUARTERS: Kenilworth, New Jersey, U.S.A.
• RICH HISTORY: Operating since 1851
• WHO WE ARE: We are known as Merck & Co. We are known as MSD outside of the United States and Canada.
• EMPLOYEES: Approximately 70,000 worldwide (as of 12/31/14)

Premier Research-Driven BioPharmaceutical Company

Merck Research Labs: Form and Function

Functional areas: Translational Medicine; Preclinical Development; Clinical, Regulatory, & Safety; Outcomes Research

• Scientific Modeling Platform (Cross-functional Analytics & Predictive Modeling)
• Scientific Information Management Platform (Cross-functional Information Access & Interoperability)
• Enterprise and Laboratory Platforms (Cross-functional Information Creation and Collection)
• Applied Math and Modeling Team (Cross-functional Analytics & Predictive Modeling)

Business Outcomes:
• Decrease SDV / GCD Cost
• Decrease Time to Market
• Increase Analysis of Real World Data
• Ensure 100% Compliance
• Increase Analytics-Based Decision Making
• Increase Biologics contribution to 40%
• Increase use of modeling for trials and submissions
• Scientists can find the information they need
• Improve POC Success to 60%

Translational Data Science, Informatics, and Analytics

Data Science

Data Science involves combining strong analytical skills with an exploratory mindset and business domain expertise. Data scientists, or data science teams, can identify the right questions, help get the right data, integrate, explore, visualize, interpret, find patterns, select the right analytics approaches, and deliver business insights and impact. They generally operate on the top half of the information pyramid, i.e., they depend on (lots of) available, interoperable data.

Informatics

Informatics is the activity of solving problems using data & information assets, methodologies, and technologies. It also means navigating whatever parts of the data-information-knowledge ecosystem are necessary to solve a problem. This activity could require one or many informatics-related disciplines, e.g., information management, software engineering, information system design, bioinformatics, computational biology, mathematics, modeling, imaging, genomics, network analysis, text mining, information flow modeling, scientific computing, health informatics, statistics, and cheminformatics, and it often requires a multidisciplinary team.

Analytics Continuum at Merck & Co.

JM Johnson, DRAFT 6/5/2014 (based on a similar slide from Booz Allen Hamilton)

Axis: Analytical complexity/depth. The continuum spans Descriptive Analytics (hindsight), Predictive Analytics (insight), and Prescriptive Analytics (foresight). Its levels, in the order shown on the slide:

• Predictive Modeling / Simulation / Optimization: What will happen if…? What’s the best choice? What are the alternatives? What should we do?
• Statistical and Mathematical Analysis: Is my hypothesis correct? What is the cause?
• Enquiry Analytics (Data Exploration & Mining; Analysis / Visualization / Query / Drill down / Alerts; Hypothesis generation): What is the problem? Is there a pattern? What is a good question to ask? When is action needed?
• Ad hoc and Custom Reports: How did it happen?
• Standard Reports and Dashboards: What happened?

The “best” approach may be any of the above. It depends on the problem and the context.

Merck’s Global Network

Press Release v1 (Merck BHAG Realized)

Merck’s revolutionary model-driven approach to drug development leads to breakthrough therapies in Oncology and Neuroscience.

Boston, MA, November 4, 2024

In the last 12 months Merck has released breakthrough treatments for cancer and mental health in record time by using its revolutionary modeling platform for human drug response.

By working with regulatory authorities worldwide and leveraging public-private partnerships, Merck has been able to develop deep models of human disease that allow it to go straight to human trials. This has allowed it to greatly reduce the traditional timeline for drug development and bypass controversial and expensive animal trials.

Head of modeling Dr. Smith said that the approach was made possible by developing deep and accurate models of each individual in a clinical trial. “We actively recruited patient populations and made use of sophisticated bio-sensors, nanotechnologies and real-time analysis to develop comprehensive predictive models of their genetics, metabolism and disease.” Over a period of several years, Merck modelers received constant streams of data from these volunteers, giving them unprecedented understanding of their disease. They combined this with large publicly funded datasets and with crowdsourced and internal modeling methods.

“We are moving to a new paradigm in drug discovery where we enroll patients before we start therapeutic development,” said Smith.

Merck believes that its modeling platform and methodology can be used to rapidly develop cures for other diseases and is actively seeking patients to donate their health information as well as development partners to license this platform in new disease areas.

Note: This is completely fake and does not represent any forward-looking statements on behalf of Merck.

Press Release v2 (Merck BHAG Realized)

Merck’s “Virtual Pipeline™” Powers Decision Making

Boston, MA, November 4, 2024

Merck released details today on a revolutionary platform that it created to support all aspects of the drug discovery and development process.

This 10-year journey began in 2014 with the acknowledgement that the pharmaceutical industry must transform in order to survive the mounting financial and regulatory pressures.

In collaboration with regulatory agencies worldwide, Merck created the Virtual Pipeline™ by adopting a Product Lifecycle Management (PLM) mentality and completely and permanently altered the pharmaceutical research and development landscape.

“The existence of the Virtual Pipeline™ and the ability to fully simulate the entire lifecycles of therapeutic agents allowed our business development team to make an informed decision to acquire Iliad Pharmaceuticals’ entire portfolio with the intent to launch a drug that will see Merck re-enter the infectious disease therapeutic area. It is our expectation that Merck will enter the market with First and Best-in-Class agents grossing in excess of $10BN per annum,” reported Dr. Hootie N.D. Blowfish, Head of Strategic Acquisitions.

While it is too early to verify, Merck projects that the Virtual Pipeline™ will enable its research scientists to reduce the time from target identification to product launch by as much as 40% with associated cost savings nearing 50%.

Note: This is completely fake and does not represent any forward-looking statements on behalf of Merck.

Questions, questions, questions…

Research, Development, Commercial, Medical

Drug, Protein Target, Response, System, Individuals, Populations, Pathway

What entity should I make?

How active is my entity?

What other activities does my entity possess?

How can I make it?

Do I have the starting materials?

What dose is required?

Is it likely to be metabolized?

Is clearance going to be a problem? What is the most effective formulation?

How can I make it in bulk?

What disease should I target?

What targets are involved?

What mechanisms are involved?

How are my competitors doing?

Is my compound more effective than comparators?

How much can I charge for this?

Can I patent this?

Transform / Deliver / Aggregate / Access

Answers, answers, answers…

Drug, Protein Target, Response, System, Individuals, Populations, Pathway

Data (Internal and External, Structured and Unstructured)

Models and Simulations (Data)

Workflows (Best Practices)

The Promise of Predictive Modeling, Simulation, and Optimization

Diagram linking Drug, Protein Target, Response, Pathway, System, Individuals, and Populations, with arrows labeled “interacts with,” “and elicits a,” “distributes to site of action,” “through a,” “in,” “in a,” “within,” and “that respond to.”

Each arrow represents an opportunity to develop and utilize a predictive model in lieu of more resource- and time-consuming experimentation!
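A minimal sketch of chaining arrow-level models, assuming three hypothetical stand-ins: a crude dilution model for distribution to the site of action, a 1:1 binding isotherm for target interaction, and an occupancy threshold for response, evaluated across a simulated population with log-normal variability in binding affinity. Every function name, parameter value, and unit here is illustrative rather than a Merck model, and the Pathway and System arrows are folded into the threshold step.

```python
import random

# Each function stands in for one "arrow"; all models and numbers are hypothetical.

def site_concentration(dose_mg, volume_l=5.0, fraction_at_site=0.2):
    """Drug distributes to site of action: crude dilution model (mg/L)."""
    return dose_mg / volume_l * fraction_at_site

def target_occupancy(conc_mg_l, kd_mg_l):
    """Drug interacts with protein target: 1:1 binding isotherm, C / (C + Kd)."""
    return conc_mg_l / (conc_mg_l + kd_mg_l)

def individual_responds(occupancy, threshold=0.5):
    """Target engagement elicits a response: assume response needs >= threshold occupancy."""
    return occupancy >= threshold

def population_response_rate(dose_mg, n_individuals=1000, seed=0):
    """Individuals within a population: Kd varies log-normally from person to person."""
    rng = random.Random(seed)
    responders = 0
    for _ in range(n_individuals):
        kd = rng.lognormvariate(0.0, 0.5)  # assumed inter-individual variability
        occ = target_occupancy(site_concentration(dose_mg), kd)
        responders += individual_responds(occ)
    return responders / n_individuals

for dose in (10, 25, 50, 100):
    print(dose, population_response_rate(dose))
```

Because each arrow is a separate function, any one of them can be swapped for a better-validated model without touching the others, which is the modular reuse the slide is pointing toward.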


Initial Efforts Focused on Intra-domain Optimization







Learning Loops (DMAIC Cycles) within the functional domains of Pharma R&D Support:

• Adaptive Research Operating Plans

• Adaptive Clinical Trials

• Behavioral Modification…

DMAIC cycle (one loop shown for each functional domain): Design → Measure → Analyze → Improve → Control
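The learning loops can be sketched as a toy iterative search whose steps are labeled with the slide's Design, Measure, Analyze, Improve, Control cycle. The objective function, step sizes, and stopping rule are hypothetical; in practice `measure` would be an assay, a model prediction, or a trial readout, and a simple stopping rule stands in here for the Control phase.

```python
import random

def learning_loop(measure, x0, step=1.0, shrink=0.9, tol=1e-3, max_cycles=500, seed=0):
    """Toy loop: Design -> Measure -> Analyze -> Improve -> Control, repeated.

    `measure` scores a single design parameter (higher is better); it is a
    hypothetical stand-in for an experiment, model prediction, or trial readout.
    """
    rng = random.Random(seed)
    best_x, best_y = x0, measure(x0)
    history = [best_y]
    for _ in range(max_cycles):
        candidate = best_x + rng.uniform(-step, step)  # Design: propose a variant
        y = measure(candidate)                          # Measure: run the "experiment"
        if y > best_y:                                  # Analyze: better than the current best?
            best_x, best_y = candidate, y               # Improve: adopt the better design
        else:
            step *= shrink                              # no gain: search more locally next cycle
        history.append(best_y)
        if step < tol:                                  # Control: stop once further changes are negligible
            break
    return best_x, best_y, history

# Hypothetical objective with its optimum at x = 3
best_x, best_y, history = learning_loop(lambda x: -(x - 3.0) ** 2, x0=0.0)
print(f"best design parameter ~ {best_x:.3f} after {len(history) - 1} cycles")
```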

Model Usage is Growing…

Compounds registered as ‘GENERAL_SCREENING’ excluded from analysis

Resulting in Higher Quality Compounds!

Descriptor                     Function        X1     X2     X3     X4
QSAR_CLint_rat_hepatocyte      Decreasing      45     100
QSAR_CLint_human_hepatocyte    Decreasing      25     60
QSAR_Clearance_rat             Decreasing      15     35
ClogD_pH_7.4                   Hump Function   1.5    23     3      3.5
Polar_Surface                  Hump Function   65     75     125    140
Molecular_Weight               Hump Function   420    475    530    580

Courtesy: Kerim Babaoglu

Multiparameter Optimization (MPO) Analysis Drives Design of More Desirable Compounds

More Desirable Compounds Display Lower (Better) Human Dose Calculations (Scaled from Experimental Rat PK Data)

Figure: Desirability Score (y-axis) vs. Design/Synthesis Cycle (x-axis). Legend: Green = Good Dose; Yellow = Moderate Dose; Red = Poor Dose
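The "Decreasing" and "Hump Function" entries in the MPO table read like the piecewise-linear desirability transforms commonly used in MPO scoring. A minimal sketch, assuming that a Decreasing function scores 1 at or below X1, 0 at or above X2, and interpolates linearly in between; that a Hump function rises linearly from 0 at X1 to 1 at X2, stays at 1 through X3, and falls to 0 at X4; and that the overall score is the sum of the per-descriptor scores. The combination rule and the example compound are assumptions, not taken from the slide. The ClogD_pH_7.4 row is omitted because its X2 value as extracted (23) does not fall between its X1 and X3.

```python
def decreasing(value, x1, x2):
    """1.0 at or below x1, 0.0 at or above x2, linear in between (lower is better)."""
    if value <= x1:
        return 1.0
    if value >= x2:
        return 0.0
    return (x2 - value) / (x2 - x1)

def hump(value, x1, x2, x3, x4):
    """0.0 outside (x1, x4), 1.0 on [x2, x3], linear ramps on the shoulders."""
    if value <= x1 or value >= x4:
        return 0.0
    if value < x2:
        return (value - x1) / (x2 - x1)
    if value <= x3:
        return 1.0
    return (x4 - value) / (x4 - x3)

# Thresholds from the table above (ClogD_pH_7.4 omitted: its X2 is garbled)
MPO_RULES = {
    "QSAR_CLint_rat_hepatocyte": (decreasing, (45, 100)),
    "QSAR_CLint_human_hepatocyte": (decreasing, (25, 60)),
    "QSAR_Clearance_rat": (decreasing, (15, 35)),
    "Polar_Surface": (hump, (65, 75, 125, 140)),
    "Molecular_Weight": (hump, (420, 475, 530, 580)),
}

def mpo_score(descriptors):
    """ASSUMPTION: overall desirability = sum of per-descriptor scores (max 5 here)."""
    return sum(fn(descriptors[name], *params) for name, (fn, params) in MPO_RULES.items())

compound = {  # hypothetical compound, in the table's (unstated) units
    "QSAR_CLint_rat_hepatocyte": 60,
    "QSAR_CLint_human_hepatocyte": 20,
    "QSAR_Clearance_rat": 25,
    "Polar_Surface": 90,
    "Molecular_Weight": 500,
}
print(round(mpo_score(compound), 3))  # prints 4.227
```

Summing keeps a single poor property from zeroing the whole score; a multiplicative combination would instead penalize any one failing descriptor harshly. Which behavior the Merck analysis used is not stated on the slide.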


Connecting the Domains with Models







Cross-domain DMAIC Loops…

Leads to Decreased Lead Optimization Cycle Times


Closing the Loop







Can we construct pan-R&D workflows that incorporate existing data, predictive models, and best practices

to drive design, predict full product lifecycle, and increase probability of success?

Real World Evidence and Outcomes Research

A Trillion Points of Data

https://youtu.be/ET8cxrEfLS4?t=955


The Hadoop Initiative: Supporting Today’s Data Access and Preparing for the Emergence of Big Data

Abstract

Currently, Merck’s observational research activities rely heavily upon electronic medical record (EMR) and electronic administrative insurance claim (AIC) data, which are purchased from vendors and often stored in-house on the current Oracle® Exadata platform. This platform provides efficient storage and access to these types of electronic databases, which usually are organized as traditional structured relational database tables that may approach billions of observations in size.

However, as the rapid acceleration of worldwide electronic data generation continues, new sources of nontraditional data are expected to become increasingly relevant to pharmaceutical research. These new sources, collectively termed “Big Data,” are characterized as potentially massive and arriving in various formats, often unstructured – features that will render them increasingly less compatible with traditional computer architecture.

To prepare for future data demands while supporting our current data requirements, Merck’s Center for Observational and Real world Evidence (CORE) is evaluating Hadoop, an architecture and methodology designed to efficiently and inexpensively meet the storage, retrieval, and analysis requirements of Big Data’s immense and variably structured data. Currently under way is an assessment of Hadoop’s capability to meet Merck’s current EMR and AIC data processing requirements. Initial testing is proving favorable, and if the final phase of testing is also positive, Hadoop will offer a platform that will both meet today’s data requirements and offer a compatible, scalable architecture for accommodating tomorrow’s Big Data.

This poster summarizes the preliminary performance findings to date, highlights the benefits of Hadoop, describes the current production data platform vs the proposed Hadoop platform, highlights what is meant by Big Data, and touches upon the challenges ahead as we face the inevitable prospect of managing and analyzing Big Data.

Michael Senderak¹; David Tabacco²; Robert Lubwama³; David O’Connell⁴; Matt Majer⁵; Bryan Mallitz⁶

Departments of ¹Statistical Programming for CORE†, North Wales, PA; ²Applied Technology, Branchburg, NJ; ³CORE† Data Sciences & Insights, North Wales, PA; ⁴Market Research & Analytics, North Wales, PA; ⁵IT Client Services Leader, CORE†, Rahway, NJ; ⁶CORE† Data Sciences & Insights, North Wales, PA; Merck & Co., Inc., Kenilworth, NJ, USA

Key Contributors: CORE† Data Sciences & Insights, PharmacoEpidemiology & Database Research Unit, Global Human Health, Market Research Analytics, CORE† Information Technology, Prague Global Innovation Network, Applied Technology

†Center for Observational and Real world Evidence

PO-13

Copyright © 2016 Merck Sharp & Dohme Corp., a subsidiary of Merck & Co., Inc. All rights reserved.

Administrative insurance claims (IC)

Electronic medical records (EMR)

Objective 1: Hadoop vs Oracle Exadata (continued)

Table 2. Preliminary performance test results (Hadoop/SAS LASR/HPA vs Exadata, the current Merck data platform)

Data Extraction†
• 6.15 sec vs 23.63 sec
• 4.0 sec vs 14.96 sec
• 4.9 min vs 4.1 min
• 11.5 min vs 30 min
• 4.8 min vs 2 min

SAS Code Processing‡
• 55 sec vs 5 hr, 52 min, 21 sec
• 16 sec vs 6 min, 25 sec
• 15 sec vs 5 min, 47 sec
• 55 sec vs 8 hr, 52 min
• 16 sec vs 11 min
• 58 sec vs 9 hr, 27 min

†To date, results range from ~300% extraction improvement using Hadoop to ~140% performance decline. Runs for additional test cases are in progress. Note that data were extracted to an SAS Institute server proximally located to the Hadoop hardware. The next phase of testing will involve extraction to the current, geographically distant Merck HP Unix server to determine exact performance metrics. Note also that the extraction advantage of Hadoop depends on extraction query construction, which will be further addressed in the next testing phase.

‡The power of parallel processing shows dramatic results, as the model is developed in memory across multiple nodes, as opposed to a single thread on disk.
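The footnoted range can be reproduced from Table 2, reading each Hadoop value as paired with the Exadata value in the same row position. In this sketch, "improvement" is taken as Exadata time divided by Hadoop time, minus one, and "decline" as the reverse; those definitions are our reading, chosen because they reproduce the footnote's ~300% (284%) and ~140% figures.

```python
# Table 2 timings converted to seconds, as (Hadoop/SAS LASR/HPA, Exadata) pairs.
# Pairing follows the row order of the "vs" layout in the table.
extraction = [
    (6.15, 23.63),
    (4.0, 14.96),
    (4.9 * 60, 4.1 * 60),
    (11.5 * 60, 30 * 60),
    (4.8 * 60, 2 * 60),
]
sas_processing = [
    (55, 5 * 3600 + 52 * 60 + 21),
    (16, 6 * 60 + 25),
    (15, 5 * 60 + 47),
    (55, 8 * 3600 + 52 * 60),
    (16, 11 * 60),
    (58, 9 * 3600 + 27 * 60),
]

def relative_change(hadoop_s, exadata_s):
    """+% when Hadoop is faster (Exadata/Hadoop - 1), -% when slower (Hadoop/Exadata - 1)."""
    if hadoop_s <= exadata_s:
        return (exadata_s / hadoop_s - 1) * 100
    return -(hadoop_s / exadata_s - 1) * 100

extraction_changes = [relative_change(h, e) for h, e in extraction]
processing_speedups = [e / h for h, e in sas_processing]

print("Extraction:", [f"{c:+.0f}%" for c in extraction_changes])
print("SAS processing speedup:", [f"{s:.0f}x" for s in processing_speedups])
```

The processing speedups (roughly 23x to 590x) are far larger than any extraction gain, consistent with the second footnote's point that in-memory parallel model building, not data movement, drives most of the difference.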

Figure 1. Current Oracle Exadata/HP Unix platform

• EMR vendor data (GE Centricity, CPRD, Cerner, THIN) and IC vendor data (Marketscan Medicare, Marketscan CCAE and MDCR, OptumInsight, HUMANA) are received from vendors as compressed ASCII data files
• Exadata is used primarily for storage
• Data are extracted to Unix as SAS datasets for processing (SQL query; SAS/R/SQL)
• In the SAS environment on HP Unix, statistical programming pools, subsets, and extracts the data into analysis datasets and analysis cohorts for SAS/R analysis by analysis/informatics users

Figure 2. Proposed Hadoop/HP Unix platform

• The same EMR and IC vendor data (compressed ASCII files received from vendors, residing on or off site), plus Big Data: genomic data, streaming data, etc.
• Hadoop stores EMR and IC data for extraction to Unix, and both stores and processes Big Data (Big Data analytics)
• The SAS environment on HP Unix (statistical programming; SAS/R/SQL analysis of analysis datasets and cohorts) continues to serve analysis/informatics users
• In subsequent releases, SAS processing currently performed on the HP Unix platform may be migrated into the Hadoop architecture.


Product life cycle: Discovery phase → Preclinical phase → Clinical phase → Product launch and marketplace → Marketplace monitoring

Objective 2: Preparing for Big Data

The exponential explosion of data generated daily is just in its early stages (Philadelphia Big Data Conference, 2015).

• In 2012 alone, nearly 500 times as much data was generated as in all prior human history
• From 2013 through 2020, nearly 17,000 times as much data will have been generated as in all prior human history
• By 2020, information will double every 73 days

Big Data defined

Table 3. Generally agreed-upon definition of Big Data
• Volume: Data too large for standard database management tools
• Velocity: Delivered at incredibly fast rates, often real time, not always predictable timing
• Variability: Arrives in various formats, often unstructured (as opposed to relational or standard row-column format)
• Versatility: Various sources and types of data

Table 4. The data lake

Big Data is:
• Potentially massive
• Without an assumed single structure
• Not data tables, but a fluid data lake of possibly dissimilar data sources

Traditional storage and retrieval systems are not designed for Big Data. Instead, with Big Data:
• Data prep, cleansing, and linking happen just-in-time
• No up-front extraction, transformation, and loading into a structured environment
• Linking across sources and entities can happen flexibly and incrementally
• Linkages across disparate data sources are customized to the business need at hand
• Linkage solutions are developed only as needed, minimizing resource needs
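A minimal illustration of just-in-time preparation and linking (often called schema-on-read), written in plain Python rather than with any Hadoop tool. The two raw feeds, their field names, and the patient IDs are invented; the point is that parsing, renaming, and the join are defined only when a question is asked, not in an up-front load step.

```python
import csv
import io
import json

# Hypothetical raw feeds landed in the lake as-is, in different formats
raw_claims_csv = "patient_id,dx_code,paid\nP1,E11.9,120.50\nP2,I10,80.00\n"
raw_emr_jsonl = '{"pid": "P1", "hba1c": 8.1}\n{"pid": "P3", "hba1c": 6.2}\n'

def read_claims(text):
    """Apply a schema at read time: parse CSV and cast types only when queried."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"patient_id": row["patient_id"], "dx_code": row["dx_code"], "paid": float(row["paid"])}

def read_emr(text):
    """A second source names the same key differently; map it just in time."""
    for line in text.splitlines():
        if line.strip():
            rec = json.loads(line)
            yield {"patient_id": rec["pid"], "hba1c": rec["hba1c"]}

def link_on_patient(claims, emr):
    """Linkage built only for this question: inner join on patient_id."""
    emr_by_patient = {r["patient_id"]: r for r in emr}
    return [{**c, **emr_by_patient[c["patient_id"]]} for c in claims if c["patient_id"] in emr_by_patient]

linked = link_on_patient(read_claims(raw_claims_csv), read_emr(raw_emr_jsonl))
print(linked)
```

If a later question needs a different linkage (say, on prescriber rather than patient), only a new reader or join function is written; the raw files stay untouched, which is what keeps linkage work "developed only as needed."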

Hadoop’s distributed processing across multiple parallel computing paths efficiently manages the massive storage and computational requirements of unstructured data sources.
• Example: Full prostate tumor genetic sequences (exomes) (Roche, 2014)
− 15 seconds to search 4,002,926,334 rows of exome variants and join with 14,787,223 rows of expression data
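Hadoop's original processing model, MapReduce, splits a large file into blocks stored on different nodes, runs a map function on each block where it lives, and merges the partial results in a reduce step. A single-machine sketch of that pattern, counting variant records per gene; threads stand in for cluster nodes, and the gene/variant records are hypothetical placeholders, not Hadoop APIs or real data.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical "gene<TAB>variant_id" records, pre-split into partitions the way
# HDFS splits a large file into blocks stored on different nodes.
partitions = [
    ["TP53\tvar_001", "AR\tvar_002", "TP53\tvar_003"],
    ["SPOP\tvar_004", "AR\tvar_005"],
    ["TP53\tvar_006", "FOXA1\tvar_007"],
]

def map_partition(lines):
    """Map step: runs independently on each partition and emits per-gene counts."""
    return Counter(line.split("\t", 1)[0] for line in lines)

def reduce_counts(a, b):
    """Reduce step: merge partial counts (associative, so merge order does not matter)."""
    return a + b

# Threads stand in for cluster nodes in this single-machine sketch
with ThreadPoolExecutor() as pool:
    partial_counts = list(pool.map(map_partition, partitions))

gene_counts = reduce(reduce_counts, partial_counts, Counter())
print(gene_counts.most_common())
```

Because the reduce step is associative, partial results can be merged in any order or on any node, which is what lets the pattern scale out to billions of rows.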

Big Data and the pharmaceutical industry

Table 4. Sources of Big Data for pharma
• Regional, National, and Worldwide Databases
• Patient histories
• Patient registries
• Electronic medical records
• Medical insurance claims
• Prescription claims
• Lab data
• Imaging data
• Genomic data
• Physician office data and freehand notes
• Wearable medical device data
• Government records
• Market/sales data
• Web-based data: News feeds; Social media streams: blogs, patient experience forums, etc.

Figure 3. Big Data synergies across the product life cycle

Adapted from: Defay T. and Mehta V. 2014.


Phases: Prediscovery (patient targeting); Preclinical development; Clinical development, distribution, and postmarketing

Big Data sources:
• Genetic and genomic: Gene sequencing, immunology databases, etc.
• Patient-centered: Government, academic, commercial data, clinical trials, biometrics, etc.
• Smart devices and sensors: Telephonic, wireless, etc. Patient monitoring
• Interactive media: Patient self-help and sharing forums, doctor forums, blogs, etc.
• Healthcare information networks: Patient/physician resources, guidelines, policies, etc.
• Market data: Physician office records and insurance claims for rx, dx, labs, etc.

Predictive and Economic Modeling

• Global Burden of Disease

• Budget Impact

• Launch Optimization

Consortia and Other Considerations

• TransCelerate – working now on an eSource program focused on harmonizing the direct capture of clinical study data from EHR/EMR, wireless/remote patient data, and virtual trials.

• FDA Sentinel Initiative – patient safety data collected by entities contracted by FDA. Does access to near real-time real-world data change the safety landscape in any way?

We are able to predict success.

The Vision: Failure Rates Decreasing at All Stages of R&D


Thank you!
