
Posted on 14-Feb-2017


Revolutionizing Laboratory Instrument Data for the Pharmaceutical Industry: How Semantic Technology is Helping Drive New Standards for Data Management

Eric Little, PhD

VP Data Science

[email protected]

Oliver Hesse

Director Lab Automation & Data Mgmt.

Bayer

Slide 2

The Current Situation in the Lab

Many challenges stand in the way of capturing, integrating and sharing data:

Data Silos
• Incompatible instruments and software systems; proprietary data formats
• Legacy architectures are brittle and rigid
• Subject-matter expert (SME) knowledge resides in people’s heads; there is little common vocabulary
• Data schemas are not explicitly understood
• Lack of a common vision between business units and scientists

Slide 3

How do we change this situation? What did the music industry teach us?

Data in Standard Format
Metadata in a Standard Vocabulary
Regulatory Guidance, Methods, Recipes, SOPs …
Vendor-Specific Formats
Process, Material, Equipment, Result
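The music-industry analogy boils down to mapping many vendor-specific formats onto one shared vocabulary. The Python sketch below illustrates only that idea, under stated assumptions: the vendor names, field names and target terms are hypothetical placeholders rather than actual Allotrope Foundation Ontologies terms, and unit conversion is deliberately left out.

```python
from typing import Dict

# Hypothetical vendor field names mapped to hypothetical shared terms.
# Real standard terms would come from the Allotrope Foundation Ontologies.
VENDOR_TO_STANDARD: Dict[str, Dict[str, str]] = {
    "vendor_a": {"InjVol": "injection volume", "ColTemp": "column temperature"},
    "vendor_b": {"inj_volume_ul": "injection volume", "oven_temp": "column temperature"},
}


def to_standard(vendor: str, record: Dict[str, object]) -> Dict[str, object]:
    """Rename a vendor record's keys to shared vocabulary terms.

    Keys without a known mapping are collected under "unmapped" instead of
    being dropped. Units are not converted in this sketch.
    """
    mapping = VENDOR_TO_STANDARD[vendor]
    standard: Dict[str, object] = {}
    unmapped: Dict[str, object] = {}
    for key, value in record.items():
        if key in mapping:
            standard[mapping[key]] = value
        else:
            unmapped[key] = value
    if unmapped:
        standard["unmapped"] = unmapped
    return standard


# The same settings, exported by two different (hypothetical) vendors' software.
a = to_standard("vendor_a", {"InjVol": 10, "ColTemp": 40})
b = to_standard("vendor_b", {"inj_volume_ul": 10, "oven_temp": 40, "lamp_hours": 1200})
```

Both records end up with the same keys, so downstream tools can compare them directly; anything the mapping does not recognize is kept under `unmapped` rather than silently lost.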

Slide 4

The Structure of Allotrope Is Unique

• Member Companies: Subject Matter Experts; Project Funding
• Secretariat: Project Management; Legal & Logistical Support
• Professional Software Firm: Framework Development; Technical Leadership
• Partner Network: Requirements & Specifications; Contributions, PoC Applications

Slide 5

The Structure of Allotrope Is Unique

Member Companies: AbbVie, Amgen, Baxter, Bayer, Biogen, Boehringer Ingelheim, Bristol-Myers Squibb, Eli Lilly, Genentech/Roche, GlaxoSmithKline, Merck & Co., Pfizer

Slide 6

The Structure of Allotrope Is Unique

Partner Network: Abbott Informatics, ACD/Labs, Agilent, Biovia, Bruker, BSSN, Cytobank, EPAM, Fraunhofer IPA, Global Value Web, IDBS, LabAnswer, LabWare, LEAP Technologies, Mestrelab Research, Mettler Toledo, PerkinElmer, Persistent Systems, Riffyn, Qualitest, Rondaxe, Sartorius, Sciex, Shimadzu, Synthace, TetraScience, Thermo Scientific, Transcriptic, Unchained Labs, Waters, Zifo, Erasmus University Medical Center, J. Paul Getty Trust, Science and Technology Facilities Council (UK), University of Southampton, University of Strathclyde, Stanford University

Slide 7

The Allotrope Framework

Allotrope Data Format (ADF)

Allotrope Data Models (ADM)

Allotrope Foundation Ontologies (AFO)

Slide 8

The Allotrope Framework

Allotrope Data Format (ADF): Graph Instances
Allotrope Data Models (ADM): Constraints
Allotrope Foundation Ontologies (AFO): Classes and Properties

Relationships shown in the diagram: "is populated by", "is structured by", "provide standardized vocabulary for"
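One way to picture the three layers is a tiny validator: a vocabulary (standing in for AFO) defines which classes and properties exist, a set of constraints (standing in for an ADM) says which properties a class must carry, and an instance graph (standing in for the data described in an ADF file) is checked against both. This is a hypothetical Python sketch with made-up class and property names; it is not Allotrope's actual validation mechanism, although it is similar in spirit to shape-constraint languages such as W3C SHACL.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, property, object)

# Vocabulary layer, standing in for AFO (hypothetical terms).
VOCAB_CLASSES: Set[str] = {"Sample", "Injection", "Chromatogram"}
VOCAB_PROPERTIES: Set[str] = {"type", "hasSample", "hasVolume", "producedBy"}

# Constraint layer, standing in for an ADM: properties each class must have.
REQUIRED: Dict[str, Set[str]] = {
    "Injection": {"hasSample", "hasVolume"},
    "Chromatogram": {"producedBy"},
}


def validate(graph: List[Triple]) -> List[str]:
    """Check an instance graph against the vocabulary and the constraints.

    Returns human-readable problems; an empty list means the graph conforms.
    """
    problems: List[str] = []
    types: Dict[str, str] = {}
    props: Dict[str, Set[str]] = {}
    for subject, prop, obj in graph:
        if prop not in VOCAB_PROPERTIES:
            problems.append(f"{subject}: unknown property {prop!r}")
        elif prop == "type":
            if obj not in VOCAB_CLASSES:
                problems.append(f"{subject}: unknown class {obj!r}")
            types[subject] = obj
        else:
            props.setdefault(subject, set()).add(prop)
    # Compare each typed node with the properties its class requires.
    for node, cls in types.items():
        for missing in sorted(REQUIRED.get(cls, set()) - props.get(node, set())):
            problems.append(f"{node} ({cls}) is missing {missing}")
    return problems


good = [
    ("s1", "type", "Sample"),
    ("inj1", "type", "Injection"),
    ("inj1", "hasSample", "s1"),
    ("inj1", "hasVolume", "10 uL"),
    ("chrom1", "type", "Chromatogram"),
    ("chrom1", "producedBy", "inj1"),
]
bad = [("inj2", "type", "Injection"), ("inj2", "hasSample", "s1"), ("inj2", "color", "blue")]
```

Keeping the vocabulary and the constraints separate from the data means the same graph can be checked by any tool that knows both, independent of the file that carries it.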

Slide 9

Allotrope Taxonomies Domain Model [v.1.1.5]

Slide 10

Taxonomies Standardize Metadata Across Domains

Result, Process, Equipment

Slide 11

SEMANTIC METHOD

Codes, Terms, Vocabularies, Taxonomies, Models, Ontologies, Reasoning
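The step from taxonomies to reasoning can be illustrated with a small is-a hierarchy: once each technique points to its broader parent, a program can infer every broader category a term belongs to and answer queries at any level. The hierarchy below is a hypothetical example that reuses technique names appearing in this deck; the broader terms ("liquid chromatography", "separation technique") are assumptions, and none of it is the published Allotrope taxonomy.

```python
from typing import Dict, Set

# Hypothetical is-a taxonomy (narrower term -> broader term).
BROADER: Dict[str, str] = {
    "HPLC": "liquid chromatography",
    "U-HPLC": "liquid chromatography",
    "HPLC-MS": "liquid chromatography",
    "liquid chromatography": "chromatography",
    "chromatography": "separation technique",
    "capillary electrophoresis": "electrophoresis",
    "electrophoresis": "separation technique",
}


def broader_terms(term: str) -> Set[str]:
    """Infer every broader term by following is-a links transitively."""
    found: Set[str] = set()
    # The membership check stops the walk if the taxonomy accidentally has a cycle.
    while term in BROADER and BROADER[term] not in found:
        term = BROADER[term]
        found.add(term)
    return found


def runs_under(category: str, runs: Dict[str, str]) -> Set[str]:
    """Return runs whose technique is the category itself or a narrower term."""
    return {
        run
        for run, technique in runs.items()
        if technique == category or category in broader_terms(technique)
    }


runs = {"run-1": "HPLC", "run-2": "capillary electrophoresis", "run-3": "HPLC-MS"}
```

No run is tagged "chromatography" directly; the answer to a query for that category comes entirely from inference over the hierarchy.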

Slide 12

Allotrope Data Format (ADF)

Platform-independent file format based on HDF5, which is specifically designed to store and organize large amounts of scientific data.

• Data Description (semantic graph model): descriptive metadata about
  • method, instrument, sample, process, result, etc.
  • provenance, audit trail
  • Data Cube, Data Package
• Data Cubes (universal data container): analytical data represented by one- or multidimensional arrays of homogeneous data structures
• Data Package (virtual file system): data in arbitrary formats, including native instrument formats, images, PDF, video, etc.

APIs: Java and .NET class libraries
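The real ADF stores these parts in HDF5 and is accessed through Java and .NET class libraries. As a language-neutral illustration only, the Python sketch below models the three parts in memory: a metadata graph of triples, data cubes as shaped arrays of homogeneous values, and a package of arbitrary files keyed by path. All class, method and property names are hypothetical and do not reflect the ADF API, and nothing here writes an HDF5 file.

```python
from dataclasses import dataclass, field
from math import prod
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


@dataclass
class DataCube:
    """Homogeneous numeric values stored flat, plus the shape they fill."""
    name: str
    shape: Tuple[int, ...]
    values: List[float]

    def __post_init__(self) -> None:
        # Reject cubes whose value count does not match their declared shape.
        expected = prod(self.shape)
        if expected != len(self.values):
            raise ValueError(
                f"{self.name}: shape {self.shape} needs {expected} values, got {len(self.values)}"
            )


@dataclass
class AdfLikeContainer:
    """Hypothetical in-memory stand-in for the three parts of an ADF file."""
    description: List[Triple] = field(default_factory=list)  # metadata graph
    cubes: Dict[str, DataCube] = field(default_factory=dict)  # analytical data
    package: Dict[str, bytes] = field(default_factory=dict)   # arbitrary files by path

    def add_cube(self, cube: DataCube, described_by: str) -> None:
        self.cubes[cube.name] = cube
        # Link the cube from the metadata graph so it can be found via the metadata.
        self.description.append((described_by, "hasDataCube", cube.name))


adf = AdfLikeContainer()
adf.description.append(("run-1", "usesMethod", "method-1"))
# Three rows of (retention time in min, detector signal): a small 2D chromatogram.
adf.add_cube(DataCube("chromatogram-2d", (3, 2), [0.0, 0.1, 0.5, 2.3, 1.0, 0.4]), described_by="run-1")
adf.package["/native/run-1.raw"] = b"vendor-specific bytes"
```

The design point the sketch tries to capture is that numeric data, arbitrary files and the metadata describing both travel together in one container, with the metadata graph acting as the index.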

Slide 13

Allotrope Data Format Example

Platform-Independent File Format

• Data Description: Request, Sample, Method, Data & Results, Run
• Data Cubes
• Data Package

Example contents: Chromatogram: 2D, Chromatogram: 3D, Chromatogram 2D HDF (two files)

Slide 14

The Foundation for Data Integrity & Analytics

Workflow steps: Plan Analysis, Prepare Samples, Submit Samples, Control Instrument, Acquire Data, Process Data, Analyze Data, Report Results, Store/Archive Data; Request, Report, Find & Reuse

Data along the workflow: Sample Prep Data, Instrument Instructions, Instrument Data, Processed Data, Analyzed Data, Reported Results, Stored Data; Analytical Method

Allotrope Foundation Ontologies (AFO)
• Taxonomies: Material, Equipment, Process, Result, Properties
• Stability, Batch Release, Solubility …
• HPLC, MS, NMR …

Allotrope Data Models (ADM)
• Stability Study, Batch Release Study, Solubility Study …
• HPLC-UV Experiment, MS Experiment, NMR Experiment …

Slide 15

Solubility Testing Example *)

Instrument level: Solid Dispense, Liquid Dispense, Conditioning, Centrifuge, Filter, Dilution, HPLC Analysis, Raman Analysis, xRPD Analysis, pH Analysis

LIMS/ELN level: LIMS / ELN

Allotrope Foundation Taxonomies: Dispense Ontology, Conditioning Ontology, Centrifuge Ontology, Filter Ontology, Dilution Ontology, HPLC Ontology, Raman Ontology, xRPD Ontology, pH Ontology; Solubility Testing Ontology

Data models: Solid Dispense, Liquid Dispense, Conditioning, Centrifuge, Filter, Dilution, HPLC, Raman, xRPD and pH Data Models; Solubility Testing Data Model

Solubility Study Data and Metadata

*) Extensions planned after the initial public release

Slide 16

Allotrope Provisional Roadmap

Timeline: 4Q 16, 1Q 17, 2Q 17, 3Q 17, 4Q 17

ADM
ADM 1.0 – Initial Standardized Data Models + Certification + Governance
Milestones: Scoping, ADM 1.0 Delivered, ADM 1.0 Tested, Public release, extensions

ADF
ADF 1.2 – Regulatory Compliance
Milestones: ADF 1.2 Delivered, ADF 1.2 Tested
ADF 1.3 – Structural Robustness
Milestones: Scoping, ADF 1.3 Delivered, ADF 1.3 Tested, Public release, maintenance

AFO
AFO 1.2 – Structural Robustness + Governance
Milestones: Scoping, AFO 1.2 Delivered, AFO 1.2 Tested, Public release, extensions

Design

Allotrope @ Bayer

Slide 18

Bayer • Company Profile 2016

Full-year sales: €46.3 billion**
Employees: 115,176*
Subsidiaries: 307
R&D expenses: €4.3 billion***

As of December 31, 2015 (including Covestro) / Employees: as of September 30, 2016 (including Covestro)

* Excluding Covestro: 99,517 employees (in full-time equivalents)
** Excluding Covestro: €34.3 billion
*** Excluding Covestro: €4.0 billion

Slide 19

Strategic Areas of Interest: Leveraging Benefits of the Allotrope Framework

Bayer – Allotrope @ SmartData 2017

Allotrope Implementation Strategy
• Analytical Method Management: transfer analytical methods
• Archiving: reprocessable data, long-term readable format, data integrity
• Instrument Integration: electronic workflows / ELN & LIMS
• Taxonomies: as reference/master data
• Assets & Instrument Management: Internet of Things = live inventory
• Data Lake: post-analysis of data / data mining
• External Collaboration: CRO integration / data & method exchange
• Application Interfaces: LIMS connectivity, e.g. to CDS

Slide 20

Taxonomies: Reference & Master Data as the Basis

Interview, Research, LIMS/ELN, Publish, Review

Instrument taxonomies: HPLC / U-HPLC, HPLC-MS, Amino assays, ELISA, HTRF, Electrophoresis, Bioanalyzer, Capillary Electrophoresis, SDS-PAGE/Western Blot, iCIEF / iCE, qPCR, Spectrophotometer, Fortebio Octet/Blitz, Biacore, Mycoplasma, ACL, Multiplex fluorescent Immunoassay (Mfi), Microtiter plate readers, Potency Testing, Chromogenic Potency, Cell-based potency

Downstream process taxonomies: Tangential Flow Filtration (TFF), Prep. Chromatography

Slide 21

Analytical Method Management: From ‘Text’ to Machine Readable

Taken from: Weller HN, Nirschl DS, Paulson JL, Hoffman SL, Bullock WH. ACS Comb Sci. 2012;14(9):520-526. doi:10.1021/co300075g.

Material, Process (method), Properties, Device, Results

Slide 22

Analytical Method Management: As-Is, Interrupted Process for Setting Up Analytics

• LIMS: analysis MANUALLY assigned
• TEXT-BASED method description
• MANUALLY transcribed to HPLC-MS Workstation 1, HPLC-MS Workstation 2 and HPLC-MS Workstation 3
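The contrast between a text-based method and a machine-readable one can be shown with a small sketch: the method is defined once as structured data, serialized in a canonical form, and loaded identically on every workstation, with a hash to confirm that nothing changed on the way. The field names and values are hypothetical, and JSON is an assumed encoding; the deck does not specify a format.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class HplcMsMethod:
    """Hypothetical machine-readable method; field names are illustrative only."""
    name: str
    column: str
    flow_rate_ml_min: float
    gradient: List[Dict[str, float]]  # steps of {"time_min": ..., "percent_b": ...}
    ms_scan_range_mz: List[int]


def serialize(method: HplcMsMethod) -> str:
    # sort_keys gives a canonical text form, so identical methods serialize identically.
    return json.dumps(asdict(method), sort_keys=True)


def fingerprint(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def load_on_workstation(text: str) -> HplcMsMethod:
    # Each workstation parses the same payload instead of re-typing a text description.
    return HplcMsMethod(**json.loads(text))


method = HplcMsMethod(
    name="assay-A",
    column="C18 50 x 2.1 mm",
    flow_rate_ml_min=0.4,
    gradient=[{"time_min": 0.0, "percent_b": 5.0}, {"time_min": 5.0, "percent_b": 95.0}],
    ms_scan_range_mz=[100, 1000],
)
payload = serialize(method)
received = {ws: load_on_workstation(payload) for ws in ("Workstation 1", "Workstation 2", "Workstation 3")}
```

Because all three workstations load the same payload, a matching fingerprint is enough to show that each one runs exactly the method that was defined, which is the step manual transcription cannot guarantee.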

Slide 23

Analytical Method Management: Our Vision

INTERNAL


DATA LAKE: companies’ secret data, IP, knowledge, research results

PUBLIC RESEARCH: published data, published scientific information, journals, patents

LIMS DELIVERS work-related methods/data/information.

Current work in LIMS/ELN/etc. triggers an AUTOMATED SEARCH.

Information Broker

Company’s analytical scientist

analytical_results.adf
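The information-broker idea can be sketched as a search that is triggered by the current work item and ranks internal and public documents by how many controlled-vocabulary terms they share with it. Tag overlap is an assumed, deliberately simple ranking rule, not something the slide specifies, and the document titles, sources and terms below are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Document:
    source: str  # e.g. "internal data lake" or "public research"
    title: str
    terms: Set[str] = field(default_factory=set)  # controlled-vocabulary tags


def broker_search(work_terms: Set[str], corpus: List[Document], min_overlap: int = 1) -> List[Document]:
    """Return documents sharing at least min_overlap terms with the work item, best first."""
    scored = [(len(doc.terms & work_terms), doc) for doc in corpus]
    hits = [(score, doc) for score, doc in scored if score >= min_overlap]
    # Python's sort is stable even with reverse=True, so ties keep corpus order.
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in hits]


corpus = [
    Document("internal data lake", "Solubility of compound A in buffer", {"solubility", "HPLC"}),
    Document("public research", "Impurity profiling by HPLC-MS", {"HPLC-MS", "impurity"}),
    Document("public research", "Solubility screening review", {"solubility"}),
]
# Terms attached to the current LIMS/ELN work item that triggers the search.
work_item = {"solubility", "HPLC"}
results = broker_search(work_item, corpus)
```

Exact-string tags mean "HPLC" does not match "HPLC-MS", so the impurity paper is missed; a taxonomy-aware match that expands terms to their narrower and broader neighbours would be the natural refinement.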

Slide 24

Moving From Semantics to Data Science

Slide 25

What is Data Science?

At OSTHUS, Data Science has a special meaning:
• Data Science is more than just statistical analysis
• We combine math-based approaches (statistics) with logic-based approaches (semantics)

Conceptual + Computational
• Semantics: provides the vocabularies, definitions, class structures, logical relationships and conceptual models
• Statistics: provides computations, trending, analysis and learning over time from the data itself

Slide 26

AT OSTHUS, LAB DATA SCIENCE IS BIG ANALYSIS

STATISTICAL, SEMANTICS, MACHINE LEARNING, REASONING

Slide 27

Machine Learning Is Becoming Increasingly Valuable

• Very little in one’s data is known with certainty – abductive reasoning is needed
• Capture what you can semantically
• The rest can be gathered directly from the data (bottom up)
• Hypotheses can be driven by SMEs and past patterns of success
• The success of predictive systems often relies on testing the models
• Semantics can help improve the accuracy of a model
• Tests over time can reveal problems of fit (alignment)

“Shelf Life” Example:
• I have 2 years of data that show a shelf life of “x” (I have some level of truth for this compound)
• Now I take a similar compound, “y”. What is its shelf life?
• I can make a better guess based on previous reasoning (induction)
• I make a best guess for the shelf life of “y”
• I test the hypothesis on new data sets

Outcome:
1. Ability to understand and optimize in a shorter period of time
2. Taxonomies and ontologies can help in understanding the trend over time
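The shelf-life example can be made concrete with a simple statistical sketch: fit a straight line to the assay values of the well-characterized compound, take the time at which the line reaches a specification limit as the starting hypothesis for the similar compound, then test that hypothesis against the similar compound's own early data. This is a simplified version of the regression approach described in ICH Q1E, without the confidence bound that guideline applies. The data, the 90 % limit and the 20 % tolerance for accepting the hypothesis are all hypothetical, and the similarity of the two compounds is assumed rather than computed.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (months on stability, assay as % of label claim)


def fit_line(points: List[Point]) -> Tuple[float, float]:
    """Ordinary least-squares fit of assay = intercept + slope * months."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_t, slope


def shelf_life(points: List[Point], limit: float = 90.0) -> float:
    """Months until the fitted line reaches the (assumed) specification limit."""
    intercept, slope = fit_line(points)
    if slope >= 0:
        return float("inf")  # no downward trend, so no crossing can be predicted
    return (limit - intercept) / slope


# Compound with two years of data: the "level of truth" already available.
known = [(0.0, 100.0), (6.0, 99.0), (12.0, 98.0), (18.0, 97.0), (24.0, 96.0)]
prior_guess = shelf_life(known)  # starting hypothesis for the similar compound "y"

# Similar compound "y": only early data so far; test the hypothesis against it.
new_data = [(0.0, 100.0), (3.0, 99.2), (6.0, 98.4)]
estimate_y = shelf_life(new_data)
hypothesis_holds = abs(estimate_y - prior_guess) / prior_guess <= 0.2
```

Here the prior guess is 60 months, but the new data point to about 37.5 months, so the hypothesis fails the test; that is the kind of "problem of fit" the slide expects tests over time to reveal.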

Slide 28

Smart Data for Smart Labs in the 21st Century

Smart labs in the future will provide the enterprise with:
• Integrated Data – common reference data structures (vocabularies)
• Sharable Data – easier interaction across teams and business units
• Scalability – big data applications that can be highly elastic
• Conceptual Representations – context and perspective are captured
• Advanced Analytics – complex & automated problem-solving capabilities

Thank You! Questions?