Linked Data for BioPharma

Post on 07-May-2015

DESCRIPTION

As BioPharma adapts to incorporate nimble networks of suppliers, collaborators, and regulators, the ability to link data is critical for dynamic interoperability. Adopting the linked data paradigm allows BioPharma to focus on its core business: delivering valuable therapeutics in a timely manner.

TRANSCRIPT

Tom Plasterer, PhD
Integrated Informatics Semantic Framework Lead (i2SF)

The Path to Linked Data in BioPharma

Integrated R&D Informatics and Knowledge Management

R&D | RDI

Blockbuster ‘Patent Cliff’ Gives Way to Personalized Approach: Drivers & Solutions

Blockbuster Patent Cliff

Growth of Generics

Mergers & Acquisitions

Personalized Medicine
•Pharmacogenetics
•Biomarkers

American Action Forum; Primer: The Pharmaceutical Industry (Han Zhong | Updated June 2012)

IMAP Pharma & Biotech Industry Global Report 2011

Evaluate Pharma World Preview 2018

From: http://www.liv.ac.uk/pharmacogenetics/

Where do the new opportunities arise? Inside & Outside

Build from within
•Nurture ‘best in class’ programs
•Kill early
•Repositioning

Mergers & Acquisitions
•Partner or Buy?
•Integrate cultures & technology
•Is the disruption worth it?

Pre-Competitive Consortiums
•How much can be shared, and still be useful?
•Who is driving?

Finding ‘KOLs’
•Aggressive Regional Partnerships (Pfizer's Centers for Therapeutic Innovation)
•Co-locate near Academic Centers of Excellence (Novartis)
•Cherry pick (GSK, AZ, others)

Distributed Data in a Monolithic Environment: Managing Silos

Partitioned By Content
• Regulated Systems vs. Discovery

Partitioned By Geography & Organization
• US, EU, ASIAPAC

Data Formats
• RDB, Excel, Text, RSS, RDF?

Warehouses & Service Oriented Architecture
• Steps in the right direction?

Collaborative Environment
• eRooms, SharePoint, Yammer, ‘Lync’ vs. Twitter, Google Docs, Skype

Standards?
• Vendor specific or open?
• Mixed Bag

Where are the ‘smarts’?
• UI? Services?
• Metadata?

Requirements of The Informatics Landscape

Must span the entire drug development lifecycle
o and back (post-market surveillance to discovery)

Must support large and very heterogeneous data
o single nucleotide polymorphisms to countries

Will change as new science emerges & new regulations come into play
o Medline just under 1M articles/year

Must be able to work with multiple, international regulatory bodies
o Emerging markets

Partners, customers and collaborators will change
o and will have divergent technical aptitudes

Must be able to interoperate with precompetitive consortia
o Can they perform common tasks for the community?

Must be able to work with legacy data
o Lots of unmined gems here!

Maximal Agility

What’s Needed?

Linked Data!

LOD Cloud 2011: http://thedatahub.org/group/lodcloud

The 5 Stars of Open Linked Data

W3C/TBL Guidance

http://www.w3.org/DesignIssues/LinkedData.html

★ Make your stuff available on the web (any format)

★★ Make it available as structured data (e.g. Excel instead of image scan of a table)

★★★ Use a non-proprietary format (e.g. CSV instead of Excel)

★★★★ Use URLs to identify things, so that people can point at your stuff

★★★★★ Link your data to other people’s data to provide context
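
As a rough illustration of the step from three stars to five, the sketch below turns CSV rows into N-Triples: each row gets a URL (four stars) and a link to someone else's data (five stars). The base URL `http://example.org/id/disease/`, the column names, and the sample row are assumptions made for this example and do not come from the slides.

```python
import csv
import io

# Hypothetical base URL for minting identifiers (an assumption, not from the slides).
BASE = "http://example.org/id/disease/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Three-star data: a non-proprietary CSV table (the sample row is illustrative).
CSV_TEXT = """local_id,label,external_uri
D001,asthma,http://purl.obolibrary.org/obo/DOID_2841
"""


def escape_literal(text):
    # Minimal N-Triples string escaping: backslash and double quote only.
    return text.replace("\\", "\\\\").replace('"', '\\"')


def rows_to_ntriples(csv_text):
    """Return one N-Triples line per statement derived from the CSV rows."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Four stars: a URL names the thing, so others can point at it.
        subject = "<" + BASE + row["local_id"] + ">"
        label = escape_literal(row["label"])
        lines.append(subject + " <" + RDFS_LABEL + '> "' + label + '" .')
        # Five stars: link to other people's data to provide context.
        if row["external_uri"]:
            lines.append(subject + " <" + OWL_SAME_AS + "> <" + row["external_uri"] + "> .")
    return lines


if __name__ == "__main__":
    print("\n".join(rows_to_ntriples(CSV_TEXT)))
```

Whether `owl:sameAs` or a weaker mapping property is the right link depends on how closely the two resources really correspond; the choice here is only for brevity.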

The 5 Stars of Closed (rather than Open) Linked Data

http://www.w3.org/DesignIssues/LinkedData.html

★ Make your stuff available on the intranet, rather than the web (any format)

★★ Make it available as structured data (e.g. Excel instead of image scan of a table)

★★★ Use a non-proprietary format (e.g. CSV instead of Excel)

★★★★ Use URLs to identify things, so that people can point at your stuff

★★★★★ Link your data to other people’s data to provide context

W3C/TBL Guidance

Towards a Linked Data Architecture

[Architecture diagram. Components shown: structured, semi-structured, and unstructured content (+ tagging); RDF catalogues, mapping, and queries; triplestores; active & partial PURLs for central identity management; vocabulary server; search; semantic visualization. Example identifiers shown: http://research.vocab.astrazeneca.com/id/DOID/2841 and http://humandiseaseontology.astrazeneca.net/DOID/2841]

Choosing Linked Vocabularies: Current LOD Cloud Adoption

Vocabulary prefix   Vocabulary link                             Number of usages in data sets
dc                  http://purl.org/dc/elements/1.1/            92 (31.19 %)
foaf                http://xmlns.com/foaf/0.1/                  81 (27.46 %)
skos                http://www.w3.org/2004/02/skos/core#        58 (19.66 %)
geo                 http://www.w3.org/2003/01/geo/wgs84_pos#    25 (8.47 %)
xhtml               http://www.w3.org/1999/xhtml/vocab#         19 (6.44 %)
akt                 http://www.aktors.org/ontology/portal#      17 (5.76 %)
bibo                http://purl.org/ontology/bibo/              14 (4.75 %)
mo                  http://purl.org/ontology/mo/                13 (4.41 %)
vcard               http://www.w3.org/2006/vcard/ns#            10 (3.39 %)
sioc                http://rdfs.org/sioc/ns#                    10 (3.39 %)
cc                  http://creativecommons.org/ns#              8 (2.71 %)
geonames            http://www.geonames.org/ontology#           6 (2.03 %)

http://www4.wiwiss.fu-berlin.de/lodcloud/state/#terms
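
To make the prefix and link columns concrete, here is a minimal sketch of how a compact name such as `skos:prefLabel` expands into a full URI. The three namespace URIs are copied from the adoption list; the `expand` helper is a hypothetical name introduced for this example.

```python
# Namespace URIs copied from the LOD Cloud adoption list.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}


def expand(compact_name):
    """Expand a compact name such as 'skos:prefLabel' into a full URI."""
    prefix, sep, local = compact_name.partition(":")
    if not sep or prefix not in PREFIXES:
        raise KeyError("unknown or missing prefix in " + repr(compact_name))
    return PREFIXES[prefix] + local


if __name__ == "__main__":
    for name in ("dc:title", "foaf:name", "skos:prefLabel"):
        print(name, "->", expand(name))
```

Reusing a widely adopted prefix means that other data sets already understand the expanded URI, which is the practical payoff of choosing vocabularies by adoption.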

The 5 Stars of Open Linked Vocabularies

Bernard Vatant (Mondeca) Guidance

http://blog.hubjects.com/2012/02/is-your-linked-data-vocabulary-5-star_9588.html

★ Publish your vocabulary on the Web at a stable URI

★★ Provide human-readable documentation and basic metadata (e.g. creator, publisher, date of creation, last modification, version number)

★★★ Provide labels and descriptions, if possible in several languages, to make your vocabulary usable in multiple linguistic scopes

★★★★ Make your vocabulary available via its namespace URI, both as a formal file and human-readable documentation, using content negotiation

★★★★★ Link to other vocabularies by re-using elements rather than re-inventing
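
The four-star item relies on content negotiation: the same namespace URI returns documentation to a browser and a formal file to a tool. The sketch below shows one simplified way a server could choose between the two from the HTTP `Accept` header. The file names are hypothetical, and falling back to HTML when nothing matches is an assumption of this example (a stricter server might answer 406 instead).

```python
# Representations a vocabulary server might offer (file names are hypothetical).
AVAILABLE = {
    "text/html": "vocab.html",   # human-readable documentation
    "text/turtle": "vocab.ttl",  # formal file
}


def parse_accept(header):
    """Return (media_type, q) pairs from an Accept header, highest q first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    # sorted() is stable, so equal q values keep the client's order.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def negotiate(accept_header, default="text/html"):
    """Pick the file to serve for the vocabulary's namespace URI."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type in AVAILABLE:
            return AVAILABLE[media_type]
        if media_type == "*/*":
            return AVAILABLE[default]
    return AVAILABLE[default]


if __name__ == "__main__":
    print(negotiate("text/turtle"))
    print(negotiate("text/html,application/xhtml+xml;q=0.9,*/*;q=0.8"))
```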

Domain Specific Vocabularies

Linked Open Vocabularies, NCBO

http://labs.mondeca.com/dataset/lov/index.html

http://bioportal.bioontology.org/

Building Linked Data Applications

• Capture Business Questions and Sources
• Domain Expert Concept Map
• Build Formal Ontology (Reuse Vocabularies!)
• Challenge with Linked Data
• Model Business Questions (SPARQL)
• Interact with RDF answer in a Faceted Browser
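
The "Model Business Questions (SPARQL)" step can be pictured with a toy example. A real deployment would send SPARQL to a triplestore; the pure-Python matcher below only imitates a basic graph pattern so the idea can be run without any library. Every identifier in the data and the business question itself are hypothetical.

```python
# Toy triples; every identifier here is hypothetical.
TRIPLES = {
    ("ex:compound1", "ex:inhibits", "ex:targetA"),
    ("ex:compound2", "ex:inhibits", "ex:targetB"),
    ("ex:targetA", "ex:associatedWith", "ex:asthma"),
    ("ex:targetB", "ex:associatedWith", "ex:copd"),
}


def is_var(term):
    return term.startswith("?")


def match(pattern, triple, binding):
    """Extend binding so that pattern equals triple, or return None."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            # A variable must keep the value it was first bound to.
            if result.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return result


def query(patterns, triples=TRIPLES):
    """Evaluate a list of triple patterns as a conjunctive query."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Business question: which compounds inhibit a target associated with asthma?
# Roughly: SELECT ?c WHERE { ?c ex:inhibits ?t . ?t ex:associatedWith ex:asthma }
QUESTION = [
    ("?c", "ex:inhibits", "?t"),
    ("?t", "ex:associatedWith", "ex:asthma"),
]

if __name__ == "__main__":
    print(sorted(b["?c"] for b in query(QUESTION)))
```

The shared variable `?t` is what joins the two patterns, which is the same mechanism a concept map uses when two business entities meet at a common node.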

Improving Internal Interoperability

Scientists, Clinicians, and Informaticists can now freely interoperate because:

The PURL server provides a central identity management authority for resources that are of value (need to persist) across the enterprise. The Persistent URLs are used to connect resources found in multiple locations.

The vocabulary server provides a way of harmonizing concepts across different domains.

o Where possible, public vocabularies are used
o Where not, they’re extended
o We don’t want to develop and maintain vocabularies
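
A minimal sketch of what a PURL server does with a partial PURL: a registered prefix is swapped for its target, and the rest of the identifier is carried over. The two hosts appear on the architecture slide; that the first redirects to the second is an assumption made for this example, as are the table and function names.

```python
# Hypothetical PURL table. The two hosts come from the architecture slide;
# the redirect relationship between them is assumed for illustration.
PARTIAL_PURLS = {
    "http://research.vocab.astrazeneca.com/id/DOID/":
        "http://humandiseaseontology.astrazeneca.net/DOID/",
}
EXACT_PURLS = {}


def resolve(purl):
    """Return the redirect target for a PURL, or None if nothing is registered."""
    if purl in EXACT_PURLS:
        return EXACT_PURLS[purl]
    # Partial PURL: the longest registered prefix wins, and the unmatched
    # remainder is appended to the target.
    candidates = [prefix for prefix in PARTIAL_PURLS if purl.startswith(prefix)]
    if not candidates:
        return None
    best = max(candidates, key=len)
    return PARTIAL_PURLS[best] + purl[len(best):]


if __name__ == "__main__":
    print(resolve("http://research.vocab.astrazeneca.com/id/DOID/2841"))
```

Because consumers only ever store the persistent form, the target host can move without breaking any link that points at the resource.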

Inside/Outside Disappears

[Architecture diagram with External and Internal sides. Components shown: structured vendor content and consortium content; RESTful APIs; structured, semi-structured, and unstructured content (+ tagging); RDF catalogues, mapping, and queries; triplestores; active & partial PURLs for central identity management; vocabulary server; semantic visualization.]

Unstructured Content (or most of the data out there…)

Giving Structure to Unstructured Content
o Entity Recognition
o Use of common vocabularies
o Schemas
o Domain-Specific Content? Open BEL? TMO?
o Compatibility of text indices with triplestores & middleware tools

Encouraging Publishers to Structure Content
o How can this be ‘monetized’ so they don’t lose their ROI?
o What about interoperability & persistence?
o Can this be mandated via funding agencies?
o RDFa to start?

Publishers or ‘Re-publishers’
o Thomson-Reuters
o Ingenuity
o Open up vocabularies
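
Entity recognition against a common vocabulary can be sketched as simple dictionary tagging: find known terms in free text and attach their concept URIs. Production systems use far richer methods; the vocabulary, the `example.org` URIs, and the `tag` function below are all hypothetical.

```python
import re

# Hypothetical vocabulary: surface form -> concept URI (example.org is a placeholder).
VOCABULARY = {
    "asthma": "http://example.org/vocab/disease/asthma",
    "copd": "http://example.org/vocab/disease/copd",
    "biomarker": "http://example.org/vocab/concept/biomarker",
}

# Longest terms first, so a longer term wins over any shorter term it contains.
_TERMS = sorted(VOCABULARY, key=len, reverse=True)
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(term) for term in _TERMS) + r")\b",
    re.IGNORECASE,
)


def tag(text):
    """Return (start, end, matched_text, uri) for each vocabulary term found."""
    return [
        (m.start(), m.end(), m.group(0), VOCABULARY[m.group(0).lower()])
        for m in _PATTERN.finditer(text)
    ]


if __name__ == "__main__":
    for hit in tag("A biomarker for Asthma and COPD."):
        print(hit)
```

Storing the character offsets alongside the URI keeps the annotation separate from the text, so the same document can be re-tagged when the vocabulary changes.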

Pre-Competitive Consortia

Open PHACTS (Innovative Medicines Initiative)

Pistoia Alliance

W3C Health Care & Life Sciences Interest Group

National Center for Biomedical Ontologies (NCBO)

Open BEL (Biological Expression Language)

Open PHACTS (Open Pharmacological Space)
• EU/EFPIA Innovative Medicines Initiative (IMI) project

Key Points

Large scale data integration
• Focused on pharmacology
• We integrate so you don’t have to
• Dealing with multiple identifiers for the same concept
• Always up-to-date
• State of the art and industrial strength

Focus On Data Quality
• Provenance is critical – know where every data point comes from
• Google-style indexing; Data providers keep their own data
• Chemistry Standardization – enhancing chemistry connectivity
• Working with data providers to expose and enhance their data

Flexible and adaptable
• Dynamic schema-less approach; rapidly incorporate new datasets
• Queries are adaptive, based on scientific profiles (e.g. chemist or biologist)
• Use-case driven & tested by users in industry and academia

Great APIs for building apps
• JSON REST-style APIs
• Also supports XML, Turtle, etc
• Chemistry services
• Exemplars show how to take advantage of the platform
• Clear licensing details for all data in the system

From: Open PHACTS Architecture - Building the extensible platform (EuroQSAR 2012 in Vienna, 30.08.2012)
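
"Dealing with multiple identifiers for the same concept" can be pictured as grouping identifiers that are connected by equivalence links. The union-find sketch below is one generic way to do that; it is not a description of how Open PHACTS implements identity mapping, and the class name and identifiers are hypothetical placeholders.

```python
class IdentifierSets:
    """Union-find over identifiers linked by 'same concept' assertions."""

    def __init__(self):
        self._parent = {}

    def _find(self, ident):
        self._parent.setdefault(ident, ident)
        root = ident
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited node directly at the root.
        while self._parent[ident] != root:
            self._parent[ident], ident = root, self._parent[ident]
        return root

    def link(self, a, b):
        """Record that identifiers a and b denote the same concept."""
        self._parent[self._find(a)] = self._find(b)

    def same(self, a, b):
        """Return True if a and b are connected by any chain of links."""
        return self._find(a) == self._find(b)

    def equivalents(self, ident):
        """Return every known identifier for the same concept, sorted."""
        root = self._find(ident)
        return sorted(i for i in list(self._parent) if self._find(i) == root)


if __name__ == "__main__":
    ids = IdentifierSets()
    ids.link("sourceA:123", "sourceB:abc")
    ids.link("sourceB:abc", "sourceC:X9")
    print(ids.equivalents("sourceA:123"))
```

In practice each link would also carry provenance (who asserted it and when), so that a questionable mapping can be traced and withdrawn.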

W3C HCLS

Activities:
o Continue to develop high level (e.g. TMO) and architectural (e.g. SWAN) vocabularies.
o Implement proof-of-concept demonstrations and industry-ready code.
o Document guidelines to accelerate the adoption of the technology.
o Disseminate information about the group's work at government, industry, academic events and by participating in community initiatives.

Use Cases/Domains
o Drug Discovery
o Electronic Lab Notebooks
o Comparator Arm Data
o Patient Data Ownership
o Biotech Acquisition
o Supply Chain Automation
o Web Integration
o Bio-surveillance
o Co-development

http://www.w3.org/blog/hcls/

The mission of the Semantic Web Health Care and Life Sciences Interest Group (HCLS IG) is to develop, advocate for, and support the use of Semantic Web technologies across health care, life sciences, clinical research and translational medicine.

Pleas & Future Directions

Prognostications

RDF Content Farms
o Vendors: Someone will figure out how to monetize this
o Consortia: Who ‘Owns’ this?

Government in Health Care & Life Sciences; can we learn from the EPA? open.gov?

Shrinking Pharma
o Smaller (or virtual) footprint
o Back to first principles: what do we do best?

More modeling & Simulation

Rise of the informaticist…

Community Help

Resist Silos
o Where is your data? Where is it likely to be in 5, 10 years?
o A single triplestore with all ETL-streams leading to an RDF ‘data warehouse’ is another silo
o Building on top of ‘standards+’ may lead to silos
o Need to follow & influence emergence of standards if you have a ‘horse in the race’

Support (business focused) Consortiums
o We’re doing the same job many, many times

Thank You
Listeners & Molecular Med TRI-CON 2013 Organizers
