The scientific information landscape and applications to Translational Research
This publication contains general information only and Deloitte is not, by means of this publication, rendering accounting, business, financial, investment, legal, tax, or other professional advice or services. This publication is not a substitute for such professional advice or services, nor should it be used as a basis for any decision or action that may affect your business. Before making any decision or taking any action that may affect your business, you should consult a qualified professional advisor. Deloitte shall not be responsible for any loss sustained by any person who relies on this publication.
This whitepaper includes data and information that shall not be disclosed outside of the intended audience and shall not be duplicated, used, or disclosed — in whole or in part — for any purpose other than consideration of this whitepaper. The intended audience may not rely upon its contents for accuracy or completeness nor to use it to formulate official policy or official decisions. The intended audience may consider its contents “as-is” without any warranty of quality. In no event shall any part of this whitepaper be used in connection with the development of specifications or work statements with respect to any solicitation subject to full and open competition requirements. This restriction does not limit the intended audience’s right to use information contained in these data if they are obtained from another source without restriction.
Table of contents
The scientific information landscape and applications to Translational Research  i
Executive summary  1
Creating a vision for Translational Research  2
Building blocks of Translational Research investments  5
Implementing the vision for Translational Research  22
Conclusion  26
Acknowledgments  27
Executive summary
The following are common challenges observed when adopting a translational research informatics strategy:
• Complications when importing data from various data sources — data transformations that are required to import data into internal, structured data repositories can be resource intensive, requiring manual data entry and the creation of surrogate keys to link data files
• Inconsistencies in data dictionary standards — guidance on the use of standards, such as the Clinical Data Interchange Standards Consortium (CDISC), the International Classification of Diseases, Ninth Revision (ICD-9), or the International Classification of Diseases, 10th Revision (ICD-10), and on internal definitions and use of standards is unclear; internal standards documentation is not current with the changing environment, and data-importing processes do not incorporate these standards; standards are often not fully adopted
• Impediments to interoperability with external partners and cross-trial analysis — lack of uniform implementation and enforcement of data dictionary standards during data acquisition and submission; i.e., there is no requirement to submit clinical or mechanistic data using CDISC or other accepted standards, and partners may not uniformly adopt the same standards
• Data sharing is hindered by differing informatics and analytic tools — sharing of research data is not fully provisioned in contracts, leading to questions on the extent and scope of data that can be shared and resulting in incomplete and outdated data unsupported by sufficient metadata; patient-derived consent for sharing clinical trial data is limited
• Data quality concerns — current validation focuses on completeness of submission and not on the reliability, translation, or integrity of the data content
• Various data analysis and visualization tools are used — use of tools is neither consistent nor standard, resulting in partners establishing their own research collaboration portals
• Patient privacy rights issues — informed consent
Investments made in biomedical and translational research have led to advancements in providing insights into the mechanisms and disease relevant markers that are implicated in human pathologies. This advancement has led to innovative approaches in designing basic research and clinical trial strategies that are more effective in translating the discoveries made at the bench to treatments delivered to the bedside. In order for these advancements to be truly effective, traditional and coveted research silos should be broken down, making these discoveries more available so they may be shared and leveraged across the biomedical research community. This is expected to require a more open and collaborative environment in which privately and publicly funded researchers, primary investigators, and clinicians work together through a knowledge exchange where they share and leverage their collective discoveries and insights in designing new, novel, and innovative approaches to treating human diseases.
With the advent of this investment in biomedical research, technologies, and collaborative networks comes an explosion of data; data derived from many different sources, structured in distinctly different formats, analyzed using different tools, and interpreted from different perspectives. A translational research informatics (TRI) strategy can be used to help overcome many of these challenges and to help realize the potential this information brings to the advancement of new, novel, safe, effective, and innovative treatments for human diseases.
TRI is the practice and critical cornerstone in providing a platform on which to bring these vast amounts of basic scientific and biomedical research data together in a cohesive and structured way to perform meaningful and intelligible analyses that drive clinical success. A TRI solution can serve as a venue to submit, archive, exchange, and analyze scientific, bioinformatic, and medical research data, addressing data standards, data interoperability, and solutions to ethical issues around data sharing. Data acquisition and interoperability across multidisciplinary research areas, extensible collaboration portals with standard and broad analysis and visualization tools, and inherent cultural collaboration concerns in a competitive scientific community are key causes preventing the full adoption of TRI solutions. It is critical to identify key enablers that reduce these barriers in order to envision and realize the information landscape of the future.
In this paper, we have set the context for an anticipated future vision of the post-commoditized genomic research environment in which TRI solutions can be implemented and operated to succeed. We have outlined an approach to acquiring and storing biomedical data from several sources; implementing and enforcing data standards and interoperability; integrating the data and anchoring them around exhaustive controlled vocabularies and ontologies; creating and making available tools for the analysis of heterogeneous data sets; and providing an appropriate supportive technology base to enable these functions. Implementing these changes can help ensure the smooth operation of a TRI solution, fulfilling the scientific needs of the community, increasing the appeal of the platform, and catalyzing adoption and collaboration. Our paper describes specific execution measures involving strategy formulation, careful planning, multithreaded execution, and ongoing change management as a means of translating ideas to action, thereby increasing a TRI solution’s usage and making it an invaluable collaborative resource. We believe that the approach we lay out can be widely adaptable across commercial industries, academia, and governmental agencies such as the National Institutes of Health, Food and Drug Administration, Centers for Disease Control and Prevention, Department of Energy, and Environmental Protection Agency, as well as others.
Creating a vision for Translational Research
A systems biology approach for disease etiology
Successful cures for diseases remain comparatively rare. Pathological processes of disease involve multiple pathways, cells, and mediators, and are influenced by a variety of genetic and environmental risk factors. There are significant variations in disease manifestation and prevalence across global geographies and human genomic makeup.
The wealth of experimentation techniques over the past few decades of biomedical research has advanced our empirical knowledge of biology and left scientists and clinicians with tremendous amounts of data and information. Indeed, a large portion of experimental data is derived from high-throughput experimentation such as genomics, transcriptomics, and metabolomics (collectively termed “omics” data), which generates several gigabytes of data per experiment. We are, however, severely limited by our ability to interpret, analyze, and synthesize knowledge from these datasets, which we term “big-bio-data.” The scientific community, therefore, desperately needs data management accelerators and knowledge-exchange platforms that allow the interpretation, analysis, and exchange of big-bio-data. TRI solutions that incorporate these capabilities and unlock the potential of big-bio-data can be pivotal to the advancement of scientific innovation.
A broad TRI solution should provide a forum to enable an interdisciplinary approach that draws from data of many types; e.g., genomic, genetic, cellular, molecular, and physiological, and from many related research areas. As depicted in Figure 1, TRI solutions should employ an interdisciplinary approach to providing targeted insights in the specific fields of research while also connecting the data from other human systems, thus opening a window into systems biology.
“It is a very sad thing that nowadays there is so little useless information.”
Oscar Wilde, in “A Few Maxims for the Instruction of the Over-Educated,” Saturday Review (17 November 1894)
Figure 1. Systems biology — A connected approach across contained systems
Source: Deloitte Consulting LLP
[Figure 1 depicts three example contained systems (e.g., the immune, nervous, and metabolic systems), each sharing core molecular machinery (DNA, mRNA, tRNA, exons and introns, regulatory motifs) alongside system-specific components: antibodies, phagocytes, proteases, and defensins in the immune system; neurotransmitters, synaptic molecules, ion pumps, motor axons, and glial molecules in the nervous system; and glucagon and insulin, metabolites, the citric acid cycle, isomerases, and convertases in the metabolic system.]
A robust TRI solution must generate insights from information across human systems enabling a systems biology approach.
Crafting a clear, dependable lens for systems biology through big-bio-data management is a difficult task. There are many challenges; the most fundamental is the challenge of pulling together the exabytes of data that span the biology universe, followed by the challenge of drawing actionable scientific and medical conclusions from it. Addressing these challenges requires consideration of significant investments in a variety of capabilities, a paradigm shift in the way life science, research, and healthcare stakeholders think about collaborations, and the engineering of an operating model that incentivizes the scientific community to collaborate fruitfully while retaining the rights to discovery and intellectual property.
Mega-collaborations — The path to systems biology
Stakeholders in the biological data arena are investing in localized capabilities that pull together basic research, clinical trial, and genomic data through TRI solutions. Such efforts across drug and medical device makers, care providers, research centers, the federal government, nonprofit organizations, and academia are aimed at developing a data-powered approach to systems biology. Examples include the National Center for Advancing Translational Sciences in the federal space, the eMERGE Network in academia, the Pistoia Alliance for the commercial sector, and the Innovative Medicines Initiative for public-private research. It is expected that this trend will greatly increase over the next few decades, creating hubs of sector-specific collaboration that later morph into well-organized mega-collaboration hubs that operate across the sectors (Figure 2).
Figure 2. Interconnectivity of collaboration portals is required to support the scientific landscape of the future
Source: Deloitte Consulting LLP
[Figure 2 depicts an interconnected landscape of collaboration portals. TRI solutions will need to address key themes across the interconnectivity of the biomedical landscape: cross-disciplinary research; dense knowledge traffic; big-picture insights/systems biology; bench-to-bedside and back; exabyte global data management; global talent management; cloud R&D; and IP rights, privacy, and security. Participants span commercial biopharma companies (drug, vaccine, orphan drug, and medical device makers, specialty pharma, and platform/RNAi companies working across oncology, neuroscience, cardiovascular, respiratory, immunology, and infectious disease), universities, non-profits, care providers/sites of care, and government-funded and -managed portals across the NIH, CDC, FDA, EPA, DOE, and LANL (examples include BTRIS, ImmPort, HuGE Navigator, Kbase, CTR, CPDB, HERO, WONDER, and BMIS).]
A collaboration model can be the key to closing the loop between the activities on the bench and those at the bedside, thereby providing benefits of accelerated molecular discovery, improved drug development, increased patient safety, and targeted, personalized patient therapies. Effective operation of such hubs requires that entities consider scaling exponentially while addressing the concomitant challenges of data, technology, and infrastructure management, intellectual property (IP) rights, and the challenges of collaborative data sharing and knowledge exchange. There are three capabilities critical to the operating model for such initiatives — data management, technology, and collaboration. They are not new concepts; however, providing these capabilities with very specific direction can help transform TRI solutions into knowledge-generating centers:
• Data management — Making it specific for biomedical data: Incorporating relevant data standards, controlled vocabularies and ontologies, algorithms, methodologies, and analytical tools, and engineering better processes of data governance and sharing can allow better data interoperability, insight generation, and collaboration in the research community
• Technology — Letting it support the data management needs: Focusing on scalable, flexible infrastructure and cost-effective yet high-quality technologies for data storage and computing can enable better handling of the vast volumes of data and accelerate the pace of collaboration
• Collaboration — Enabling the technical necessity: Communities are enabled by collaboration, and collaboration enables communities; hence, developing an enhanced data sharing and exchange model that is enabled by the data and technologies, as well as an operating model for collaboration that makes evident the benefits of knowledge exchange and incentivizes sharing, is a critical driver of adoption
Tailoring these commonplace activities to help meet the specific needs of a research community can be a game changer for TRI solutions. In subsequent sections, we discuss in detail each of these capabilities.
“Where is the knowledge we have lost in information?”
T. S. Eliot, Choruses from “The Rock” (1934), London: Faber & Faber
Figure 3. A state-of-the-art data management approach
Source: Deloitte Consulting LLP
Building blocks of Translational Research investments
Data management — Make it specific for biomedical data
The scientific research landscape of the future will likely be one that is knowledge driven. Bringing together laboratory results obtained from early discovery to clinical research with clinical trial and patient outcomes will be key to deriving greater insights and knowledge on health and disease. For TRI solutions, this implies the ability to acquire, share, manage, and analyze data obtained from the collaborators within portals, along with public domain information contained within publications and publicly available datasets. Providing for these capabilities requires a sound base in data management. Figure 3 describes the elements of a state-of-the-art data management framework as it applies to an organization that deals with several collaborators that both produce and need access to internal and external data.
The data management challenges of systems biology are complex, due to its highly interdisciplinary nature and strong dependence on high-throughput experimental techniques that provide large amounts of data. Meeting these challenges requires that the system allow for the rapid introduction of new data sources derived from new and emerging technologies, for interoperability between datasets and analysis tools, and for a sophisticated means to integrate data sources to support data mining and searching operations. However, typical implementations based on classical enterprise and business intelligence systems will not work for this highly specialized scientific domain. A data management approach (Figure 3) that is tailored to the needs of the relevant biology and biomedical data requires consideration of Data Acquisition, Data Integration, and Data Presentation and Exchange capabilities to handle the data sharing and exchange needs of TRI solutions. An overarching Data Governance Framework, grounded in the leading practices of data security and interoperability, will likely facilitate the effective movement of data through the data management pipeline.
[Figure 3 depicts the framework’s three phases: Data Acquisition (data sources, data acquisition, data staging, governed by data standards and naming conventions); Data Integration (data models, data transformation, reference data, metadata management); and Data Presentation and Exchange (data search, data analysis and mining, data visualization, data exchange). The pipeline is underpinned by data governance and ownership, data security, and data interoperability, and must be flexible, scalable, reliable, and fast.]
Figure 4. Data acquisition components of the data management approach
Source: Deloitte Consulting LLP
Data Sources — They matter: As a portal with the goal of providing valuable insights to its researchers through an interdisciplinary approach, a TRI solution must enable access to a broad and exhaustive collection of datasets and reference data from a wide variety of biological domains and sources. In addition, datasets from the collaborating programs should be made available to power this collaborative research. Control and incentive mechanisms, as well as a cultural change, are essential to motivate participating members to submit their data into secure repositories so that TRI solutions can serve as both the archive and the exchange portal for experimental data generated under their aegis. The data sources available in the public domain are continuously changing. To keep abreast of the changing landscape, there should be an automated means of monitoring for new data sources, as well as a systematic community outreach mechanism to update the inventory of sources required for incorporation. The value of data about the data cannot be overemphasized. Metadata is critical to the scientific process, where the value of a datum is defined also by the method of obtaining it. For a portal to provide insight generation, the emphasis on acquiring and storing metadata is critical. It also enables the development of applications that further aid the scientist and make the language of informatics more easily understood.
Data Acquisition — Automation is key: In order to keep pace with research, significantly automating data acquisition processes should be considered. A determination will be necessary as to which datasets should be incorporated into the portal’s data stores and which can be accessed by referencing. For the former, there will need to be mechanisms for automatic feeds that keep the data stores current and in sync with the repository. For the latter, semantically interoperable referencing is required, along with some manual processing to upload data. Regardless of the mode used for data acquisition within the portal data stores, manual or automated, comparing the data against standards is critical for further use and analysis. Automated scripts that check for compliance with such standards, and that convert data to the standards implemented within the portal, are critical to the effectiveness of this process.
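As a concrete illustration of such a compliance script, the minimal sketch below validates incoming records against a small controlled vocabulary before load. The field names, required-field list, and codelist are invented for illustration; they are styled after, but do not come from, any actual CDISC codelist.

```python
# Hypothetical compliance check: the field names, required-field list, and
# tiny codelist below are illustrative only, not an actual standard.

ALLOWED_SEX_CODES = {"M", "F", "U"}             # a CDISC-style codelist
REQUIRED_FIELDS = {"subject_id", "visit", "sex"}

def validate_record(record):
    """Return human-readable compliance problems (empty list = compliant)."""
    problems = []
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        problems.append("missing fields: %s" % sorted(missing))
    sex = record.get("sex")
    if sex is not None and sex not in ALLOWED_SEX_CODES:
        problems.append("non-standard sex code: %r" % sex)
    return problems

batch = [
    {"subject_id": "S001", "visit": "BASELINE", "sex": "F"},
    {"subject_id": "S002", "visit": "WEEK4", "sex": "female"},  # fails codelist
]
report = {rec["subject_id"]: validate_record(rec) for rec in batch}
```

In practice such a check would run automatically on every feed, routing non-compliant records back to the submitter or to a curation queue rather than into the integrated store.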
DataStaging— Better staging, better performance: Good design principles should provide for a staging area as a place for temporary storage for data that is sourced. A staging area, distinct from a presentation layer, and one that takes into account the variety as well as the uniqueness of data types encountered and the relationships they bear to each other, can facilitate improved downstream processing, culminating in better insights.
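One way to picture a staging area distinct from the presentation layer is a loosely typed landing table that preserves each submission verbatim alongside its provenance, deferring all transformation to downstream steps. The sketch below uses an in-memory SQLite database; the table and column names are invented for illustration.

```python
# Minimal staging-area sketch: raw submissions land untransformed in a
# staging table with provenance metadata. Table/column names are invented.
import datetime
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE staging_submissions (
        source      TEXT NOT NULL,  -- which collaborator or feed sent it
        received_at TEXT NOT NULL,  -- provenance: when it arrived
        payload     TEXT NOT NULL   -- the raw record, kept verbatim as JSON
    )
    """
)

def stage(source, record):
    """Land a raw record in staging without transforming it."""
    conn.execute(
        "INSERT INTO staging_submissions VALUES (?, ?, ?)",
        (
            source,
            datetime.datetime.now(datetime.timezone.utc).isoformat(),
            json.dumps(record),
        ),
    )

stage("lab_feed", {"assay": "flow_cytometry", "subject": "S001"})
staged = conn.execute(
    "SELECT source, payload FROM staging_submissions"
).fetchall()
```

Keeping the verbatim payload means downstream transformations can be re-run, and historical tracking is preserved, whenever standards or data models change.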
[Figure 4 depicts the Data Acquisition phase. Data Sources cover clinical, research, laboratory, and reference data plus metadata, drawn from multiple private and public sources; effective practices include incorporating other recognized reference and research data and taking direct periodic feeds from public sources via pull/web services, with the benefits of giving users more relevant and specific context for interpreting experimental data and access to a variety of public data. Data Acquisition and Validation relies on automated procedures for data validation, user-provided direct submissions through templates, and direct data feeds (via ODBC, web services, plug-ins, and other mechanisms), improving user adoption, easing data upload, and enhancing acceptance of a wide range of data sets. Data Staging centers on a well-defined staging schema leveraging bio-data warehousing effective practices, enabling effective historical tracking and supporting the data curation, algorithm, and methodology management needed to transform data into knowledge.]
Data Acquisition — Make it scalable
Figure 4 demonstrates the Data Acquisition phase of the data management process. Three aspects of the acquisition process are critical to enabling collaboration in the ever-changing data landscape: Data Sources, Data Acquisition and Validation, and Data Staging.
[Figure 5 depicts the Data Integration phase. Research data, clinical trial data, reference data, ontologies, and other sources feed through ETL into an integrated data model (example options include a highly denormalized generic store, such as i2b2, or a snowflaked normalized model; the exact design will depend on further analysis), which is easier to standardize for future data collaborations and improves efficiency in knowledge generation. An Information Presentation layer, redesigned to accommodate complex methodology management, algorithm management, and data curation capabilities alongside data marts, materialized views, and bio-cubes, supports increased user adoption and improved scientific research capabilities.]
Figure 5. Data integration components of the data management approach
Source: Deloitte Consulting LLP
Data integration — Different approaches, different gains
Data integration is critical to the synthesis of knowledge from data. Activities in this phase combine the data to facilitate their delivery to the tools and methods that users access to interpret the information and infer knowledge. Figure 5 conveys the activities that are important at the data integration stage.

Ontologies and controlled vocabularies — Critical for insights: The new ecosystem, consisting of multiple research centers addressing several related topics, has created numerous sources of data that most likely speak of the same biological objects, processes, and observations in different ways. The range of potential dataset types encountered is mind-boggling, as represented in Figure 6.
Figure 6. Example biological datasets
Source: Deloitte Consulting LLP
[Figure 6 arrays example datasets across the research and pre-clinical, development, and commercial stages: functional genomics, proteomics, medicinal chemistry, compound libraries, QSAR models, systems biology models, pharmacogenomics, metabolomics, clinical genomics, population genomics, comparative genomics, sociomics, biobank data, AE/SAE data, toxicology data, PK/PD data, animal models, HTS data, efficacy data, and economics data.]
Figure 7. Data model options

Approach: One integrated data model housing many types of biomedical and/or clinical data
Description: Highly denormalized generic data storage layer
Pros: Highly generic model does not require ongoing changes; integrated clinical, genomic, research, and public reference data; easier to standardize across consortia, facilitating collaborations
Cons: Requires heavy up-front ETL/data pipeline management; queries get very complex, making it hard to extract scientific insights
Examples: The data model of the National Institutes of Health (NIH) Biomedical Translational Research Information System; the i2b2 platform

Approach: Integrated data model with snowflake sub-models
Description: Highly denormalized generic data storage layer augmented with normalized domain-specific models
Pros: Integrated clinical, genomic, and research data; less stress on infrastructure compared to the single data-model approach above; easier to query, with better-organized data domains
Cons: More models to manage; more difficult to change compared to the single data-model approach above
Example: Deloitte Health Insights and Informatics Platform

Approach: Delineation of clinical and “omic” data models
Description: One data model for clinical data, one for “omic” data
Pros: Two specialized models for clinical patient data versus “omic” data; a glove fit for certain scientific applications
Cons: Hard to link the two models for querying; not all data can be clubbed into two models
Example: Oracle EHA Translational Suite

Approach: One integrated data model with limited, select data focusing on specific disciplines
Description: One integrated but specialized data model (e.g., oncology)
Pros: Highly discipline-specialized model effectively addresses user needs; queries are intuitive and fast, supporting superb ad hoc data retrieval
Cons: Difficult to scale to additional domains; difficult to retrieve end-to-end insights for a petabyte-grade bio-data universe
Example: Moffitt Cancer Center Research Exchange Hub

Source: Deloitte Consulting LLP
The desire to obtain insights from data that have been acquired from different experimental approaches and settings can be fulfilled by determining the equivalence and relationships of the concepts involved. Controlled vocabularies provide that anchor of equivalence, and ontologies help establish relationships between these concepts. Ontologies, therefore, are critical as frameworks for data integration. Ontologies power the ability to derive new hypotheses from a limited set of preliminary observations. Academia and standards organizations, as well as industry, are joining forces to develop standards for vocabularies and ontologies that will unlock the potential of research datasets. Furthermore, the data available for analysis must be structured around these ontologies in order for the ontologies to deliver their full potential. Together, this will enable more powerful querying and interpretation of heterogeneous datasets, creating information assets that may be used and reused by the broader research community.
Data models — Tailored for biology: In addition to frameworks such as ontologies and controlled vocabularies, appropriate data models should be considered to house the data and enable efficient querying and retrieval of correct results. Commonly observed representations of biological data range from one data model to several data models linked to each other. Figure 7 outlines data model options, their pros and cons, and examples where these options were applied. Other, newer technologies, such as the W3C Web Ontology Language (OWL) and the Resource Description Framework (RDF), are used to link data on a semantic basis and lay the foundation for enhanced insight generation. Hybrid approaches overlay semantic methods on traditional data stores and allow for “integration” and “inferencing” with data stores in place. The appropriate solution depends on an in-depth assessment of the data stores and the needs of the research community. Biological data are extremely complex to model due to the networked relationships that the participating entities bear to each other; simple relational diagrams are therefore inadequate for this purpose. In addition, the rapidly changing data landscape in biology makes the case for a data model that is scalable, extensible, and flexible enough to accommodate new data sources.
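The semantic-linking idea behind RDF can be sketched without any triple-store software: facts become (subject, predicate, object) triples, and a tiny inference rule follows transitive links to answer questions no single fact states directly. The entities and the single rule below are invented for illustration and are far simpler than OWL reasoning.

```python
# Toy semantic layer in the spirit of RDF triples; the entities and the
# single transitive rule are invented for illustration.

triples = {
    ("geneX", "part_of", "pathwayY"),
    ("pathwayY", "part_of", "immune_system"),
    ("geneX", "expressed_in", "T_cell"),
}

def infer_part_of(entity):
    """All containers reachable from `entity` via transitive part_of edges."""
    found, frontier = set(), {entity}
    while frontier:
        step = {o for (s, p, o) in triples if p == "part_of" and s in frontier}
        frontier = step - found
        found |= step
    return found

containers = infer_part_of("geneX")  # -> {"pathwayY", "immune_system"}
```

The inferred fact that geneX belongs to the immune system was never stated; it emerges from linking two triples, which is the kind of “inferencing with data stores in place” the hybrid approaches enable.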
“If it takes 3 days to get an answer I’m not going to ask another question.”
Marc Parrish, Vice President, Barnes & Noble; speaking at the GigaOM conference, 2011
Data transformation — Preparation for analysis: In order to present data to analysis tools, the data must be prepared in a manner that allows for querying. Data curation, the first of these steps, is a semi-automated process that organizes the data into a standard format and facilitates the reliability of the data. Algorithm management, the second step, prepares the data for faster querying. Algorithm management, which is typically done using a standard set of business intelligence tools, must be handled in ways that address both the performance challenge and the domain requirement. Examples of such algorithms that are relevant to TRI solutions include natural language processing (NLP) engines that extract facts from publications and present them for querying, or those that prepare data for common statistical analysis.
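To show the shape of such a fact-extraction step, the sketch below runs the simplest possible pattern over publication text, capturing “gene verb target” statements into queryable triples. Production NLP engines are far more sophisticated; the pattern, gene symbols, and sentence here are invented for illustration.

```python
# Hedged sketch of a trivial "fact extraction" pass: capture
# "<Gene> inhibits/activates <Target>" statements as triples.
# The pattern and the example abstract are invented.
import re

PATTERN = re.compile(
    r"\b([A-Z][A-Za-z0-9]+)\s+(inhibits|activates)\s+([A-Z][A-Za-z0-9]+)"
)

def extract_facts(text):
    """Return (subject, relation, object) triples found in the text."""
    return [(m.group(1), m.group(2), m.group(3)) for m in PATTERN.finditer(text)]

abstract = "We show that TP53 activates CDKN1A, while MDM2 inhibits TP53."
facts = extract_facts(abstract)
# facts -> [("TP53", "activates", "CDKN1A"), ("MDM2", "inhibits", "TP53")]
```

Triples extracted this way can be loaded into the same stores as structured data, making the literature itself queryable alongside experimental results.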
Data presentation and exchange — Many users, many needs
The scientific community is diverse, and in this age of systems biology, scientists, informatics professionals, and physicians are all looking for information in the same data stack. They are seeking answers to different problems, or to different facets of the same problem, and need to approach the data in different ways. They ask different questions, and they ask the same questions differently.
Empowering such a diverse audience requires excellent search and query, data analysis, and visualization capabilities. These tools and aids need both to reduce the complexity of the analysis that needs to be done and to allow powerful insights to be generated from disparate datasets. As depicted in Figure 8, the main activities of the data presentation and exchange phase are providing for data analysis and the dissemination of results.
Figure 8. Data presentation and exchange components of the data management approach
Source: Deloitte Consulting LLP
[Figure 8 depicts the Data Presentation and Exchange phase. Data Search and Analysis spans gene expression tools, flow cytometry analysis tools, visualization tools, meta-analysis tools, other related analysis tools, and user workflows; integrating more open-source tools (e.g., Cytoscape for pathway visualization) and adopting a services-oriented architecture (SOA) that allows the incorporation of various tools into pipeline-based workflows can increase user adoption, give scientists a selection of tools that accelerates research activities, harness tools from other programs, and provide an easy, intuitive workflow. Data Exchange covers data standards and exchange protocols; defining data exchange protocols and a recommended solution design to push data out supports future collaborations and large-scale data dissemination. Information Presentation encompasses methodology management, algorithm management, data curation, data marts, materialized views, and bio-cubes; redesigning this layer to accommodate these capabilities can increase user adoption and improve scientific research capabilities.]
Information presentation — Specialized data marts for downstream processing: Biological data has several subdomains, each with its own characteristics. With the influx of data from several sources, it is necessary that complex biological data be reorganized by domain for further downstream processing. Examples of specialized data marts include gene expression marts or flow cytometry marts. The single largest benefit of this approach can be the enablement of a bottom-up technique for assessing and refining the content and principles in each domain. This can allow a segmented approach to handling the distinct characteristics of different domains and their individual implications for the data model. These sophisticated techniques for transforming and presenting the raw data should be tailored to a domain and should allow the rapid and relevant querying of information and generation of knowledge.
Data search and query — Sophisticated yet easy: The addition of a broad set of data within data repositories will attract a larger and more diverse set of users to search and query for information germane to their research. Two simple techniques that can enhance these capabilities are ad hoc and NLP-based querying. The former is a good alternative to the constrained, restrictive queries that applications typically offer. Users can construct complex queries between data sources and data types using the graphical user interface (GUI), supported by an effective back-end query engine. Such an approach increases user appeal, due to the ability to query in a more naturally scientific manner. Natural language-based query engines are sophisticated translators of users' queries, developed to retrieve the right information. Both the natural language algorithms and the back-end of ad hoc query functionality will require customization to match the needs of the biomedical domain, and should incorporate the wisdom of domain-specific ontologies and vocabularies to power them. In addition, increased performance capabilities require efficient indices. There are several open source indexing technologies available; one that has demonstrated itself is the Lucene technology. Incorporating such technologies can also remove or reduce the need to reproduce and store large datasets from other public sources (e.g., the National Center for Biotechnology Information (NCBI)), since referencing alone suffices.
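The indexing idea can be illustrated with a toy inverted index; Lucene implements a far more capable version of the same structure. The documents and query terms below are invented for the example.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

# Hypothetical abstracts keyed by accession id.
docs = {
    "PMID1": "TP53 mutation in solid tumor samples",
    "PMID2": "flow cytometry of CD4 cells",
    "PMID3": "TP53 pathway analysis in tumor biology",
}
index = build_index(docs)

# An ad hoc conjunctive query: documents mentioning both terms.
hits = index["tp53"] & index["tumor"]
print(sorted(hits))  # ['PMID1', 'PMID3']
```

Because the index stores only term-to-identifier mappings, public-source documents need not be copied locally; the identifiers serve as references back to the originals.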
Data analysis and mining — Powerful approaches for powerful insights: Larger numbers of parameters within a dataset and larger numbers of datasets — this is the complexity of the data landscape of the future. Two critical necessities to empower the analytical approach are the ability to reduce the dimensionality of datasets and to allow inferencing over heterogeneous datasets. Foundational to the analytical approaches outlined below are a rich metadata repository, a common standard vocabulary, and rich ontologies, as alluded to earlier.
• Clustering data by phenotypic attributes. Tools that support clustering and reducing the dimensionality of data, such as Gene Set Enrichment Analysis (GSEA),1 currently the most cited de facto standard for clustering and integrating data, are natural candidates for inclusion within a TRI solution. GSEA provides a means of integrating mechanistic data from "omics" expression sets with other attributes, such as phenotypic or structural data, providing the first step towards meta-analysis.
“The value of information about information can be greater than the value of the information itself. … I am willing to project an enormous new industry based on a service that helps navigate through massive amounts of data.”
Nicholas Negroponte, creator of the $100 laptop, speaking of analyzing data in Wired, June 1994
1 Gene Set Enrichment Analysis, The Broad Institute (http://www.broadinstitute.org/gsea/index.jsp)
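GSEA itself uses a weighted Kolmogorov-Smirnov-style statistic over ranked gene lists; a much simpler relative, the hypergeometric over-representation test, conveys the core idea of scoring a gene set against an experimental hit list. All numbers below are illustrative, not drawn from any real study.

```python
from math import comb

def overrep_pvalue(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N, K, n):
    N genes in the background, K of them in the pathway gene set,
    n experimental hits, k hits that fall inside the set."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / total

# Illustrative numbers: a 20,000-gene background, a 100-gene pathway,
# 500 differentially expressed genes, 12 of them in the pathway
# (about 2.5 would be expected by chance).
p = overrep_pvalue(N=20000, K=100, n=500, k=12)
print(f"enrichment p-value: {p:.2e}")
```

A small p-value flags the pathway as over-represented among the hits, the first step toward the kind of phenotype-aware integration GSEA performs.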
• Overlaying literature information. Research data published within the literature is a powerful aid to researchers. The development of natural language processing algorithms to mine text has substantially reduced the onerous task of reading volumes of literature, and instead offers the opportunity to serve up textual information in "digitized pieces," which can then serve as pieces of data for analysis alongside experimental data. Pathways and networks derived from these facts are commonplace in biology research2 and enable scientists to interpret experimental results in the context of these published facts. Simple yet sophisticated network approaches allow researchers to identify similar or contradictory results and correlate observations to seemingly unrelated phenomena. TRI solutions can benefit significantly from natural language processing (NLP) components.
• Tools for meta-analysis. Research will likely continue to depend on high-throughput experimentation and tools that interpret the results from several laboratories. Interpreting and analyzing datasets obtained under different experimental conditions from different labs requires sophisticated statistical techniques. Meta-analysis is the statistical analysis of heterogeneous datasets for the purpose of interpreting and analyzing the combined set of findings. Investing in these methodologies is crucial to the ability of TRI solutions and their users to handle and analyze datasets from a diverse set of laboratories and collaborators.
2 Ingenuity Pathway Analysis (www.ingenuity.com); Pathway Studio - Ariadne Genomics/Elsevier; MetaCore - GeneGo/Thomson Reuters
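As a sketch of the statistical machinery involved, a fixed-effect inverse-variance meta-analysis pools effect estimates from several labs into one estimate whose precision exceeds any single study's. The effect sizes and standard errors below are invented for the example.

```python
from math import sqrt

def fixed_effect_meta(effects, std_errors):
    """Inverse-variance weighted pooled effect and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = sqrt(1.0 / sum(weights))
    return pooled, pooled_se

# Hypothetical log fold-changes for one gene from three laboratories,
# measured under different experimental conditions.
effects = [0.80, 1.10, 0.95]
std_errors = [0.30, 0.25, 0.40]
pooled, pooled_se = fixed_effect_meta(effects, std_errors)
print(round(pooled, 3), round(pooled_se, 3))
```

Real heterogeneous datasets would usually call for a random-effects model that accounts for between-lab variation, but the weighting principle is the same.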
Figure 9. A network view from Cytoscape.3
Data visualization — Window to insight: Working with large digital datasets to identify actionable insights will require powerful analysis paired with powerful display. Sophisticated visualization techniques and algorithms, including automated algorithms, can enable people to visualize patterns in large amounts of data and help them unearth the most pertinent insights for a domain as complex as biology. A powerful open source tool that can quickly bring value to TRI solutions is Cytoscape (Figure 9) — an open source software platform for visualizing complex networks and integrating these with any type of data. Although developed as an open source tool, Cytoscape has several industry partners, such as gene array companies, sequencing companies, and standards consortia, which can contribute to its development. It is designed to integrate multiple ontologies and to filter and present datasets using these ontologies. Incorporating Cytoscape as the visualization interface of choice can allow users to view any given dataset in the context of pathways and scientific information, and explore correlations between datasets.
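One lightweight way to hand network data to Cytoscape is its simple interaction format (SIF), in which each line names a source node, an interaction type, and a target node. The interactions below are hypothetical and serve only to show the format.

```python
# Hypothetical protein-protein interactions to visualize in Cytoscape.
interactions = [
    ("TP53", "binds", "MDM2"),
    ("TP53", "activates", "CDKN1A"),
    ("MDM2", "inhibits", "TP53"),
]

def to_sif(edges):
    """Render edges as SIF lines: source<TAB>interaction<TAB>target."""
    return "\n".join(f"{src}\t{kind}\t{dst}" for src, kind, dst in edges)

sif_text = to_sif(interactions)
print(sif_text)
# A real workflow would write this to a file and open it in Cytoscape:
# with open("network.sif", "w") as f:
#     f.write(sif_text)
```

Writing results in a plain interchange format like this keeps the analysis pipeline decoupled from the visualization tool.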
3 www.cytoscape.org; Smoot et al., Bioinformatics 2011, 27:431.
Figure 10. Example tools for biological data analysis: open source and proprietary tools spanning the biology data analytics continuum (collate, analyze, infer; from terabytes through petabytes and exabytes to zettabytes), including Hadoop, Postgres, ETL, Netezza, Teradata, ECL, Greenplum, MATLAB, SPSS, RapidMiner, Ingenuity Pathway, Physiolab, Cell Publisher, COPASI, text mining algorithms, OWL, Elixir, RDF, Oracle, Ensembl, Bowtie, SAS, OptGene, Pathway Studio, OligoStar, Cytoscape, SciPy, TopHat, and R
Source: Deloitte Consulting LLP
Workflows — Employing a scientific workbench: The scientific process is iterative, from hypothesizing to experimentation and analysis. At each step, scientists typically use a variety of tools for querying, analysis, and visualization of data to determine that their results are consistent and their hypotheses are sound. TRI solutions can benefit from incorporating a multitude of tools to cater to the diverse needs of their audience. Incorporating open source tools should be considered as a cost-effective, yet reliable, option to help meet this objective. Figure 10 represents examples of open source and proprietary tools to support the biology data analytics continuum, which may be valuable for uncovering innovative hypotheses from several million bytes of data. Scientists also often have proprietary tools within their firewalls that they utilize. Tool and data integration plans, as well as the workbench discussed next, should consider the need for interplay within and across open source and proprietary tools.
To effectively expand the use of a TRI solution portal to both scientists and informatics professionals, the portal must provide both technical confirmation and ease of use. Most scientists may not be trained to distinguish between the strengths and limitations of each tool, particularly the more sophisticated statistical and bioinformatics tools and algorithms. Although incorporating several tools and datasets is necessary, it is important also to keep users engaged in using them. A solution lies in a workbench designed to guide users through the necessary operations and the choice of tools available for their need, and to allow them to operate seamlessly between the portal and their in-house proprietary tools. Intuitive workflows can be created by stringing together several tools, algorithms, and scripts, making the portal a valuable one-stop shop for analyzing the different datasets it touches. Figure 11 provides a conceptual diagram of such a workbench.
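The workflow-stringing idea can be sketched as a pipeline of interchangeable steps. Each step below is a stand-in for a real tool (the names and the toy operations are hypothetical), and a stored workflow is simply the ordered list of steps, reusable across datasets.

```python
def normalize(values):
    """Stand-in for a normalization tool: scale to the sample mean."""
    mean = sum(values) / len(values)
    return [v / mean for v in values]

def threshold(values, cutoff=1.0):
    """Stand-in for a filtering tool: keep values above a cutoff."""
    return [v for v in values if v > cutoff]

def run_pipeline(steps, data):
    """String tools together: feed each step's output into the next."""
    for step in steps:
        data = step(data)
    return data

# A saved, reusable workflow: normalize, then filter.
workflow = [normalize, threshold]
result = run_pipeline(workflow, [2.0, 4.0, 6.0, 8.0])
print(result)  # [1.2, 1.6]
```

Because every step shares the same call signature, a script, an open source tool wrapper, or a proprietary in-house tool can be slotted into the same chain without changing the runner.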
Figure 11. An intuitive workbench
The workbench supports the iterative scientific cycle of hypothesize and experiment: inquire, inspect, interrogate, validate, and refine hypothesis. A secure scientific workbench, built on an SOA architecture, offers open source tools to collate, analyze (analysis/statistical), and visualize data: bring experimental datasets together in secure environments; visualize data; use tools to analyze the experimental datasets; visualize analyses; analyze data against benchmarks; and leverage analytical tools for multidimensional, heterogeneous datasets. It draws on experimental data (genomics, proteomics, metabolomics, cell biology, biology, genetics), clinical data (pharmacology, ADME, toxicology, physiology), and publications data, organized through a pipeline workflow with ontologies and/or controlled vocabularies and metadata. A linked proprietary workspace lets scientists use proprietary tools to analyze and validate results from the scientific workbench.
Source: Deloitte Consulting LLP
Reusable workflows are essential to efficient and reproducible analysis of data. Figure 12 shows an actual implementation of a workbench that incorporated a multitude of proprietary and open source tools for gene expression analysis. Bioinformaticians and scientists were presented with different user interfaces. Some ready-to-use workflows were implemented, and provisions were built in for customizing workflows by stringing together tools, scripts, and algorithms.
Sample analysis workflow:
1. DATA — Use intuitive interfaces to gather data for analysis
2. ANALYZE — Use specific tools to analyze the data; e.g., pathway analysis of gene expression sets
3. COMPARE — Statistical analysis; meta-analysis with heterogeneous datasets
4. VISUALIZE — Use public and private tools to uncover interactions
5. HONE IN — Discover details from scientific literature
Supporting features: CREATE WORKFLOWS — string together methods and tools into a workflow and store them for repeated use; COMMAND LINE or VISUALS — choose depending on who you are, a bioinformatics professional or a non-bioinformatics user; REFERENCE DATASETS — access public and private datasets in a secure environment.
Figure 12. An intuitive workbench for gene expression analysis
Source: Prototype developed by Deloitte's technology partners
Figure 13. Data governance framework

Data governance and data standards — A must for interoperability
With different datasets come different data types, each with their own standards. For example, the CDISC standards apply to clinical datasets; the Genomic Standards Consortium standards to genomic data; and the Microarray Gene Expression Data standards to gene expression datasets. A successful collaboration requires effective and reliable data translation between research groups, which in turn depends on data interoperability and data standards. A structured data governance program is essential to help achieve data interoperability and facilitate collaboration activities. Data governance can establish standards and practices to enable the translation, quality, and reliability of the information used to support critical decisions. Data governance team members are engaged throughout the data management program. Figure 13 provides a data governance model example for supporting these efforts.
Three decision-making levels are needed to support this data governance model:
• Strategic — The leadership committee sets the data governance vision and is accountable for making certain a data governance program is established and supported. It provides guidance on funding and authorizes the delegation of decisions to the lower levels of the governance model, which are responsible for implementing the data governance program.
• Tactical — The tactical council sets the standards and requirements to facilitate data interoperability and collaboration. It will be composed of scientific, analytics, visualization, and standard ontology and vocabulary subject matter specialists (SMSs) to define the standards and requirements for data sharing and interoperability (e.g., ontologies, controlled vocabularies, and other data standards derived from existing standards, such as the Medical Dictionary for Regulatory Activities (MedDRA), CDISC, NCBI's Gene Expression Omnibus, etc.). The members at this level can be aligned to the metadata, change, outreach, and "omics" data management groups. The council should provide guidance and oversight for these groups to facilitate alignment to the data governance vision and objectives.
• Operational — This is the execution level of the governance model, composed of technology SMSs and data stewards aligned to subject areas of practice/their specialization. These individuals should ensure that the standards and policies defined and developed are implemented and adhered to by the user community. They raise issues to the tactical council for consideration and resolution.
A TRI approach should enforce necessary compliance with current data standards and be able to accommodate standards changes over time. Enforcement of standards and exchange protocols facilitates the reliable acquisition of information at the front end of the data management workflow and the effective publishing and dissemination of results at the end. The more the data management workflow allows for interoperability, the more it helps the organization meet the goal of adoption and true sharing of information and knowledge within the community.
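Enforcement at the front end of the workflow can be as simple as validating each incoming record against required fields and a controlled vocabulary before acceptance. The field names and vocabulary below are illustrative only, standing in for standards such as those the tactical council would define.

```python
# Illustrative controlled vocabulary and required fields for submissions.
CONTROLLED_TISSUES = {"liver", "kidney", "blood"}
REQUIRED_FIELDS = {"sample_id", "tissue", "assay"}

def validate_record(record):
    """Return a list of standards violations (empty means compliant)."""
    errors = [f"missing field: {f}"
              for f in sorted(REQUIRED_FIELDS - record.keys())]
    tissue = record.get("tissue")
    if tissue is not None and tissue not in CONTROLLED_TISSUES:
        errors.append(f"tissue '{tissue}' not in controlled vocabulary")
    return errors

good = {"sample_id": "S1", "tissue": "liver", "assay": "RNA-seq"}
bad = {"sample_id": "S2", "tissue": "lver"}
print(validate_record(good))  # []
print(validate_record(bad))
```

Keeping the vocabulary and required fields in data rather than code also lets the governance council update standards over time without changing the validator.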
Source: Deloitte Consulting LLP
The framework organizes data governance and ownership across the three levels: strategic decision making (DG leadership committee); tactical decision making (data governance council, supported by a metadata management group, a change management and compliance group, an outreach management group, and a cross-"omics" data lead steward group); and operational decision making (data stewardship support group, infrastructure group, BI delivery, data architecture, and tools acquisition and delivery, with example data stewards and data custodians for genomics, transcriptomics, proteomics, and metabolomics).
Technology — Let it support your data management needs
To help keep abreast of the changing landscape of data domains and data types, a TRI solution technology strategy should consider the following criteria:
• Be scalable to manage large data volumes with high performance
• Apply to multiple environments (development, pilot, quality confirmation, and production), each with big data challenges
• Handle semistructured and unstructured data
• Integrate silos of information from disparate datasets
• Work with a highly distributed computing and storage environment
• Allow sophisticated analysis and knowledge generation
• Cater to users with a wide range of skills
• Allow information security as needed
For the community at large, the big data challenge poses the follow-on challenges of data storage, access, analysis, and visualization. For TRI solutions, a technology base must provide a means of addressing these challenges now and in the future. We address three focus areas in technology that will provide big wins for reaching this goal.
“Everybody has to be able to participate in a future that they want to live for. That’s what technology can do.”
Dean Kamen, physicist, entrepreneur, and inventor, in an interview with the Chief Gartner Fellow, Daryl Plummer, 2003
(http://www.gartner.com/research/fellows/asset_55323_1176.jsp)
A service-oriented architecture (SOA) — A simple future-proof framework
A SOA is based on loosely coupled services with interfaces that are independent of the implementation. Services can be deployed and removed easily, and can also be easily integrated across dissimilar platforms. SOA standards can allow new applications to share a common model of development, maintenance, support, and staffing specialization. Services include applications, tools that access the data, and data management technologies designed to maneuver the data through the data management cycle. For TRI solutions, the advantage that SOA can offer is the ease of delivering, maintaining, and enhancing data analytics, visualization tools, and other software solutions as Web services, and hence catering to a broad customer base. Figure 14 presents an example of how SOA can be applied to support information access and collaboration.
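In miniature, the loose coupling SOA promises looks like services registered behind a uniform interface, so tools can be added or swapped without touching their callers. The service names and payloads below are invented for the sketch.

```python
# A tiny service registry: callers know only a service name and a
# uniform call signature, never the implementation behind it.
registry = {}

def service(name):
    """Decorator that registers a callable as a named service."""
    def register(fn):
        registry[name] = fn
        return fn
    return register

@service("gene-lookup")
def gene_lookup(payload):
    annotations = {"TP53": "tumor suppressor"}  # stand-in data source
    return {"gene": payload["gene"],
            "role": annotations.get(payload["gene"])}

@service("echo")
def echo(payload):
    return payload

def call(name, payload):
    """Dispatch through the registry; swapping a service is invisible here."""
    return registry[name](payload)

print(call("gene-lookup", {"gene": "TP53"}))
```

In a real SOA the registry would sit behind Web service endpoints, but the design property is the same: deploying, removing, or replacing a service changes nothing for its consumers.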
Figure 14. A service-oriented architecture
Source:DeloitteConsultingLLP
The figure layers a service-oriented approach from data sources up to end users. External data sources (clinical trial data, publications, IP patent sources, NIH data, and other external data sources) and example collaboration programs (solid tumor, meningococcal vaccine, Duchenne Muscular Dystrophy, immune-mediated inflammatory diseases, Parkinson and Alzheimer, HIV research, mosquito-borne diseases, Cystic Fibrosis research, Hepatitis C research, and other programs) feed data acquisition into data repositories covering experimental data (genomics, metabolomics, expression, genetics, proteomics, physiological models, biology, cell biology, drug), pharmacology and toxicology data (pharmacokinetics, pharmacodynamics, physiology), and clinical research, development, and post-market data (clinical trial data, patient data). A data management framework provides data governance, data security, data ownership, data standards, controlled ontologies, controlled vocabularies, metadata management, data workflow, data validation, Web services, extract, transform, and load (ETL), and other capabilities, within a big-bio-data framework for evolving data needs in the future. End users (commercial biopharma, US government agencies, nonprofits, academia, care providers, and independent researchers) reach the platform through websites/portals, social media, and an information exchange hub, supported by private/public workspaces, knowledge management and collaboration, high-performance computing, network and infrastructure, data mining and analysis, and visualization tools.
Storage and high-performance computing — How much is enough? And where?
Setting aside the elaborate discussion of whether or not parent datasets should be stored within the walls of a TRI solution, a more fundamental need is that of providing scalable, high-performance storage infrastructure to cater to the portal's high-performance computing demands. Beyond meeting the capacity and performance needs of the computational activities, storage solutions must be scalable with no downtime, suited to the volume and complexity of information encountered, easily managed, and future-proofed to be easily updated to changing scenarios. Network attached storage devices have undergone serious architectural transformations to accommodate these needs. Systematic and continuous assessment of the evolution of the needs and the character of the data itself will determine which storage devices to use. With the advent of cloud technologies, high-performance computing (HPC) clusters and storage in the cloud are now realities; using cloud resources could therefore provide both a scalable and cost-efficient option.
Technical support — Let it not be your Achilles heel
Reliable infrastructure for information technology (IT) operations to support the TRI solution is crucial to the solution's effectiveness and adoption. TRI solution strategies should consider effective practices and industry standards in hardware and software for networks, data storage, high-performance computing technologies, and service support mechanisms. Support should not be underestimated. The mandate of the scientific community is to do research, not to keep abreast of technologies and data platforms. Formal processes, such as the Information Technology Infrastructure Library (ITIL)/Information Technology Service Management (ITSM), enable the creation of these service models and facilitate their successful operation. Implementing scientific technical support and training teams to support the portal can reduce the burden on the TRI solution's scientific users. Figure 15 outlines the elements of a typical service model.
Figure 15. Application management service overview
Source: Deloitte Consulting LLP
A SOA approach for TRI solutions can fulfill several overarching purposes:
• Allow the creation of an information hub that provides users with access to publicly available datasets and allows data sharing and exchange in a flexible manner. Such an exchange hub provides the capabilities for simultaneous data acquisition in a standard, reliable, and traceable manner.
• Create the ability to add tools at will. This enables not only the incorporation of tools developed internally and from other open sources, but also provides users a choice of analytical methods.
• Develop an intuitive interface for users to work with their data and collaborate with others.
The SOA method thus helps build a service community, where members may collaborate and take advantage of the large base of scientific knowledge and programming services available in the community and constantly being generated. There is no doubt that SOA, which enables the collaboration of both scientists and informatics personnel, could prove eminently suitable for a TRI solution, providing a technology solution to make the most of the changing information landscape and cater to the needs of users.
The service model spans facility management, technology management, IT service management, and TRI application management, including:
• ITIL service strategy and/or process improvement
• Asset management and optimization
• Operational stabilization
• Application portfolio rationalization and/or consolidation
• Demand and portfolio management
• Training and live support
• Infrastructure consolidation and virtualization
• Disaster recovery capabilities
• Physical security
• Modular data storage capabilities
• Standardized and simplified technology stacks and tools
• Cloud and/or software as a service
• Networking and platform interoperability
Figure 16. The scientific community will likely influence TRI solution capabilities
Product/service users want to define choices in a manner that reflects their view of value, and they want to interact and transact in their preferred style.
Source: Deloitte Consulting LLP
Collaboration — Enable a technical necessity
While we have recommended a number of effective practices to support TRI solutions and advance tools and architecture to promote science, it is vitally important to keep the stakeholders and scientific community involved in key decisions related to technical and functional enhancements. The future of a sustained TRI solution is dependent on collaborative efforts both to influence the enhancement of capabilities and to promote data sharing among the scientific community. If collaborative efforts are done correctly, they could pave the way for innovation and expedite the realization of the proposed future state.
Co-creating TRI solution capabilities
It has been borne out by our client experience and by industry practice that value is co-created with customers if and when customers can personalize their experience using an organization's product-service proposition (Figure 16). We observe that product value is increasingly co-created by the capability provider and the customer. The evolution of TRI solution capabilities will require strong communication mechanisms and close coordination and prioritization of the scientific community's needs. Since a TRI solution encourages data sharing and collaboration, its adoption requires consideration of a paradigm shift for the research community.
To promote co-creation of TRI solutions, administrators should engage in outreach activities and give presentations at annual meetings to address issues about data standards, data submissions, analysis tools, and data sharing policies. Administrators should also interact with researchers to collect requirements and feedback on advancing the product. However, this traditional "active provider/passive consumer market construct" approach should be supplemented with leading practices that encourage active consumer interactions. The TRI solution funded program will need executive leadership that can leverage its relationship-building skills and sphere of influence to interact with and galvanize an expanded pool of research networks and institutions, collaboration partners, and other federally funded research agencies. The TRI solution funded program should interact with and seek input from each of these groups on adopting common industry standards, facilitating secure data exchange, and mitigating concerns and issues related to patient privacy rights. Doing so will not only improve platform capabilities, but can also foster research collaboration among diverse scientific communities.
Administrators should create both traditional and nontraditional outreach mechanisms with the scientific community to enhance TRI solution capabilities. Traditionally, outreach is a mechanism to provide messaging and training to the targeted community. Taking this a step further by leveraging this community to provide feedback and direction on future releases engages the community in building a highly valued and viable solution. An example would be to leverage the community to provide insight on developing solutions for adoption of industry standards (e.g., CDISC and HL7) during the data submission process. Another example is to leverage the proposed SMS panel of biology research areas, bioinformatics, and clinical research practitioners to help create more effective avenues to interact with the end-user community. These interactions could occur, for instance, at annual scientific and TRI meetings.
The figure depicts the TRI solution at the intersection of biology research, cheminformatics, bioinformatics, and health outcomes.
Advancing our scientific knowledge
A scientist's normal collaboration network entails consuming public document records, presenting topics and findings at scientific conferences and consortia, and collaborating with research colleagues. These networks are invaluable and essential, because they not only enhance TRI solution capabilities, but also advance our understanding of the science and the mechanisms required to support collaboration. A TRI solution strategy should facilitate collaborations with an operating model in place
Figure 17. Meeting TRI solution challenges
“Collaborations become necessary whenever researchers wish to take their research programs in new directions.”
F. Macrina, Dynamic Issues in Scientific Integrity: Collaborative Research. A report to the American Academy of Microbiology, 1995.
Source: Deloitte Consulting LLP
designed to address technology, people, and process challenges (Figure 17).
The recommended operating model, which is detailed in the next section, is designed to establish standards enabling interoperability and sharing of data across the scientific community using common vocabulary toolsets and formats. By harmonizing standards, different information systems and communities can "speak the same language" and work together technically to manage and use consistent and reliable scientific information.
The figure maps TRI challenges — usability, incentives, data exchange protocols, technology standards, data quality, data standardization, semantics, metadata, data owners, workflows, and data security — across programs and data sources (Duchenne Muscular Dystrophy, immune-mediated inflammatory diseases, mosquito-borne diseases, Parkinson and Alzheimer, meningococcal vaccine, Cystic Fibrosis, Hepatitis C, HIV, clinical trial data, publications) to technology, people, and process responses:
• Engage all stakeholders equally; identify mutual benefits, and communicate mutual interests
• Implement national data sharing and privacy standards
• Enact change management policies across the research network
• Create a national consortium to collect and share data, to develop standards, and to encourage partnerships
• Increase adoption by partnering earlier to identify new research opportunities
• Identify long-term, cross-functional sponsors to make strategic decisions, drive adoption, and communicate vision
• Develop data networks to integrate research and patient data, genotypes, and phenotypes
• Leverage alternate media networks
• Develop data standards and information sharing platforms to share and use valuable patient assets
• Develop data quality metrics
• Build and/or enhance information management structures based on semantic interoperability and metadata structures
• Design processes to consistently capture and utilize data
• Develop methods to fill information gaps
The TRI solution operating model will drive alignment of the data interoperability committees with lead scientific, informatics, and key SMSs from the panel to enhance existing standards, policies, and procedures. As the interoperability of data is enhanced, further adoption of new user communities can extend the reach beyond current implementations and capabilities, as well as the scope and usefulness of the data.
The outreach program can facilitate TRI awareness and adoption. As shown below in the commitment curve (Figure 18), moving stakeholders from a state of awareness to a state of ownership will require an effective communication strategy and plan.
The objective of communication is to proactively address potential risks, prepare people for change, and minimize disruptions to their business activities. Additionally, it provides an avenue to receive feedback on the information product delivered. Communication forums and formats such as newsletters, collaborative research blogs, and social media should be used to increase adoption among the scientific community. Clarifying the TRI solution benefits and capabilities through enhanced communication and training protocols, as well as branding, marketing, and consensus-building activities, can increase adoption. Leveraging change management techniques will support the communication strategy and outreach approach.
Tying it together — An intuitive collaborative portal
The strategy should position the TRI solution as a key entry point for users to access data sources and analytical methods, and as a community portal that provides a truly collaborative space where researchers learn about relevant topics and exchange research ideas with the community. Creating a social media interface within the TRI solution can offer the portal the chance to cater to its community in a more direct manner. Social media technologies are promoting these collaborations for corporations and individuals, fostering collaboration among the diverse community of researchers. Figure 19 shows how a portal implemented by Deloitte brought together search, analysis, and visualization tools across datasets belonging to pharmacists, payers, providers, and clinical trials.
The commitment curve moves from low engagement to high engagement:
• Awareness — individuals have heard about the program
• General understanding — individuals are aware of the benefits, basic scope, and concepts of the program
• Personal understanding — individuals understand how the program impacts them and their job
• Willing to accept — individuals understand and are willing to acquire the skills required to adopt the program
• Buy-in — this is the way work is done, the new status quo
• Ownership — resources make the program their own and create innovative ways to use and improve it
Figure 18. Driving to greater stakeholder engagement
As used in this document, "Deloitte" means Deloitte Consulting LLP, a subsidiary of Deloitte LLP. Please see www.deloitte.com/us/about for a detailed description of the legal structure of Deloitte LLP and its subsidiaries. Certain services may not be available to attest clients under the rules and regulations of public accounting.
Source: Deloitte Consulting LLP
Figure 19. Hi2 — A portal for exchange and analysis of life science and healthcare data
The portal screens share a common navigation bar (Dashboard, Analytics, Search, Custom studies, My workspace, Contract management) with navbar search on every page and navigation and login settings. Features include interchangeable, user-configurable Web part areas; heat map controls and a previewer, with selectable heat map cells (zoom, show details); filter, pivot, and data analysis user controls; export to graphics or PDF and analysis sharing; custom study status and purchase status indicators; Google-like search with advanced filtering and search result display with highlighted search terms; an analytics and report table of contents previewer; and contact details populated from the user profile, with user-supplied parameters auto-populated from analytics (no input allowed).
Today, many healthcare providers, life sciences organizations, and health plans are facing unprecedented technological and regulatory changes. A perfect storm of increased regulation, demand for lower cost, and expanding big-data challenges, along with the need for innovation, requires a different approach to improving analysis and gaining better insight from the available information to make effective and timely decisions. Giving decision-makers the ability to act on insights obtained through the analysis of structured and unstructured data is key to achieving significant improvements in care delivery at the patient and population levels.
Deloitte has made a significant investment in health reform and analytics. A portion of this investment is devoted to the development of a subscription-based “Insights as a Service” capability we call Deloitte Health Informatics and Insights (Hi2), as represented in Figure 19.
Hi2 was developed to work directly with large health systems to deliver commercial subscription-based solutions, enabling collaboration to effectively address market needs. The Hi2 approach to addressing market needs involves deploying local analytics solutions behind health system collaborators’ firewalls, protecting patient data, and enhancing the valuable assets health systems have worked so hard to create.
Source: Deloitte Consulting LLP
Culture change should accompany capability development

With the availability of integrated bioinformatics resources, the scientific community will need to move away from being insulated and begin traversing disciplinary boundaries to better understand, treat, and prevent human diseases. The bioinformatics resource hub owners would have to focus on changing the existing culture within the scientific community, which prides itself on secrecy and competition amongst scientific investigators. Convincing this wider scientific community to create and use a common interdisciplinary bioinformatics resource is a hurdle that must be overcome for such a program to be effective. Implementing a TRI solution to support translational research efforts across research fields and areas will require strategy, planning, technology enhancements, and cultural change management.
Leadership commitment — Managing cultural change

Communication and cultural change management methodologies are key to managing, implementing, and supporting technology programs designed to enable TRI solutions. In our experience, successful organizations understand that culture must be actively managed via a deliberate process, and they use methodologies to organize the cultural change process. We have found that, unlike typical change management, cultural change management is protracted and requires alignment with leadership to drive the desired culture and define the change strategy. As shown in Figure 20, cultural change can be achieved by tailoring the transition approach and moving different types of stakeholders through cultural and behavioral dimensions. Staying the course is critical to culture change.
Our research, derived from our client experience, indicates that inadequate leadership sponsorship is a major reason why large transformation projects fail. Effective transformations, therefore, have solid support from the executive leadership team. In each effective business case, the cultural shift was treated as an important organizational priority, with clear governance, policies, and people in place to facilitate the change. The change management driven by the TRI solution program operating model should be designed to use a set of people-oriented strategies, tools, and techniques that, when applied at the scientific group and organizational levels, help ensure that key scientific users are ready, able, and willing to accept and implement changes to how they store and analyze research data. It is anticipated that the scientific community will then perceive it as an ongoing requirement for achieving future success in their research.
Implementing the vision for Translational Research
“…major culture change does not happen easily or quickly … even with excellent leadership at the top, major change requires many initiatives from many people, and that simply requires time, often lots of it.”
John Kotter, Konosuke Matsushita Professor of Leadership, Emeritus, at the Harvard Business School
Leading Change, Harvard Business Review Press, 1996
Figure 20. The path to culture change
Source: Deloitte Consulting LLP
[Figure: a path from the current culture to the ideal research culture, plotted along a behavior dimension (from uncertainty to commitment) and a culture dimension. End users, managers, and the leadership team each move along the path toward the goal: use of the TRI solution as an interdisciplinary bioinformatics resource. The path includes aligning leaders’ behaviors with the desired culture and defining and communicating strategy.]
User adoption — A key to culture change

We envision three equally important avenues that the TRI solution program leadership should consider pursuing concurrently to improve adoption (Figure 21). These include improving the user’s experience while leveraging the TRI solution capabilities (“user experience”); establishing data standards to improve data interoperability and managing data effectively through storage, retrieval, and analysis (“data life cycle management”); and marketing the TRI solution as the authority in translational research and related data while providing incentives for usage (“user incentives”).
User experience: The TRI solution experience will ultimately decide whether users come back to the portal. To drive adoption, the TRI solution portal should be creative, yet scientific, with the right navigational components in place. It should be exciting and fun to use. The scientific workbench and portal proposed earlier allow an intuitive encounter with a variety of tools, promoting user experience and appeal. By thinking through innovative methods and conducting ongoing user simulations, administrators could improve the user experience and keep users “addicted” to the TRI solution portal. The other aspect of improving the user experience is proactively maintaining and enhancing user-aligned platform capabilities, such as increasing the number of toolsets and resident datasets and designing usable workflows. A framework, such as the one proposed for the TRI solution portal, could be used to drive enhancements based on user input and preferences.
User incentives: The TRI solution needs to offer users incentives to come back to the portal; to do this, it has to go beyond the more traditional research portals and become the go-to research platform for the scientific community. Some features that will make the TRI solution portal attractive to users are:
•	As an authoritative source for translational and related research data: By housing or providing access to an exhaustive set of research data that offers insights into research topics, by providing reference frameworks such as controlled vocabularies and ontologies, and by creating research-specific data management tools, the TRI solution portal can become the authoritative source for research data
•	As a catalyst for collaboration: The success of Cytoscape, an NIH initiative that attracts industry collaboration and investment, is well known. With careful implementation, a TRI solution portal has the chance to repeat this success
•	As a publishing platform: With its wealth of datasets and analytical tools and a diverse collaboration community, the TRI solution portal can position itself to support and publish large-scale scientific studies
Data interoperability: Defining and harmonizing standards throughout the data life cycle is essential for the operation of a scientific community. It can allow different information systems and communities to “speak the same language” and work together technically to manage and use consistent, correct, and useful scientific information.
Figure 21. Three-pronged approach to engage adoption of TRI solution by the scientific community
Source: Deloitte Consulting LLP
[Figure: three overlapping areas (user experience; user incentives; and data interoperability/data life cycle management) converging on TRI solution adoption.]
Program management — An ongoing process

Given that the TRI solution program would need to overcome numerous challenges to achieve future-state readiness, managing the steps for implementing such a platform requires a multidisciplinary program management structure that can provide the technical and scientific knowledge to enhance bioinformatics activities within the scientific community. As noted above, our experience and the program requirements suggest that administrators of the program should establish an operating model governance structure to move implementation through the maturity model framework. Overcoming challenges will require due diligence with current and potential partners and changing the existing culture within the scientific community, which prides itself on secrecy and competition amongst scientific investigators, to one that is more open and enables sharing and collaboration. A well-defined program structure, as shown in Figure 22, could serve as the transformational operating model for implementing such a program and improving capabilities to integrate data. It also could improve interoperability among disparate systems used by the scientific community to generate further innovations.
The organization model is illustrative of how effective organizations have typically structured themselves to bring together the necessary governance and management to facilitate planning, coordination, and implementation of transition-related activities. As noted earlier and throughout this paper, it is evident that efficient therapies would be based on combinatorial analysis of often-disparate data reflecting stage-of-disease progression, as well as a patient’s medical and individual characteristics. Therefore, it is crucial that the operating model is represented by skilled practitioners who can influence decision making and have the leadership capabilities to drive change at the necessary levels of the research community. Thus, at the top of this operating model is the Consortium Leadership Board. It would be made up of principal investigators, research area leads, TRI leadership, and an SMS panel.
The Consortium Leadership Board would help to address IP rights and data privacy and security issues, and it would work with TRI leadership and the TRI solution technical team to understand program progress and to drive visibility into significant program decisions, risks, and opportunities, effectively bringing the larger power and collective experience of consortium representatives to bear. Data management and enhancement activities would be managed by project-based work streams. The SMS panel would be represented by industry specialists from different disciplinary fields. Such a panel (Figure 23) is needed to bring forth fresh perspectives from their respective domains of specialization to resolve issues that may impede data interoperability and collaboration.

Figure 22. A consortium-based TRI solution operating model
[Figure: at the top, consortium leadership made up of principal investigators, research area leads, TRI leadership, cross-scientific SMSs, and research analytics SMSs. Their scope of responsibilities covers IP rights, incentives, semantics, data protocols, security, and alignment on standards, capabilities, and technologies. Dynamics of contribution center on repository data management through project-based workgroups with clear roles and responsibilities, supported by platforms and/or technologies for collaboration and outreach, quality control and documentation, analytics and visualization tools, system and infrastructure, project management, and training and assistance.]
Source: Deloitte Consulting LLP
The project management leadership team would maintain direct communication with its workstream team members, discussing day-to-day operational matters and providing feedback without having to go through an extra communication layer. It would bring the necessary governance and management framework to ensure that planning, coordination, and implementation of activities are carried out in a coordinated manner.
Figure 23. The SMS panel provides fresh perspectives
Source: Deloitte Consulting LLP
[Figure: the SMSs view the program through five lenses: ontologies, health outcomes, bioinformatics, cheminformatics, and translational research.]
Conclusion

It is not easy to prepare for what the future holds. No one has a crystal ball, and no strategy is perfect. The journey ahead will demand smart investments, flexibility, agility, and change management.

Across the collaboration, data management, and technology focus areas, this whitepaper highlights leading practices to facilitate implementation of a TRI solution. In the relevant sections of this whitepaper, the following has been outlined for consideration: (i) developing an operating model to support and maintain a TRI solution; (ii) the importance of integrated analysis tools and algorithms; (iii) providing a system architecture strategy; (iv) integrating various content data sources for end-to-end analytics; (v) enriching the usefulness of content through ontologies and standard vocabularies; and (vi) improving data interoperability. As outlined in Figure 24, changes to these focus areas can be made in a phased manner designed to facilitate implementation of a TRI solution as an accelerator of collaboration in the big-data environment. From a technical perspective, this provides a structured path to implementing such a platform.

The strategy can be implemented via a systematic three-step approach, described in Figure 24. In the first step, the team should consider working collaboratively with members of the scientific and biomedical research communities to fully understand the current state of TRI and gain alignment within the consortium leadership board, the SMS panel, and select user community members on its restructuring needs. The second step involves creating a clear implementation roadmap outlining the activities required to realize the identified opportunities. The last step is execution of the roadmap, including the communication and cultural change management strategies. A detailed assessment of needs and a targeted planning approach will support implementation of such a large and transformational solution.

The data and technology landscape is very dynamic, and so are the forces that shape the biology world. The planning exercise should be conducted by a visionary team that has the scientific, technical, and operational knowledge in the areas of scientific research, bioinformatics, data management, and IT infrastructure. The skills and experience of this diverse team will enable the effective implementation of a TRI solution that realizes the long-term strategic vision and offers an innovation engine for scientific research. No plan can be set in stone; the TRI solution implementation program will need ongoing evaluation and course correction to align its plans with trends in the healthcare ecosystem and help achieve the future-state vision. Nevertheless, the recommended investments in implementing such a platform will speak for themselves: the return on investment could be self-evident for the stakeholders involved, attracting investment from external collaborators and powering the TRI solution to be a truly collaborative solution for the “big-bio-data” landscape of the future. Is this utopia? Yes. But it can be within reach.

Figure 24. A work plan for implementing a TRI solution
[Figure: a three-step work plan. Step 1, assess the current state: identify restructuring needs and improvement opportunities around three pillars (bio-data management, technology, and collaboration). Step 2, define a roadmap for implementing improvement opportunities: create a clear roadmap for the identified opportunities, with tactics, programs, and projects outlined. Step 3, implement the improvement opportunities: execute the roadmap, including developing the collaboration operating model; leveraging scientific leading practices and redesigning the architecture, data governance, etc.; and redesigning technology choices and direction.]
Source: Deloitte Consulting LLP
Authors

Tunc Toker, Senior, [email protected]
Luke Dunlap, Senior, [email protected]
Asif Dhar, MD, [email protected]
Sanjay Srivastava, PhD, [email protected]
Parag Aggarwal, PhD, Specialist, [email protected]
Lee Ann Bailey, PhD, Specialist, [email protected]
Santha Ramakrishnan, PhD, Senior Consultant, Deloitte Consulting LLP, [email protected]
Suren Dheenadayalan, Manager, Deloitte Consulting LLP, [email protected]
Yvette Palmer, Specialist, [email protected]

Acknowledgments

The authors wish to thank the following collaborators for their contributions to this paper: Robert Decker, Thomas Breuer, Rich Cohen, Nitin Mittal, Kumar Nagarajan, David Croft, Nina Tatyanina, and Christopher Comrack.
About Deloitte
Deloitte refers to one or more of Deloitte Touche Tohmatsu Limited, a UK private company limited by guarantee, and its network of member firms, each of which is a legally separate and independent entity. Please see www.deloitte.com/about for a detailed description of the legal structure of Deloitte Touche Tohmatsu Limited and its member firms. Please see www.deloitte.com/us/about for a detailed description of the legal structure of Deloitte LLP and its subsidiaries. Certain services may not be available to attest clients under the rules and regulations of public accounting.
Copyright © 2012 Deloitte Development LLC. All rights reserved.
Member of Deloitte Touche Tohmatsu Limited