The scientific information landscape and applications to Translational Research
This publication contains general information only and Deloitte is not, by means of this publication, rendering accounting, business, financial, investment, legal, tax, or other professional advice or services. This publication is not a substitute for such professional advice or services, nor should it be used as a basis for any decision or action that may affect your business. Before making any decision or taking any action that may affect your business, you should consult a qualified professional advisor. Deloitte shall not be responsible for any loss sustained by any person who relies on this publication.
This whitepaper includes data and information that shall not be disclosed outside of the intended audience and shall not be duplicated, used, or disclosed — in whole or in part — for any purpose other than consideration of this whitepaper. The intended audience may not rely upon its contents for accuracy or completeness nor to use it to formulate official policy or official decisions. The intended audience may consider its contents “as-is” without any warranty of quality. In no event shall any part of this whitepaper be used in connection with the development of specifications or work statements with respect to any solicitation subject to full and open competition requirements. This restriction does not limit the intended audience’s right to use information contained in these data if they are obtained from another source without restriction.
Table of contents
The scientific information landscape and applications to Translational Research  i
Executive summary  1
Creating a vision for Translational Research  2
Building blocks of Translational Research investments  5
Implementing the vision for Translational Research  22
Conclusion  26
Acknowledgments  27
Executive summary
The following are common challenges observed when adopting a translational research informatics strategy:
• Complications when importing data from various data sources — data transformations that are required to import data into internal, structured data repositories can be resource intensive, requiring manual data entry and the creation of surrogate keys to link data files
• Inconsistencies in data dictionary standards — guidance on the use of standards, such as the Clinical Data Interchange Standards Consortium (CDISC), the International Classification of Diseases, Ninth Revision (ICD-9), or the International Classification of Diseases, 10th Revision (ICD-10), and on internal definitions and use of standards is unclear; internal standards documentation is not current with the changing environment, and data-importing processes do not incorporate these standards; standards are often not fully adopted
• Impediments to interoperability with external partners and cross-trial analysis — lack of uniform implementation and enforcement of data dictionary standards during data acquisition and submission; i.e., there is no requirement to submit clinical or mechanistic data using CDISC or other accepted standards, and partners may not uniformly adopt the same standards
• Data sharing is hindered by differing informatics and analytic tools — sharing of research data is not fully provisioned in contracts, leading to questions on the extent and scope of data that can be shared and resulting in incomplete and outdated data unsupported by sufficient metadata; patient-derived consent for sharing clinical trial data is limited
• Data quality concerns — current validation focuses on completeness of submission and not on the reliability, translation, or integrity of the data content
• Various data analysis and visualization tools are used — use of tools is neither consistent nor standard, resulting in partners establishing their own research collaboration portals
• Patient privacy rights issues — informed consent
Investments made in biomedical and translational research have led to advancements in providing insights into the mechanisms and disease relevant markers that are implicated in human pathologies. This advancement has led to innovative approaches in designing basic research and clinical trial strategies that are more effective in translating the discoveries made at the bench to treatments delivered to the bedside. In order for these advancements to be truly effective, traditional and coveted research silos should be broken down, making these discoveries more available so they may be shared and leveraged across the biomedical research community. This is expected to require a more open and collaborative environment in which privately and publicly funded researchers, primary investigators, and clinicians work together through a knowledge exchange where they share and leverage their collective discoveries and insights in designing new, novel, and innovative approaches to treating human diseases.
With the advent of this investment in biomedical research, technologies, and collaborative networks comes an explosion of data; data derived from many different sources, structured in distinctly different formats, analyzed using different tools, and interpreted from different perspectives. A translational research informatics (TRI) strategy can be used to help overcome many of these challenges and to help realize the potential this information brings to the advancement of new, novel, safe, effective, and innovative treatments for human diseases.
TRI is the practice and critical cornerstone in providing a platform on which to bring these vast amounts of basic scientific and biomedical research data together in a cohesive and structured way to perform meaningful and intelligible analyses that drive clinical success. A TRI solution can serve as a venue to submit, archive, exchange, and analyze scientific, bioinformatic, and medical research data, addressing data standards, data interoperability, and solutions to ethical issues around data sharing. Data acquisition and interoperability across multidisciplinary research areas, extensible collaboration portals with standard and broad analysis and visualization tools, and inherent cultural collaboration concerns in a competitive scientific community are key causes preventing the full adoption of TRI solutions. It is critical to identify key enablers that reduce these barriers in order to envision and realize the information landscape of the future.
In this paper, we have set the context for an anticipated future vision of the post-commoditized genomic research environment in which TRI solutions can be implemented and operated to succeed. We have outlined an approach to acquiring and storing biomedical data from several sources; implementing and enforcing data standards and interoperability; integrating the data and anchoring them around exhaustive controlled vocabularies and ontologies; creating and making available tools for the analysis of heterogeneous data sets; and providing an appropriate supportive technology base to enable these functions. Implementing these changes can help ensure the smooth operation of a TRI solution, fulfilling the scientific needs of the community, increasing the appeal of the platform, and catalyzing adoption and collaboration. Our paper describes specific execution measures involving strategy formulation, careful planning, multithreaded execution, and ongoing change management as a means of translating ideas to action, thereby increasing a TRI solution’s usage and making it an invaluable collaborative resource. We believe that the approach we lay out can be widely adaptable across commercial industries, academia, and governmental agencies such as the National Institutes of Health, Food and Drug Administration, Centers for Disease Control and Prevention, Department of Energy, and Environmental Protection Agency, as well as others.
Creating a vision for Translational Research
A systems biology approach for disease etiology
Successful cures for diseases remain comparatively rare. Pathological processes of disease involve multiple pathways, cells, and mediators, and are influenced by a variety of genetic and environmental risk factors. There are significant variations in disease manifestation and prevalence across global geographies and human genomic makeup.
The wealth of experimentation techniques over the past few decades of biomedical research has advanced our empirical knowledge of biology and left scientists and clinicians with tremendous amounts of data and information. Indeed, a large portion of experimental data is derived from high-throughput experimentation such as genomics, transcriptomics, and metabolomics (collectively termed “omics” data), which generates several gigabytes of data per experiment. We are, however, severely limited by our ability to interpret, analyze, and synthesize knowledge from these datasets, which we term “big-bio-data.” The scientific community, therefore, desperately needs data management accelerators and knowledge-exchange platforms that allow the interpretation, analysis, and exchange of big-bio-data. TRI solutions that incorporate these capabilities and unlock the potential of big-bio-data can be pivotal to the advancement of scientific innovation.
A broad TRI solution should provide a forum to enable an interdisciplinary approach that draws from data of many types; e.g., genomic, genetic, cellular, molecular, and physiological, and from many related research areas. As depicted in Figure 1, TRI solutions should employ an interdisciplinary approach to providing targeted insights in the specific fields of research while also connecting the data from other human systems, thus opening a window into systems biology.
“It is a very sad thing that nowadays there is so little useless information.”
Oscar Wilde, in “A Few Maxims for the Instruction of the Over-Educated,” Saturday Review (17 November 1894)
Figure 1. Systems biology — A connected approach across contained systems
Source: Deloitte Consulting LLP
[Figure 1 depicts three example contained systems (e.g., the immune, nervous, and metabolic systems), each sharing core molecular machinery (DNA, mRNA, tRNA, exons and introns, regulatory motifs) alongside system-specific components: antibodies, phagocytes, proteases, and defensins in the immune system; neurotransmitters, synaptic molecules, ion pumps, motor axons, and glial molecules in the nervous system; and glucagon and insulin, metabolites, the citric acid cycle, isomerases, and convertases in the metabolic system.]
A robust TRI solution must generate insights from information across human systems enabling a systems biology approach.
Crafting a clear, dependable lens for systems biology through big-bio-data management is a difficult task. There are many challenges; the most fundamental is the challenge of pulling together the exabytes of data that span the biology universe, followed by the challenge of drawing actionable scientific and medical conclusions from it. Addressing these challenges requires consideration of significant investments in a variety of capabilities, a paradigm shift in the way life science, research, and healthcare stakeholders think about collaborations, and the engineering of an operating model that incentivizes the scientific community to collaborate fruitfully while retaining the rights to discovery and intellectual property.
Mega-collaborations — The path to systems biology
Stakeholders in the biological data arena are investing in localized capabilities that pull together basic research, clinical trial, and genomic data through TRI solutions. Such efforts across drug and medical device makers, care providers, research centers, the federal government, nonprofit organizations, and academia are aimed at developing a data-powered approach to systems biology. Examples include the National Center for Advancing Translational Sciences in the federal space, the eMERGE Network in academia, the Pistoia Alliance for the commercial sector, and the Innovative Medicines Initiative for public-private research. It is expected that this trend will greatly increase over the next few decades, creating hubs of sector-specific collaboration that later morph into well-organized mega-collaboration hubs that operate across the sectors (Figure 2).
Figure 2. Interconnectivity of collaboration portals is required to support the scientific landscape of the future
Source: Deloitte Consulting LLP
[Figure 2 depicts an interconnected landscape of collaboration portals. TRI solutions will need to address key themes across the interconnectivity of the biomedical landscape: cross-disciplinary research; dense knowledge traffic; big-picture insights/systems biology; bench-to-bedside and back; exabyte global data management; global talent management; cloud R&D; and IP rights, privacy, and security. Participants span commercial biopharma companies (drug, vaccine, orphan drug, and medical device makers, specialty pharma, and platform/RNAi companies working across oncology, neuroscience, cardiovascular, respiratory, immunology, and infectious disease), universities, non-profits, care providers/sites of care, and government-funded and -managed portals across the NIH, CDC, FDA, EPA, DOE, and LANL (examples include BTRIS, ImmPort, HuGE Navigator, Kbase, CTR, CPDB, HERO, WONDER, and BMIS).]
A collaboration model can be the key to closing the loop between the activities on the bench and those at the bedside, thereby providing benefits of accelerated molecular discovery, improved drug development, increased patient safety, and targeted, personalized patient therapies. Effective operation of such hubs requires that entities consider scaling exponentially while addressing the concomitant challenges of data, technology, and infrastructure management, intellectual property (IP) rights, and the challenges of collaborative data sharing and knowledge exchange. There are three capabilities critical to the operating model for such initiatives — data management, technology, and collaboration. They are not new concepts; however, providing these capabilities with very specific direction can help transform TRI solutions into knowledge-generating centers:
• Data management — Making it specific for biomedical data: Incorporating relevant data standards, controlled vocabularies and ontologies, algorithms, methodologies, and analytical tools, and engineering better processes of data governance and sharing can allow better data interoperability, insight generation, and collaboration in the research community
• Technology — Letting it support the data management needs: Focusing on scalable, flexible infrastructure and cost-effective yet high-quality technologies for data storage and computing can enable better handling of the vast volumes of data and accelerate the pace of collaboration
• Collaboration — Enabling the technical necessity: Communities are enabled by collaboration, and collaboration enables communities; hence, developing an enhanced data sharing and exchange model that is enabled by the data and technologies, as well as an operating model for collaboration that makes evident the benefits of knowledge exchange and incentivizes sharing, is a critical driver of adoption
Tailoring these commonplace activities to help meet the specific needs of a research community can be a game changer for TRI solutions. In subsequent sections, we discuss in detail each of these capabilities.
“Where is the knowledge we have lost in information?”
T. S. Eliot, Choruses from “The Rock” (1934), London: Faber & Faber
Figure 3. A state-of-the-art data management approach
Source: Deloitte Consulting LLP
Building blocks of Translational Research investments
Data management — Make it specific for biomedical data
The scientific research landscape of the future will likely be one that is knowledge driven. Bringing together laboratory results obtained from early discovery to clinical research with clinical trial and patient outcomes will be key to deriving greater insights and knowledge on health and disease. For TRI solutions, this implies the ability to acquire, share, manage, and analyze data obtained from the collaborators within portals, along with public domain information contained within publications and publicly available datasets. Providing for these capabilities requires a sound base in data management. Figure 3 describes the elements of a state-of-the-art data management framework as it applies to an organization that deals with several collaborators that both produce and need access to internal and external data.
The data management challenges of systems biology are complex, due to its highly interdisciplinary nature and strong dependence on high-throughput experimental techniques that provide large amounts of data. Meeting these challenges requires that the system allow for the rapid introduction of new data sources derived from new and emerging technologies, for interoperability between datasets and analysis tools, and for a sophisticated means to integrate data sources to support data mining and searching operations. However, typical implementations based on classical enterprise and business intelligence systems will not work for this highly specialized scientific domain. A data management approach (Figure 3) that is tailored to the needs of the relevant biology and biomedical data requires consideration of Data Acquisition, Data Integration, and Data Presentation and Exchange capabilities to handle the data sharing and exchange needs of TRI solutions. An overarching Data Governance Framework, grounded in the leading practices of data security and interoperability, will likely facilitate the effective movement of data through the data management pipeline.
[Figure 3 depicts the framework’s three phases: Data Acquisition (data sources, data acquisition, data staging, governed by data standards and naming conventions); Data Integration (data models, data transformation, reference data, metadata management); and Data Presentation and Exchange (data search, data analysis and mining, data visualization, data exchange). The pipeline is underpinned by data governance and ownership, data security, and data interoperability, and must be flexible, scalable, reliable, and fast.]
Figure 4. Data acquisition components of the data management approach
Source: Deloitte Consulting LLP
Data Sources — They matter: As a portal with the goal of providing valuable insights to its researchers through an interdisciplinary approach, a TRI solution must enable access to a broad and exhaustive collection of datasets and reference data from a wide variety of biological domains and sources. In addition, datasets from the collaborating programs should be made available to power this collaborative research. Control and incentive mechanisms, as well as a cultural change, are essential to motivate participating members to submit their data into secure repositories so that TRI solutions can serve as both the archive and the exchange portal for experimental data generated under their aegis. The data sources available in the public domain are continuously changing. To keep abreast of the changing landscape, there should be an automated means of monitoring for new data sources, as well as a systematic community outreach mechanism to update the inventory of sources required for incorporation. The value of data about the data cannot be overemphasized. Metadata is critical to the scientific process, where the value of a datum is defined also by the method of obtaining it. For a portal to provide insight generation, the emphasis on acquiring and storing metadata is critical. It also enables the development of applications that further aid the scientist and make the language of informatics more easily understood.
Data Acquisition — Automation is key: In order to keep pace with research, significantly automating data acquisition processes should be considered. A determination will be necessary as to which datasets should be incorporated into the portal’s data stores and which can be accessed by referencing. For the former, there will need to be mechanisms for automatic feeds that keep the data stores current and in sync with the repository. For the latter, semantically interoperable referencing is required, along with some manual processing to upload data. Regardless of the mode used for data acquisition within the portal data stores, manual or automated, comparing the data against standards is critical for further use and analysis. Automated scripts that check for compliance with such standards, and that convert data to the standards implemented within the portal, are critical to the effectiveness of this process.
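As a concrete illustration of such a compliance script, the minimal sketch below validates incoming records against a small controlled vocabulary before load. The field names, required-field list, and codelist are invented for illustration; they are styled after, but do not come from, any actual CDISC codelist.

```python
# Hypothetical compliance check: the field names, required-field list, and
# tiny codelist below are illustrative only, not an actual standard.

ALLOWED_SEX_CODES = {"M", "F", "U"}             # a CDISC-style codelist
REQUIRED_FIELDS = {"subject_id", "visit", "sex"}

def validate_record(record):
    """Return human-readable compliance problems (empty list = compliant)."""
    problems = []
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        problems.append("missing fields: %s" % sorted(missing))
    sex = record.get("sex")
    if sex is not None and sex not in ALLOWED_SEX_CODES:
        problems.append("non-standard sex code: %r" % sex)
    return problems

batch = [
    {"subject_id": "S001", "visit": "BASELINE", "sex": "F"},
    {"subject_id": "S002", "visit": "WEEK4", "sex": "female"},  # fails codelist
]
report = {rec["subject_id"]: validate_record(rec) for rec in batch}
```

In practice such a check would run automatically on every feed, routing non-compliant records back to the submitter or to a curation queue rather than into the integrated store.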
DataStaging— Better staging, better performance: Good design principles should provide for a staging area as a place for temporary storage for data that is sourced. A staging area, distinct from a presentation layer, and one that takes into account the variety as well as the uniqueness of data types encountered and the relationships they bear to each other, can facilitate improved downstream processing, culminating in better insights.
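One way to picture a staging area distinct from the presentation layer is a loosely typed landing table that preserves each submission verbatim alongside its provenance, deferring all transformation to downstream steps. The sketch below uses an in-memory SQLite database; the table and column names are invented for illustration.

```python
# Minimal staging-area sketch: raw submissions land untransformed in a
# staging table with provenance metadata. Table/column names are invented.
import datetime
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE staging_submissions (
        source      TEXT NOT NULL,  -- which collaborator or feed sent it
        received_at TEXT NOT NULL,  -- provenance: when it arrived
        payload     TEXT NOT NULL   -- the raw record, kept verbatim as JSON
    )
    """
)

def stage(source, record):
    """Land a raw record in staging without transforming it."""
    conn.execute(
        "INSERT INTO staging_submissions VALUES (?, ?, ?)",
        (
            source,
            datetime.datetime.now(datetime.timezone.utc).isoformat(),
            json.dumps(record),
        ),
    )

stage("lab_feed", {"assay": "flow_cytometry", "subject": "S001"})
staged = conn.execute(
    "SELECT source, payload FROM staging_submissions"
).fetchall()
```

Keeping the verbatim payload means downstream transformations can be re-run, and historical tracking is preserved, whenever standards or data models change.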
[Figure 4 depicts the Data Acquisition phase. Data Sources cover clinical, research, laboratory, and reference data plus metadata, drawn from multiple private and public sources; effective practices include incorporating other recognized reference and research data and taking direct periodic feeds from public sources via pull/web services, with the benefits of giving users more relevant and specific context for interpreting experimental data and access to a variety of public data. Data Acquisition and Validation relies on automated procedures for data validation, user-provided direct submissions through templates, and direct data feeds (via ODBC, web services, plug-ins, and other mechanisms), improving user adoption, easing data upload, and enhancing acceptance of a wide range of data sets. Data Staging centers on a well-defined staging schema leveraging bio-data warehousing effective practices, enabling effective historical tracking and supporting the data curation, algorithm, and methodology management needed to transform data into knowledge.]
Data Acquisition — Make it scalable
Figure 4 demonstrates the Data Acquisition phase of the data management process. Three aspects of the acquisition process are critical to enabling collaboration in the ever-changing data landscape: Data Sources, Data Acquisition and Validation, and Data Staging.
[Figure 5 depicts the Data Integration phase. Research data, clinical trial data, reference data, ontologies, and other sources feed through ETL into an integrated data model (example options include a highly denormalized generic store, such as i2b2, or a snowflaked normalized model; the exact design will depend on further analysis), which is easier to standardize for future data collaborations and improves efficiency in knowledge generation. An Information Presentation layer, redesigned to accommodate complex methodology management, algorithm management, and data curation capabilities alongside data marts, materialized views, and bio-cubes, supports increased user adoption and improved scientific research capabilities.]
Figure 5. Data integration components of the data management approach
Source: Deloitte Consulting LLP
Data integration — Different approaches, different gains
Data integration is critical to the synthesis of knowledge from data. Activities in this phase combine the data to facilitate their delivery to the tools and methods that users access to interpret the information and infer knowledge. Figure 5 conveys the activities that are important at the data integration stage.

Ontologies and controlled vocabularies — Critical for insights: The new ecosystem, consisting of multiple research centers addressing several related topics, has created numerous sources of data that most likely speak of the same biological objects, processes, and observations in different ways. The range of potential dataset types encountered is mind-boggling, as represented in Figure 6.
Figure 6. Example biological datasets
Source: Deloitte Consulting LLP
[Figure 6 arrays example datasets across the research and pre-clinical, development, and commercial stages: functional genomics, proteomics, medicinal chemistry, compound libraries, QSAR models, systems biology models, pharmacogenomics, metabolomics, clinical genomics, population genomics, comparative genomics, sociomics, biobank data, AE/SAE data, toxicology data, PK/PD data, animal models, HTS data, efficacy data, and economics data.]
Figure 7. Data model options

Approach: One integrated data model housing many types of biomedical and/or clinical data
Description: Highly denormalized generic data storage layer
Pros: Highly generic model does not require ongoing changes; integrated clinical, genomic, research, and public reference data; easier to standardize across consortia, facilitating collaborations
Cons: Requires heavy up-front ETL/data pipeline management; queries get very complex, making it hard to extract scientific insights
Examples: The data model of the National Institutes of Health (NIH) Biomedical Translational Research Information System; the i2b2 platform

Approach: Integrated data model with snowflake sub-models
Description: Highly denormalized generic data storage layer augmented with normalized domain-specific models
Pros: Integrated clinical, genomic, and research data; less stress on infrastructure compared to the single data-model approach above; easier to query, with better-organized data domains
Cons: More models to manage; more difficult to change compared to the single data-model approach above
Example: Deloitte Health Insights and Informatics Platform

Approach: Delineation of clinical and “omic” data models
Description: One data model for clinical data, one for “omic” data
Pros: Two specialized models for clinical patient data versus “omic” data; a glove fit for certain scientific applications
Cons: Hard to link the two models for querying; not all data can be clubbed into two models
Example: Oracle EHA Translational Suite

Approach: One integrated data model with limited, select data focusing on specific disciplines
Description: One integrated but specialized data model (e.g., oncology)
Pros: Highly discipline-specialized model effectively addresses user needs; queries are intuitive and fast, supporting superb ad hoc data retrieval
Cons: Difficult to scale to additional domains; difficult to retrieve end-to-end insights for a petabyte-grade bio-data universe
Example: Moffitt Cancer Center Research Exchange Hub

Source: Deloitte Consulting LLP
The desire to obtain insights from data that have been acquired from different experimental approaches and settings can be fulfilled by determining the equivalence and relationships of the concepts involved. Controlled vocabularies provide that anchor of equivalence, and ontologies help establish relationships between these concepts. Ontologies, therefore, are critical as frameworks for data integration. Ontologies power the ability to derive new hypotheses from a limited set of preliminary observations. Academia and standards organizations, as well as industry, are joining forces to develop standards for vocabularies and ontologies that will unlock the potential of research datasets. Furthermore, the data available for analysis must be structured around these ontologies in order for the ontologies to deliver their full potential. Together, this will enable more powerful querying and interpretation of heterogeneous datasets, creating information assets that may be used and reused by the broader research community.
Data models — Tailored for biology: In addition to frameworks such as ontologies and controlled vocabularies, appropriate data models should be considered to house the data and enable efficient querying and retrieval of correct results. Commonly observed representations of biological data range from one data model to several data models linked to each other. Figure 7 outlines data model options, their pros and cons, and examples where these options were applied. Other, newer technologies, such as the W3C Web Ontology Language (OWL) and the Resource Description Framework (RDF), are used to link data on a semantic basis and lay the foundation for enhanced insight generation. Hybrid approaches overlay semantic methods on traditional data stores and allow for “integration” and “inferencing” with data stores in place. The appropriate solution depends on an in-depth assessment of the data stores and the needs of the research community. Biological data are extremely complex to model due to the networked relationships that the participating entities bear to each other; simple relational diagrams are therefore inadequate for this purpose. In addition, the rapidly changing data landscape in biology makes the case for a data model that is scalable, extensible, and flexible enough to accommodate new data sources.
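The semantic-linking idea behind RDF can be sketched without any triple-store software: facts become (subject, predicate, object) triples, and a tiny inference rule follows transitive links to answer questions no single fact states directly. The entities and the single rule below are invented for illustration and are far simpler than OWL reasoning.

```python
# Toy semantic layer in the spirit of RDF triples; the entities and the
# single transitive rule are invented for illustration.

triples = {
    ("geneX", "part_of", "pathwayY"),
    ("pathwayY", "part_of", "immune_system"),
    ("geneX", "expressed_in", "T_cell"),
}

def infer_part_of(entity):
    """All containers reachable from `entity` via transitive part_of edges."""
    found, frontier = set(), {entity}
    while frontier:
        step = {o for (s, p, o) in triples if p == "part_of" and s in frontier}
        frontier = step - found
        found |= step
    return found

containers = infer_part_of("geneX")  # -> {"pathwayY", "immune_system"}
```

The inferred fact that geneX belongs to the immune system was never stated; it emerges from linking two triples, which is the kind of “inferencing with data stores in place” the hybrid approaches enable.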
“If it takes 3 days to get an answer I’m not going to ask another question.”
Marc Parrish, Vice President, Barnes & Noble; speaking at the GigaOM conference, 2011
Data transformation — Preparation for analysis: In order to present data to analysis tools, the data must be prepared in a manner that allows for querying. Data curation, the first of these steps, is a semi-automated process that organizes the data into a standard format and facilitates the reliability of the data. Algorithm management, the second step, prepares the data for faster querying. Algorithm management, which is typically done using a standard set of business intelligence tools, must be handled in ways that address both the performance challenge and the domain requirement. Examples of such algorithms that are relevant to TRI solutions include natural language processing (NLP) engines that extract facts from publications and present them for querying, or those that prepare data for common statistical analysis.
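To show the shape of such a fact-extraction step, the sketch below runs the simplest possible pattern over publication text, capturing “gene verb target” statements into queryable triples. Production NLP engines are far more sophisticated; the pattern, gene symbols, and sentence here are invented for illustration.

```python
# Hedged sketch of a trivial "fact extraction" pass: capture
# "<Gene> inhibits/activates <Target>" statements as triples.
# The pattern and the example abstract are invented.
import re

PATTERN = re.compile(
    r"\b([A-Z][A-Za-z0-9]+)\s+(inhibits|activates)\s+([A-Z][A-Za-z0-9]+)"
)

def extract_facts(text):
    """Return (subject, relation, object) triples found in the text."""
    return [(m.group(1), m.group(2), m.group(3)) for m in PATTERN.finditer(text)]

abstract = "We show that TP53 activates CDKN1A, while MDM2 inhibits TP53."
facts = extract_facts(abstract)
# facts -> [("TP53", "activates", "CDKN1A"), ("MDM2", "inhibits", "TP53")]
```

Triples extracted this way can be loaded into the same stores as structured data, making the literature itself queryable alongside experimental results.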
Data presentation and exchange — Many users, many needs
The scientific community is diverse, and in this age of systems biology, scientists, informatics professionals, and physicians are all looking for information in the same data stack. They are seeking answers to different problems, or to different facets of the same problem, and need to approach the data in different ways. They ask different questions, and they ask the same questions differently.
Empowering such a diverse audience requires excellent search and query, data analysis, and visualization capabilities. These tools and aids need both to reduce the complexity of the analysis that needs to be done and to allow powerful insights to be generated from disparate datasets. As depicted in Figure 8, the main activities of the data presentation and exchange phase are providing for data analysis and the dissemination of results.
Figure 8. Data presentation and exchange components of the data management approach
Source: Deloitte Consulting LLP
[Figure 8 depicts the Data Presentation and Exchange phase. Data Search and Analysis spans gene expression tools, flow cytometry analysis tools, visualization tools, meta-analysis tools, other related analysis tools, and user workflows; integrating more open-source tools (e.g., Cytoscape for pathway visualization) and adopting a services-oriented architecture (SOA) that allows the incorporation of various tools into pipeline-based workflows can increase user adoption, give scientists a selection of tools that accelerates research activities, harness tools from other programs, and provide an easy, intuitive workflow. Data Exchange covers data standards and exchange protocols; defining data exchange protocols and a recommended solution design to push data out supports future collaborations and large-scale data dissemination. Information Presentation encompasses methodology management, algorithm management, data curation, data marts, materialized views, and bio-cubes; redesigning this layer to accommodate these capabilities can increase user adoption and improve scientific research capabilities.]
Information presentation — Specialized data marts for downstream processing: Biological data has several subdomains, each with its own characteristics. With the influx of data from several sources, it is necessary that complex biological data be reorganized by domain for further downstream processing. Examples of specialized data marts include gene expression marts or flow cytometry marts. The single largest benefit of this approach can be the enablement of a bottom-up technique for assessing and refining the content and principles in each domain. This can allow a segmented approach to handling the distinct characteristics of different domains and their individual implications for the data model. These sophisticated techniques for transforming and presenting the raw data should be tailored to a domain and should allow the rapid and relevant querying of information and generation of knowledge.
Data search and query — Sophisticated yet easy: The addition of a broad set of data within data repositories will attract a larger and more diverse set of users to search and query for information germane to their research. Two simple techniques that can enhance these capabilities are ad hoc and NLP-based querying. The former is a good alternative to the constrained, restrictive queries that applications typically offer. Users can construct complex queries between data sources and data types using the graphical user interface (GUI), supported by an effective back-end query engine. Such an approach increases user appeal, due to the ability to query in a more naturally scientific manner. Natural language-based query engines are sophisticated translators of users' queries, developed to retrieve the right information. Both the natural language algorithms and the back-end of ad hoc query functionality will require customization to match the needs of the biomedical domain, and should incorporate the wisdom of domain-specific ontologies and vocabularies to power them. In addition, increased performance capabilities require efficient indices. There are several open source indexing technologies available; one that has demonstrated itself is the Lucene technology. Incorporating such technologies can also remove or reduce the need to reproduce and store large datasets from other public sources (e.g., the National Center for Biotechnology Information (NCBI)), since referencing alone suffices.
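The indexing idea can be illustrated with a toy inverted index; Lucene implements a far more capable version of the same structure. The documents and query terms below are invented for the example.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

# Hypothetical abstracts keyed by accession id.
docs = {
    "PMID1": "TP53 mutation in solid tumor samples",
    "PMID2": "flow cytometry of CD4 cells",
    "PMID3": "TP53 pathway analysis in tumor biology",
}
index = build_index(docs)

# An ad hoc conjunctive query: documents mentioning both terms.
hits = index["tp53"] & index["tumor"]
print(sorted(hits))  # ['PMID1', 'PMID3']
```

Because the index stores only term-to-identifier mappings, public-source documents need not be copied locally; the identifiers serve as references back to the originals.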
Data analysis and mining — Powerful approaches for powerful insights: Larger numbers of parameters within a dataset and larger numbers of datasets — this is the complexity of the data landscape of the future. Two critical necessities to empower the analytical approach are the ability to reduce the dimensionality of datasets and to allow inferencing over heterogeneous datasets. Foundational to the analytical approaches outlined below are a rich metadata repository, a common standard vocabulary, and rich ontologies, as alluded to earlier.
• Clustering data by phenotypic attributes. Tools that support clustering and reducing the dimensionality of data, such as Gene Set Enrichment Analysis (GSEA),1 currently the most cited de facto standard for clustering and integrating data, are natural candidates for inclusion within a TRI solution. GSEA provides a means of integrating mechanistic data from "omics" expression sets with other attributes, such as phenotypic or structural data, providing the first step towards meta-analysis.
“The value of information about information can be greater than the value of the information itself. … I am willing to project an enormous new industry based on a service that helps navigate through massive amounts of data.”
Nicholas Negroponte, creator of the $100 laptop, speaking of analyzing data in Wired, June 1994
1 Gene Set Enrichment Analysis, The Broad Institute (http://www.broadinstitute.org/gsea/index.jsp)
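GSEA itself uses a weighted Kolmogorov-Smirnov-style statistic over ranked gene lists; a much simpler relative, the hypergeometric over-representation test, conveys the core idea of scoring a gene set against an experimental hit list. All numbers below are illustrative, not drawn from any real study.

```python
from math import comb

def overrep_pvalue(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N, K, n):
    N genes in the background, K of them in the pathway gene set,
    n experimental hits, k hits that fall inside the set."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / total

# Illustrative numbers: a 20,000-gene background, a 100-gene pathway,
# 500 differentially expressed genes, 12 of them in the pathway
# (about 2.5 would be expected by chance).
p = overrep_pvalue(N=20000, K=100, n=500, k=12)
print(f"enrichment p-value: {p:.2e}")
```

A small p-value flags the pathway as over-represented among the hits, the first step toward the kind of phenotype-aware integration GSEA performs.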
• Overlaying literature information. Research data published within the literature is a powerful aid to researchers. The development of natural language processing algorithms to mine text has substantially reduced the onerous task of reading volumes of literature, and instead offers the opportunity to serve up textual information in "digitized pieces," which can then serve as pieces of data for analysis alongside experimental data. Pathways and networks derived from these facts are commonplace in biology research2 and enable scientists to interpret experimental results in the context of these published facts. Simple yet sophisticated network approaches allow researchers to identify similar or contradictory results and correlate observations to seemingly unrelated phenomena. TRI solutions can benefit significantly from natural language processing (NLP) components.
• Tools for meta-analysis. Research will likely continue to depend on high-throughput experimentation and tools that interpret the results from several laboratories. Interpreting and analyzing datasets obtained under different experimental conditions from different labs requires sophisticated statistical techniques. Meta-analysis is the statistical analysis of heterogeneous datasets for the purpose of interpreting and analyzing the combined set of findings. Investing in these methodologies is crucial to the ability of TRI solutions and their users to handle and analyze datasets from a diverse set of laboratories and collaborators.
2 Ingenuity Pathway Analysis (www.ingenuity.com); Pathway Studio - Ariadne Genomics/Elsevier; MetaCore - GeneGo/Thomson Reuters
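As a sketch of the statistical machinery involved, a fixed-effect inverse-variance meta-analysis pools effect estimates from several labs into one estimate whose precision exceeds any single study's. The effect sizes and standard errors below are invented for the example.

```python
from math import sqrt

def fixed_effect_meta(effects, std_errors):
    """Inverse-variance weighted pooled effect and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = sqrt(1.0 / sum(weights))
    return pooled, pooled_se

# Hypothetical log fold-changes for one gene from three laboratories,
# measured under different experimental conditions.
effects = [0.80, 1.10, 0.95]
std_errors = [0.30, 0.25, 0.40]
pooled, pooled_se = fixed_effect_meta(effects, std_errors)
print(round(pooled, 3), round(pooled_se, 3))
```

Real heterogeneous datasets would usually call for a random-effects model that accounts for between-lab variation, but the weighting principle is the same.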
Figure 9. A network view from Cytoscape.3
Data visualization — Window to insight: Working with large digital datasets to identify actionable insights will require powerful analysis paired with powerful display. Sophisticated visualization techniques and algorithms, including automated algorithms, can enable people to visualize patterns in large amounts of data and help them unearth the most pertinent insights for a domain as complex as biology. A powerful open source tool that can quickly bring value to TRI solutions is Cytoscape (Figure 9) — an open source software platform for visualizing complex networks and integrating these with any type of data. Although developed as an open source tool, Cytoscape has several industry partners, such as gene array companies, sequencing companies, and standards consortia, which can contribute to its development. It is designed to integrate multiple ontologies and to filter and present datasets using these ontologies. Incorporating Cytoscape as the visualization interface of choice can allow users to view any given dataset in the context of pathways and scientific information, and explore correlations between datasets.
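One lightweight way to hand network data to Cytoscape is its simple interaction format (SIF), in which each line names a source node, an interaction type, and a target node. The interactions below are hypothetical and serve only to show the format.

```python
# Hypothetical protein-protein interactions to visualize in Cytoscape.
interactions = [
    ("TP53", "binds", "MDM2"),
    ("TP53", "activates", "CDKN1A"),
    ("MDM2", "inhibits", "TP53"),
]

def to_sif(edges):
    """Render edges as SIF lines: source<TAB>interaction<TAB>target."""
    return "\n".join(f"{src}\t{kind}\t{dst}" for src, kind, dst in edges)

sif_text = to_sif(interactions)
print(sif_text)
# A real workflow would write this to a file and open it in Cytoscape:
# with open("network.sif", "w") as f:
#     f.write(sif_text)
```

Writing results in a plain interchange format like this keeps the analysis pipeline decoupled from the visualization tool.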
3 www.cytoscape.org; Smoot et al., Bioinformatics 2011, 27:431.
Figure 10. Example tools for biological data analysis: open source and proprietary tools spanning the biology data analytics continuum (collate, analyze, infer; from terabytes through petabytes and exabytes to zettabytes), including Hadoop, Postgres, ETL, Netezza, Teradata, ECL, Greenplum, MATLAB, SPSS, RapidMiner, Ingenuity Pathway, Physiolab, Cell Publisher, COPASI, text mining algorithms, OWL, Elixir, RDF, Oracle, Ensembl, Bowtie, SAS, OptGene, Pathway Studio, OligoStar, Cytoscape, SciPy, TopHat, and R
Source: Deloitte Consulting LLP
Workflows — Employing a scientific workbench: The scientific process is iterative, from hypothesizing to experimentation and analysis. At each step, scientists typically use a variety of tools for querying, analysis, and visualization of data to determine that their results are consistent and their hypotheses are sound. TRI solutions can benefit from incorporating a multitude of tools to cater to the diverse needs of their audience. Incorporating open source tools should be considered as a cost-effective, yet reliable, option to help meet this objective. Figure 10 represents examples of open source and proprietary tools to support the biology data analytics continuum, which may be valuable for uncovering innovative hypotheses from several million bytes of data. Scientists also often have proprietary tools within their firewalls that they utilize. Tool and data integration plans, as well as the workbench discussed next, should consider the need for interplay within and across open source and proprietary tools.
To effectively expand the use of a TRI solution portal to both scientists and informatics professionals, the portal must provide both technical confirmation and ease of use. Most scientists may not be trained to distinguish between the strengths and limitations of each tool, particularly the more sophisticated statistical and bioinformatics tools and algorithms. Although incorporating several tools and datasets is necessary, it is important also to keep users engaged in using them. A solution lies in a workbench designed to guide users through the necessary operations and the choice of tools available for their need, and to allow them to operate seamlessly between the portal and their in-house proprietary tools. Intuitive workflows can be created by stringing together several tools, algorithms, and scripts, making the portal a valuable one-stop shop for analyzing the different datasets it touches. Figure 11 provides a conceptual diagram of such a workbench.
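The workflow-stringing idea can be sketched as a pipeline of interchangeable steps. Each step below is a stand-in for a real tool (the names and the toy operations are hypothetical), and a stored workflow is simply the ordered list of steps, reusable across datasets.

```python
def normalize(values):
    """Stand-in for a normalization tool: scale to the sample mean."""
    mean = sum(values) / len(values)
    return [v / mean for v in values]

def threshold(values, cutoff=1.0):
    """Stand-in for a filtering tool: keep values above a cutoff."""
    return [v for v in values if v > cutoff]

def run_pipeline(steps, data):
    """String tools together: feed each step's output into the next."""
    for step in steps:
        data = step(data)
    return data

# A saved, reusable workflow: normalize, then filter.
workflow = [normalize, threshold]
result = run_pipeline(workflow, [2.0, 4.0, 6.0, 8.0])
print(result)  # [1.2, 1.6]
```

Because every step shares the same call signature, a script, an open source tool wrapper, or a proprietary in-house tool can be slotted into the same chain without changing the runner.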
Figure 11. An intuitive workbench
The workbench supports the iterative scientific cycle of hypothesize and experiment: inquire, inspect, interrogate, validate, and refine hypothesis. A secure scientific workbench, built on an SOA architecture, offers open source tools to collate, analyze (analysis/statistical), and visualize data: bring experimental datasets together in secure environments; visualize data; use tools to analyze the experimental datasets; visualize analyses; analyze data against benchmarks; and leverage analytical tools for multidimensional, heterogeneous datasets. It draws on experimental data (genomics, proteomics, metabolomics, cell biology, biology, genetics), clinical data (pharmacology, ADME, toxicology, physiology), and publications data, organized through a pipeline workflow with ontologies and/or controlled vocabularies and metadata. A linked proprietary workspace lets scientists use proprietary tools to analyze and validate results from the scientific workbench.
Source: Deloitte Consulting LLP
Reusable workflows are essential to efficient and reproducible analysis of data. Figure 12 shows an actual implementation of a workbench that incorporated a multitude of proprietary and open source tools for gene expression analysis. Bioinformaticians and scientists were presented with different user interfaces. Some ready-to-use workflows were implemented, and provisions were built in for customizing workflows by stringing together tools, scripts, and algorithms.
Sample analysis workflow:
1. DATA — Use intuitive interfaces to gather data for analysis
2. ANALYZE — Use specific tools to analyze the data; e.g., pathway analysis of gene expression sets
3. COMPARE — Statistical analysis; meta-analysis with heterogeneous datasets
4. VISUALIZE — Use public and private tools to uncover interactions
5. HONE IN — Discover details from scientific literature
Supporting features: CREATE WORKFLOWS — string together methods and tools into a workflow and store them for repeated use; COMMAND LINE or VISUALS — choose depending on who you are, a bioinformatics professional or a non-bioinformatics user; REFERENCE DATASETS — access public and private datasets in a secure environment.
Figure 12. An intuitive workbench for gene expression analysis
Source: Prototype developed by Deloitte's technology partners
Figure 13. Data governance framework

Data governance and data standards — A must for interoperability
With different datasets come different data types, each with their own standards. For example, the CDISC standards apply to clinical datasets; the Genomic Standards Consortium standards to genomic data; and the Microarray Gene Expression Data standards to gene expression datasets. A successful collaboration requires effective and reliable data translation between research groups, which in turn depends on data interoperability and data standards. A structured data governance program is essential to help achieve data interoperability and facilitate collaboration activities. Data governance can establish standards and practices to enable the translation, quality, and reliability of the information used to support critical decisions. Data governance team members are engaged throughout the data management program. Figure 13 provides a data governance model example for supporting these efforts.
Three decision-making levels are needed to support this data governance model:
• Strategic — The leadership committee sets the data governance vision and is accountable for making certain a data governance program is established and supported. It provides guidance on funding and authorizes the delegation of decisions to the lower levels of the governance model, which are responsible for implementing the data governance program.
• Tactical — The tactical council sets the standards and requirements to facilitate data interoperability and collaboration. It will be composed of scientific, analytics, visualization, and standard ontology and vocabulary subject matter specialists (SMSs) to define the standards and requirements for data sharing and interoperability (e.g., ontologies, controlled vocabularies, and other data standards derived from existing standards, such as the Medical Dictionary for Regulatory Activities (MedDRA), CDISC, NCBI's Gene Expression Omnibus, etc.). The members at this level can be aligned to the metadata, change, outreach, and "omics" data management groups. The council should provide guidance and oversight for these groups to facilitate alignment to the data governance vision and objectives.
• Operational — This is the execution level of the governance model, composed of technology SMSs and data stewards aligned to subject areas of practice/their specialization. These individuals should ensure that the standards and policies defined and developed are implemented and adhered to by the user community. They raise issues to the tactical council for consideration and resolution.
A TRI approach should enforce necessary compliance with current data standards and be able to accommodate standards changes over time. Enforcement of standards and exchange protocols facilitates the reliable acquisition of information at the front end of the data management workflow and the effective publishing and dissemination of results at the end. The more the data management workflow allows for interoperability, the more it helps the organization meet the goal of adoption and true sharing of information and knowledge within the community.
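Enforcement at the front end of the workflow can be as simple as validating each incoming record against required fields and a controlled vocabulary before acceptance. The field names and vocabulary below are illustrative only, standing in for standards such as those the tactical council would define.

```python
# Illustrative controlled vocabulary and required fields for submissions.
CONTROLLED_TISSUES = {"liver", "kidney", "blood"}
REQUIRED_FIELDS = {"sample_id", "tissue", "assay"}

def validate_record(record):
    """Return a list of standards violations (empty means compliant)."""
    errors = [f"missing field: {f}"
              for f in sorted(REQUIRED_FIELDS - record.keys())]
    tissue = record.get("tissue")
    if tissue is not None and tissue not in CONTROLLED_TISSUES:
        errors.append(f"tissue '{tissue}' not in controlled vocabulary")
    return errors

good = {"sample_id": "S1", "tissue": "liver", "assay": "RNA-seq"}
bad = {"sample_id": "S2", "tissue": "lver"}
print(validate_record(good))  # []
print(validate_record(bad))
```

Keeping the vocabulary and required fields in data rather than code also lets the governance council update standards over time without changing the validator.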
Source: Deloitte Consulting LLP
The framework organizes data governance and ownership across the three levels: strategic decision making (DG leadership committee); tactical decision making (data governance council, supported by a metadata management group, a change management and compliance group, an outreach management group, and a cross-"omics" data lead steward group); and operational decision making (data stewardship support group, infrastructure group, BI delivery, data architecture, and tools acquisition and delivery, with example data stewards and data custodians for genomics, transcriptomics, proteomics, and metabolomics).
Technology — Let it support your data management needs
To help keep abreast of the changing landscape of data domains and data types, a TRI solution technology strategy should consider the following criteria:
• Be scalable to manage large data volumes with high performance
• Apply to multiple environments (development, pilot, quality confirmation, and production), each with big data challenges
• Handle semistructured and unstructured data
• Integrate silos of information from disparate datasets
• Work with a highly distributed computing and storage environment
• Allow sophisticated analysis and knowledge generation
• Cater to users with a wide range of skills
• Allow information security as needed
For the community at large, the big data challenge poses the follow-on challenges of data storage, access, analysis, and visualization. For TRI solutions, a technology base must provide a means of addressing these challenges now and in the future. We address three focus areas in technology that will provide big wins for reaching this goal.
“Everybody has to be able to participate in a future that they want to live for. That’s what technology can do.”
Dean Kamen, physicist, entrepreneur, and inventor, in an interview with the Chief Gartner Fellow, Daryl Plummer, 2003
(http://www.gartner.com/research/fellows/asset_55323_1176.jsp)
A service-oriented architecture (SOA) — A simple future-proof framework
A SOA is based on loosely coupled services with interfaces that are independent of the implementation. Services can be deployed and removed easily, and can also be easily integrated across dissimilar platforms. SOA standards can allow new applications to share a common model of development, maintenance, support, and staffing specialization. Services include applications, tools that access the data, and data management technologies designed to maneuver the data through the data management cycle. For TRI solutions, the advantage that SOA can offer is the ease of delivering, maintaining, and enhancing data analytics, visualization tools, and other software solutions as Web services, and hence catering to a broad customer base. Figure 14 presents an example of how SOA can be applied to support information access and collaboration.
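In miniature, the loose coupling SOA promises looks like services registered behind a uniform interface, so tools can be added or swapped without touching their callers. The service names and payloads below are invented for the sketch.

```python
# A tiny service registry: callers know only a service name and a
# uniform call signature, never the implementation behind it.
registry = {}

def service(name):
    """Decorator that registers a callable as a named service."""
    def register(fn):
        registry[name] = fn
        return fn
    return register

@service("gene-lookup")
def gene_lookup(payload):
    annotations = {"TP53": "tumor suppressor"}  # stand-in data source
    return {"gene": payload["gene"],
            "role": annotations.get(payload["gene"])}

@service("echo")
def echo(payload):
    return payload

def call(name, payload):
    """Dispatch through the registry; swapping a service is invisible here."""
    return registry[name](payload)

print(call("gene-lookup", {"gene": "TP53"}))
```

In a real SOA the registry would sit behind Web service endpoints, but the design property is the same: deploying, removing, or replacing a service changes nothing for its consumers.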
Figure 14. A service-oriented architecture
Source:DeloitteConsultingLLP
The figure layers a service-oriented approach from data sources up to end users. External data sources (clinical trial data, publications, IP patent sources, NIH data, and other external data sources) and example collaboration programs (solid tumor, meningococcal vaccine, Duchenne Muscular Dystrophy, immune-mediated inflammatory diseases, Parkinson and Alzheimer, HIV research, mosquito-borne diseases, Cystic Fibrosis research, Hepatitis C research, and other programs) feed data acquisition into data repositories covering experimental data (genomics, metabolomics, expression, genetics, proteomics, physiological models, biology, cell biology, drug), pharmacology and toxicology data (pharmacokinetics, pharmacodynamics, physiology), and clinical research, development, and post-market data (clinical trial data, patient data). A data management framework provides data governance, data security, data ownership, data standards, controlled ontologies, controlled vocabularies, metadata management, data workflow, data validation, Web services, extract, transform, and load (ETL), and other capabilities, within a big-bio-data framework for evolving data needs in the future. End users (commercial biopharma, US government agencies, nonprofits, academia, care providers, and independent researchers) reach the platform through websites/portals, social media, and an information exchange hub, supported by private/public workspaces, knowledge management and collaboration, high-performance computing, network and infrastructure, data mining and analysis, and visualization tools.
Storage and high-performance computing — How much is enough? And where?
Setting aside the elaborate discussion of whether or not parent datasets should be stored within the walls of a TRI solution, a more fundamental need is that of providing scalable, high-performance storage infrastructure to cater to the portal's high-performance computing demands. Beyond meeting the capacity and performance needs of the computational activities, storage solutions must be scalable with no downtime, suited to the volume and complexity of information encountered, easily managed, and future-proofed to be easily updated to changing scenarios. Network attached storage devices have undergone serious architectural transformations to accommodate these needs. Systematic and continuous assessment of the evolution of the needs and the character of the data itself will determine which storage devices to use. With the advent of cloud technologies, high-performance computing (HPC) clusters and storage in the cloud are now realities; using cloud resources could therefore provide both a scalable and cost-efficient option.
Technical support — Let it not be your Achilles heel
Reliable infrastructure for information technology (IT) operations to support the TRI solution is crucial to the solution's effectiveness and adoption. TRI solution strategies should consider effective practices and industry standards in hardware and software for networks, data storage, high-performance computing technologies, and service support mechanisms. Support should not be underestimated. The mandate of the scientific community is to do research, not to keep abreast of technologies and data platforms. Formal processes, such as the Information Technology Infrastructure Library (ITIL)/Information Technology Service Management (ITSM), enable the creation of these service models and facilitate their successful operation. Implementing scientific technical support and training teams to support the portal can reduce the burden on the TRI solution's scientific users. Figure 15 outlines the elements of a typical service model.
Figure 15. Application management service overview
Source: Deloitte Consulting LLP
A SOA approach for TRI solutions can fulfill several overarching purposes:
• Allow the creation of an information hub that provides users with access to publicly available datasets and allows data sharing and exchange in a flexible manner. Such an exchange hub provides the capabilities for simultaneous data acquisition in a standard, reliable, and traceable manner.
• Create the ability to add tools at will. This enables not only the incorporation of tools developed internally and from other open sources, but also provides users a choice of analytical methods.
• Develop an intuitive interface for users to work with their data and collaborate with others.
The SOA method thus helps build a service community, where members may collaborate and take advantage of the large base of scientific knowledge and programming services available in the community and constantly being generated. There is no doubt that SOA, which enables the collaboration of both scientists and informatics personnel, could prove eminently suitable for a TRI solution, providing a technology solution to make the most of the changing information landscape and cater to the needs of users.
The service model spans facility management, technology management, IT service management, and TRI application management, including:
• ITIL service strategy and/or process improvement
• Asset management and optimization
• Operational stabilization
• Application portfolio rationalization and/or consolidation
• Demand and portfolio management
• Training and live support
• Infrastructure consolidation and virtualization
• Disaster recovery capabilities
• Physical security
• Modular data storage capabilities
• Standardized and simplified technology stacks and tools
• Cloud and/or software as a service
• Networking and platform interoperability
Figure 16. The scientific community will likely influence TRI solution capabilities
Product/service users want to define choices in a manner that reflects their view of value, and they want to interact and transact in their preferred style.
Source: Deloitte Consulting LLP
Collaboration — Enable a technical necessity
While we have recommended a number of effective practices to support TRI solutions and advance tools and architecture to promote science, it is vitally important to keep the stakeholders and scientific community involved in key decisions related to technical and functional enhancements. The future of a sustained TRI solution is dependent on collaborative efforts both to influence the enhancement of capabilities and to promote data sharing among the scientific community. If collaborative efforts are done correctly, they could pave the way for innovation and expedite the realization of the proposed future state.
Co-creating TRI solution capabilities
It has been borne out by our client experience and by industry practice that value is co-created with customers if and when customers can personalize their experience using an organization's product-service proposition (Figure 16). We observe that product value is increasingly co-created by the capability provider and the customer. The evolution of TRI solution capabilities will require strong communication mechanisms and close coordination and prioritization of the scientific community's needs. Since a TRI solution encourages data sharing and collaboration, its adoption requires consideration of a paradigm shift for the research community.
To promote co-creation of TRI solutions, administrators should engage in outreach activities and give presentations at annual meetings to address issues about data standards, data submissions, analysis tools, and data sharing policies. Administrators should also interact with researchers to collect requirements and feedback on advancing the product. However, this traditional "active provider/passive consumer market construct" approach should be supplemented with leading practices that encourage active consumer interactions. The TRI solution funded program will need executive leadership that can leverage its relationship-building skills and sphere of influence to interact with and galvanize an expanded pool of research networks and institutions, collaboration partners, and other federally funded research agencies. The TRI solution funded program should interact with and seek input from each of these groups on adopting common industry standards, facilitating secure data exchange, and mitigating concerns and issues related to patient privacy rights. Doing so will not only improve platform capabilities, but can also foster research collaboration among diverse scientific communities.
Administrators should create both traditional and nontraditional outreach mechanisms with the scientific community to enhance TRI solution capabilities. Traditionally, outreach is a mechanism to provide messaging and training to the targeted community. Taking this a step further by leveraging this community to provide feedback and direction on future releases engages the community in building a highly valued and viable solution. An example would be to leverage the community to provide insight on developing solutions for adoption of industry standards (e.g., CDISC and HL7) during the data submission process. Another example is to leverage the proposed SMS panel of biology research areas, bioinformatics, and clinical research practitioners to help create more effective avenues to interact with the end-user community. These interactions could occur, for instance, at annual scientific and TRI meetings.
The figure depicts the TRI solution at the intersection of biology research, cheminformatics, bioinformatics, and health outcomes.
Advancing our scientific knowledge
A scientist's normal collaboration network entails consuming public document records, presenting topics and findings at scientific conferences and consortia, and collaborating with research colleagues. These networks are invaluable and essential, because they not only enhance TRI solution capabilities, but also advance our understanding of the science and the mechanisms required to support collaboration. A TRI solution strategy should facilitate collaborations with an operating model in place
Figure 17. Meeting TRI solution challenges
“Collaborations become necessary whenever researchers wish to take their research programs in new directions.”
F. Macrina, Dynamic Issues in Scientific Integrity: Collaborative Research. A report to the American Academy of Microbiology, 1995.
Source: Deloitte Consulting LLP
designed to address technology, people, and process challenges (Figure 17).
The recommended operating model, which is detailed in the next section, is designed to establish standards enabling interoperability and sharing of data across the scientific community using common vocabulary toolsets and formats. By harmonizing standards, different information systems and communities can "speak the same language" and work together technically to manage and use consistent and reliable scientific information.
The figure maps TRI challenges — usability, incentives, data exchange protocols, technology standards, data quality, data standardization, semantics, metadata, data owners, workflows, and data security — across programs and data sources (Duchenne Muscular Dystrophy, immune-mediated inflammatory diseases, mosquito-borne diseases, Parkinson and Alzheimer, meningococcal vaccine, Cystic Fibrosis, Hepatitis C, HIV, clinical trial data, publications) to technology, people, and process responses:
• Engage all stakeholders equally; identify mutual benefits, and communicate mutual interests
• Implement national data sharing and privacy standards
• Enact change management policies across the research network
• Create a national consortium to collect and share data, to develop standards, and to encourage partnerships
• Increase adoption by partnering earlier to identify new research opportunities
• Identify long-term, cross-functional sponsors to make strategic decisions, drive adoption, and communicate vision
• Develop data networks to integrate research and patient data, genotypes, and phenotypes
• Leverage alternate media networks
• Develop data standards and information sharing platforms to share and use valuable patient assets
• Develop data quality metrics
• Build and/or enhance information management structures based on semantic interoperability and metadata structures
• Design processes to consistently capture and utilize data
• Develop methods to fill information gaps
The TRI solution operating model will drive alignment of the data interoperability committees with lead scientific, informatics, and key SMSs from the panel to enhance existing standards, policies, and procedures. As the interoperability of data is enhanced, further adoption of new user communities can extend the reach beyond current implementations and capabilities, as well as the scope and usefulness of the data.
The outreach program can facilitate TRI awareness and adoption. As shown below in the commitment curve (Figure 18), moving stakeholders from a state of awareness to a state of ownership will require an effective communication strategy and plan.
The objective of communication is to proactively address potential risks, prepare people for change, and minimize disruptions to their business activities. Additionally, it provides an avenue to receive feedback on the information product delivered. Communication forums and formats such as newsletters, collaborative research blogs, and social media should be used to increase adoption among the scientific community. Clarifying the TRI solution benefits and capabilities through enhanced communication and training protocols, as well as branding, marketing, and consensus-building activities, can increase adoption. Leveraging change management techniques will support the communication strategy and outreach approach.
Tying it together — An intuitive collaborative portal
The strategy should position the TRI solution as a key entry point for users to access data sources and analytical methods, and as a community portal that provides a truly collaborative space where researchers learn about relevant topics and exchange research ideas with the community. Creating a social media interface within the TRI solution can offer the portal the chance to cater to its community in a more direct manner. Social media technologies are promoting these collaborations for corporations and individuals, fostering collaboration among the diverse community of researchers. Figure 19 shows how a portal implemented by Deloitte brought together search, analysis, and visualization tools across datasets belonging to pharmacists, payers, providers, and clinical trials.
The commitment curve moves from low engagement to high engagement:
• Awareness — individuals have heard about the program
• General understanding — individuals are aware of the benefits, basic scope, and concepts of the program
• Personal understanding — individuals understand how the program impacts them and their job
• Willing to accept — individuals understand and are willing to acquire the skills required to adopt the program
• Buy-in — this is the way work is done, the new status quo
• Ownership — resources make the program their own and create innovative ways to use and improve it
Figure 18. Driving to greater stakeholder engagement
As used in this document, "Deloitte" means Deloitte Consulting LLP, a subsidiary of Deloitte LLP. Please see www.deloitte.com/us/about for a detailed description of the legal structure of Deloitte LLP and its subsidiaries. Certain services may not be available to attest clients under the rules and regulations of public accounting.
Source: Deloitte Consulting LLP
Figure 19. Hi2 — A portal for exchange and analysis of life science and healthcare data
The portal screens share a common navigation bar (Dashboard, Analytics, Search, Custom studies, My workspace, Contract management) with navbar search on every page and navigation and login settings. Features include interchangeable, user-configurable Web part areas; heat map controls and a previewer, with selectable heat map cells (zoom, show details); filter, pivot, and data analysis user controls; export to graphics or PDF and analysis sharing; custom study status and purchase status indicators; Google-like search with advanced filtering and search result display with highlighted search terms; an analytics and report table of contents previewer; and contact details populated from the user profile, with user-supplied parameters auto-populated from analytics (no input allowed).
Today, many healthcare providers, life sciences organizations, and health plans are facing unprecedented technological and regulatory changes. A perfect storm of increased regulation, demand for lower cost, and expanding big-data challenges, along with the need for innovation, requires a different approach to improving analysis and gaining better insight from the available information to make effective and timely decisions. Giving decision-makers the ability to act on insights obtained through the analysis of structured and unstructured data is key to achieving significant improvements in care delivery at the patient and population levels.
Deloitte has made a significant investment in health reform and analytics. A portion of this investment is devoted to the development of a subscription-based “Insights as a Service” capability we call Deloitte Health Informatics and Insights (Hi2), as represented in Figure 19.
Hi2 was developed to work directly with large health systems to deliver commercial subscription-based solutions, enabling collaboration to effectively address market needs. The Hi2 approach to addressing market needs involves deploying local analytics solutions behind health system collaborators’ firewalls, protecting patient data, and enhancing the valuable assets health systems have worked so hard to create.
Source: Deloitte Consulting LLP
Culture change should accompany capability development

With the availability of integrated bioinformatics resources, the scientific community will need to move away from being insulated and begin traversing disciplinary boundaries to better understand, treat, and prevent human diseases. The bioinformatics resource hub owners would have to focus on changing the existing culture within the scientific community, which prides itself on secrecy and competition amongst scientific investigators. Convincing this wider scientific community to create and use a common interdisciplinary bioinformatics resource is a hurdle that must be overcome for such a program to be effective. Implementing a TRI solution to support translational research efforts across research fields and areas will require strategy, planning, technology enhancements, and cultural change management.
Leadership commitment — Managing cultural change

Communication and cultural change management methodologies are key to managing, implementing, and supporting technology programs designed to enable TRI solutions. In our experience, successful organizations understand that culture must be actively managed via a deliberate process, and they use methodologies to organize the cultural change process. We have found that, unlike typical change management, cultural change management is protracted and requires alignment with leadership to drive the desired culture and define the change strategy. As shown in Figure 20, cultural change can be achieved by tailoring the transition approach and moving different types of stakeholders through cultural and behavioral dimensions. Staying the course is critical to culture change.
Our research, derived from our client experience, indicates that inadequate leadership sponsorship is a major reason why large transformation projects fail. Effective transformations, therefore, have solid support from the executive leadership team. In each effective business case, the cultural shift was treated as an important organizational priority, with clear governance, policies, and people in place to facilitate the change. The change management driven by the TRI solution program operating model should be designed to use a set of people-oriented strategies, tools, and techniques that, when applied at the scientific group and organizational levels, help ensure that key scientific users are ready, able, and willing to accept and implement changes to how they store and analyze research data. It is anticipated that the scientific community will then perceive it as an ongoing requirement for achieving future success in their research.
Implementing the vision for Translational Research
“…major culture change does not happen easily or quickly … even with excellent leadership at the top, major change requires many initiatives from many people, and that simply requires time, often lots of it.”
John Kotter, Konosuke Matsushita Professor of Leadership, Emeritus, at the Harvard Business School
Leading Change, Harvard Business Review Press, 1996
Figure 20. The path to culture change
Source: Deloitte Consulting LLP
[Figure: a path from the current culture to the ideal research culture, plotted along a behavior dimension (from uncertainty to commitment) and a culture dimension. End users, managers, and the leadership team each move along the path toward the goal: use of the TRI solution as an interdisciplinary bioinformatics resource. The path includes aligning leaders’ behaviors with the desired culture and defining and communicating strategy.]
User adoption — A key to culture change

We envision three equally important avenues that the TRI solution program leadership should consider pursuing concurrently to improve adoption (Figure 21). These include improving the user’s experience while leveraging the TRI solution capabilities (“user experience”); establishing data standards to improve data interoperability and managing data effectively through storage, retrieval, and analysis (“data life cycle management”); and marketing the TRI solution as the authority in translational research and related data while providing incentives for usage (“user incentives”).
User experience: The TRI solution experience will ultimately decide whether users come back to the portal. To drive adoption, the TRI solution portal should be creative, yet scientific, with the right navigational components in place. It should be exciting and fun to use. The scientific workbench and portal proposed earlier allow an intuitive encounter with a variety of tools, promoting user experience and appeal. By thinking through innovative methods and conducting ongoing user simulations, administrators could improve the user experience and keep users “addicted” to the TRI solution portal. The other aspect of improving the user experience is proactively maintaining and enhancing user-aligned platform capabilities, such as increasing the number of toolsets and resident datasets and designing usable workflows. A framework, such as the one proposed for the TRI solution portal, could be used to drive enhancements based on user input and preferences.
User incentives: The TRI solution needs to offer users incentives to come back to the portal; to do this, it has to go beyond the more traditional research portals and become the go-to research platform for the scientific community. Some features that will make the TRI solution portal attractive to users are:
•	As an authoritative source for translational and related research data: By housing or providing access to an exhaustive set of research data that offers insights into research topics, by providing reference frameworks such as controlled vocabularies and ontologies, and by creating research-specific data management tools, the TRI solution portal can become the authoritative source for research data
•	As a catalyst for collaboration: The success of Cytoscape, an NIH initiative that attracts industry collaboration and investment, is well known. With careful implementation, a TRI solution portal has the chance to repeat this success
•	As a publishing platform: With its wealth of datasets and analytical tools and a diverse collaboration community, the TRI solution portal can position itself to support and publish large-scale scientific studies
Data interoperability: Defining and harmonizing standards throughout the data life cycle is essential for the operation of a scientific community. It can allow different information systems and communities to “speak the same language” and work together technically to manage and use consistent, correct, and useful scientific information.
Figure 21. Three-pronged approach to engage adoption of TRI solution by the scientific community
Source: Deloitte Consulting LLP
[Figure: three overlapping areas (user experience; user incentives; and data interoperability/data life cycle management) converging on TRI solution adoption.]
Program management — An ongoing process

Given that the TRI solution program would need to overcome numerous challenges to achieve future-state readiness, managing the steps for implementing such a platform requires a multidisciplinary program management structure that can provide the technical and scientific knowledge to enhance bioinformatics activities within the scientific community. As noted above, our experience and the program requirements suggest that administrators of the program should establish an operating model governance structure to move implementation through the maturity model framework. Overcoming challenges will require due diligence with current and potential partners and changing the existing culture within the scientific community, which prides itself on secrecy and competition amongst scientific investigators, to one that is more open and enables sharing and collaboration. A well-defined program structure, as shown in Figure 22, could serve as the transformational operating model for implementing such a program and improving capabilities to integrate data. It also could improve interoperability among disparate systems used by the scientific community to generate further innovations.
The organization model is illustrative of how effective organizations have typically structured themselves to bring together the necessary governance and management to facilitate planning, coordination, and implementation of transition-related activities. As noted earlier and throughout this paper, it is evident that efficient therapies would be based on combinatorial analysis of often-disparate data reflecting stage-of-disease progression, as well as a patient’s medical and individual characteristics. Therefore, it is crucial that the operating model is represented by skilled practitioners who can influence decision making and have the leadership capabilities to drive change at the necessary levels of the research community. Thus, at the top of this operating model is the Consortium Leadership Board. It would be made up of principal investigators, research area leads, TRI leadership, and an SMS panel.
The Consortium Leadership Board would help to address IP rights and data privacy and security issues, and it would work with TRI leadership and the TRI solution technical team to understand program progress and to drive visibility into significant program decisions, risks, and opportunities, effectively bringing the larger power and collective experience of consortium representatives to bear. Data management and enhancement activities would be managed by project-based work streams. The SMS panel would be represented by industry specialists from different disciplinary fields. Such a panel (Figure 23) is needed to bring forth fresh perspectives from their respective domains of specialization to resolve issues that may impede data interoperability and collaboration.

Figure 22. A consortium-based TRI solution operating model
[Figure: at the top, consortium leadership made up of principal investigators, research area leads, TRI leadership, cross-scientific SMSs, and research analytics SMSs. Their scope of responsibilities covers IP rights, incentives, semantics, data protocols, security, and alignment on standards, capabilities, and technologies. Dynamics of contribution center on repository data management through project-based workgroups with clear roles and responsibilities, supported by platforms and/or technologies for collaboration and outreach, quality control and documentation, analytics and visualization tools, system and infrastructure, project management, and training and assistance.]
Source: Deloitte Consulting LLP
The project management leadership team would maintain direct communication with its workstream team members, discussing day-to-day operational matters and providing feedback without having to go through an extra communication layer. It would bring the necessary governance and management framework to ensure that planning, coordination, and implementation of activities are carried out in a coordinated manner.
Figure 23. The SMS panel provides fresh perspectives
Source: Deloitte Consulting LLP
[Figure: the SMSs view the program through five lenses: ontologies, health outcomes, bioinformatics, cheminformatics, and translational research.]
Conclusion

It is not easy to prepare for what the future holds. No one has a crystal ball, and no strategy is perfect. The journey ahead will demand smart investments, flexibility, agility, and change management.

Across the collaboration, data management, and technology focus areas, this whitepaper highlights leading practices to facilitate implementation of a TRI solution. In the relevant sections of this whitepaper, the following has been outlined for consideration: (i) developing an operating model to support and maintain a TRI solution; (ii) the importance of integrated analysis tools and algorithms; (iii) providing a system architecture strategy; (iv) integrating various content data sources for end-to-end analytics; (v) enriching the usefulness of content through ontologies and standard vocabularies; and (vi) improving data interoperability. As outlined in Figure 24, changes to these focus areas can be made in a phased manner designed to facilitate implementation of a TRI solution as an accelerator of collaboration in the big-data environment. From a technical perspective, this provides a structured path to implementing such a platform.

The strategy can be implemented via a systematic three-step approach, described in Figure 24. In the first step, the team should consider working collaboratively with members of the scientific and biomedical research communities to fully understand the current state of TRI and gain alignment within the consortium leadership board, the SMS panel, and select user community members on its restructuring needs. The second step involves creating a clear implementation roadmap outlining the activities required to realize the identified opportunities. The last step is execution of the roadmap, including the communication and cultural change management strategies. A detailed assessment of needs and a targeted planning approach will support implementation of such a large and transformational solution.

The data and technology landscape is very dynamic, and so are the forces that shape the biology world. The planning exercise should be conducted by a visionary team that has the scientific, technical, and operational knowledge in the areas of scientific research, bioinformatics, data management, and IT infrastructure. The skills and experience of this diverse team will enable the effective implementation of a TRI solution that realizes the long-term strategic vision and offers an innovation engine for scientific research. No plan can be set in stone; the TRI solution implementation program will need ongoing evaluation and course correction to align its plans with trends in the healthcare ecosystem and help achieve the future-state vision. Nevertheless, the recommended investments in implementing such a platform will speak for themselves: the return on investment could be self-evident for the stakeholders involved, attracting investment from external collaborators and powering the TRI solution to be a truly collaborative solution for the “big-bio-data” landscape of the future. Is this utopia? Yes. But it can be within reach.

Figure 24. A work plan for implementing a TRI solution
[Figure: a three-step work plan. Step 1, assess the current state: identify restructuring needs and improvement opportunities around three pillars (bio-data management, technology, and collaboration). Step 2, define a roadmap for implementing improvement opportunities: create a clear roadmap for the identified opportunities, with tactics, programs, and projects outlined. Step 3, implement the improvement opportunities: execute the roadmap, including developing the collaboration operating model; leveraging scientific leading practices and redesigning the architecture, data governance, etc.; and redesigning technology choices and direction.]
Source: Deloitte Consulting LLP
Authors

Tunc Toker, Senior, [email protected]
Luke Dunlap, Senior, [email protected]
Asif Dhar, MD, [email protected]
Sanjay Srivastava, PhD, [email protected]
Parag Aggarwal, PhD, Specialist, [email protected]
Lee Ann Bailey, PhD, Specialist, [email protected]
Santha Ramakrishnan, PhD, Senior Consultant, Deloitte Consulting LLP, [email protected]
Suren Dheenadayalan, Manager, Deloitte Consulting LLP, [email protected]
Yvette Palmer, Specialist, [email protected]

Acknowledgments

The authors wish to thank the following collaborators for their contributions to this paper: Robert Decker, Thomas Breuer, Rich Cohen, Nitin Mittal, Kumar Nagarajan, David Croft, Nina Tatyanina, and Christopher Comrack.
About Deloitte
Deloitte refers to one or more of Deloitte Touche Tohmatsu Limited, a UK private company limited by guarantee, and its network of member firms, each of which is a legally separate and independent entity. Please see www.deloitte.com/about for a detailed description of the legal structure of Deloitte Touche Tohmatsu Limited and its member firms. Please see www.deloitte.com/us/about for a detailed description of the legal structure of Deloitte LLP and its subsidiaries. Certain services may not be available to attest clients under the rules and regulations of public accounting.
Copyright © 2012 Deloitte Development LLC. All rights reserved.
Member of Deloitte Touche Tohmatsu Limited