SCIENCE-DRIVEN INFORMATICS FOR PCORI PPRN
Kristen Anton, UNC Chapel Hill / White River Computing
Dan Crichton, White River Computing
February 3, 2014
White River Computing / UNC Chapel Hill
(Figure: from Information Design, Nathan Shedroff)
Architecture – what is it?
• Architecture: the fundamental organization of a system embodied in its components, their relationships to each other and to the environment, and the principles guiding its design and evolution (ANSI/IEEE Std. 1471-2000)
• Architecture is decomposed into four core pieces:
  • Process Architecture – describes the core processes for the system
  • Data Architecture – describes the information models and data standards for the system
  • Application Architecture – portals, tools, etc.
  • Technology Architecture – infrastructure elements
• Identify the drivers and requirements
• Create an architectural description of the system – identified stakeholders, concerns and associated models
• Identify core architectural principles
• Separate the architecture into key viewpoints
• Create a decomposition of the system identifying the elements and mapping them to the requirements
• Identify the high-level flows and analyze them from the process, information and application/technology perspectives
• Generate the architectural models
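The decomposition step, mapping each system element to the requirements it satisfies, lends itself to a simple mechanical check. A minimal sketch, assuming requirements are identified by hypothetical IDs (R1, R2, R3) and each element lists the IDs it claims to satisfy; the `Element` class and helper names are illustrative and not part of the presentation.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """A system element from the decomposition (hypothetical structure)."""
    name: str
    satisfies: set[str] = field(default_factory=set)  # requirement IDs this element addresses


def uncovered_requirements(requirements: set[str], elements: list[Element]) -> set[str]:
    """Requirements that no element claims to satisfy."""
    covered: set[str] = set()
    for element in elements:
        covered |= element.satisfies
    return requirements - covered


def unknown_references(requirements: set[str], elements: list[Element]) -> dict[str, set[str]]:
    """Per element, any requirement IDs it cites that are not in the requirement set."""
    result: dict[str, set[str]] = {}
    for element in elements:
        unknown = element.satisfies - requirements
        if unknown:
            result[element.name] = unknown
    return result


if __name__ == "__main__":
    # Hypothetical requirements: R1 single-point access, R2 standard metadata, R3 secure transfer
    requirements = {"R1", "R2", "R3"}
    elements = [
        Element("portal", {"R1"}),
        Element("metadata catalog", {"R2", "R4"}),
    ]
    print(uncovered_requirements(requirements, elements))  # {'R3'}
    print(unknown_references(requirements, elements))      # {'metadata catalog': {'R4'}}
```

Checking in both directions catches requirements nobody implements and elements that point at requirements that were never agreed.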
Architecture Development Approach
• One of the major challenges is communicating an architecture
• Who are the PCORI stakeholders that care about the architecture?
• How do we communicate their care-abouts?
Communicating an Architecture
• Determine a useful view of the system for the stakeholder
• Projects have suffered because a useful view wasn’t provided
(Diagram, stakeholders: the viewpoint is where you look from; the view is what you see)
• The organization, implementation and deployment of the software should follow from an architecture that aligns with the principles and needs of the stakeholders
• Separating the architecture into concerns will let us determine which capabilities already exist and which need to be developed
• Ultimately, this will help ensure that the deployed system will integrate
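In ANSI/IEEE Std 1471 terms, a viewpoint addresses a set of stakeholder concerns and a view is what you get by applying it to the system. A minimal sketch of choosing which views to present to a given stakeholder, assuming (as an illustration only) that the four architecture pieces serve as viewpoints and using hypothetical concern labels; a concern that no viewpoint addresses signals a missing view.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Viewpoint:
    """A viewpoint and the stakeholder concerns it addresses (labels are hypothetical)."""
    name: str
    concerns: frozenset[str]


def plan_views(stakeholder_concerns: set[str], viewpoints: list[Viewpoint]) -> tuple[list[str], set[str]]:
    """Return the viewpoints relevant to a stakeholder and any concerns left unaddressed."""
    relevant = [vp.name for vp in viewpoints if vp.concerns & stakeholder_concerns]
    addressed: set[str] = set()
    for vp in viewpoints:
        addressed |= vp.concerns & stakeholder_concerns
    return relevant, stakeholder_concerns - addressed


# Hypothetical mapping of the four architecture pieces to concerns
VIEWPOINTS = [
    Viewpoint("process", frozenset({"workflow", "data capture"})),
    Viewpoint("data", frozenset({"data standards", "data model"})),
    Viewpoint("application", frozenset({"portal access", "analysis tools"})),
    Viewpoint("technology", frozenset({"hosting", "secure transfer"})),
]

if __name__ == "__main__":
    # A hypothetical patient-researcher stakeholder
    views, gaps = plan_views({"portal access", "data capture", "privacy"}, VIEWPOINTS)
    print(views)  # ['process', 'application']
    print(gaps)   # {'privacy'}
```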
Software Development
Recommended Software Development Approach
• Project Formulation (Jan 2014 – Mar 2014): project organization, objectives, high-level schedule and project plan
• System Formulation/Architecture (Feb 2014 – June 2014): high-level architecture for the system and data, data flows, initial data structure, etc.
• Site Development (June 2014 – June 2015): development and deployment of the infrastructure and architecture; development of the core data model (consistent with the PCORnet “universal” data model?)
Supporting science-driven research needs: Case Study – Early Detection Research Network (EDRN)
• Research network of collaborating scientists from more than 40 institutions – an international network of networks
• Focus on identifying and validating biomarkers of cancer at an early (preclinical) stage
Bioinformatics challenges in EDRN: developing computing infrastructure that is “biomarker-centric”
Improve research capability by enabling real-time access to a variety of information that crosses institutional boundaries.
• Coordinated discovery and validation of biomarkers across cancer research centers to increase the accuracy of study results
• Accommodating various data types
• Facilitating analytics through data integration and single-point access
• Supporting workflows associated with various types of information
• Encouraging and supporting collaboration
Bioinformatics – Goals
Supporting science-driven research needs
• Linking highly diverse systems together to integrate and present data for analytics
• Defining a comprehensive information model (ontology) for describing the problem space
• Providing software interfaces for capture, discovery, and access of data resources
• Providing a secure transfer and distribution infrastructure
• Enabling all data sources to be heterogeneous and distributed
• Providing an integrated portal for access to distributed data
• Providing bioinformatics tools/pipelines for uniform data processing
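Letting sources stay heterogeneous and distributed while still presenting integrated data usually means mapping each source's native records onto the common information model before they reach the portal. A minimal sketch of that adapter pattern, assuming two hypothetical sites with different record layouts; the field names and adapter functions are illustrative, not EDRN's actual schemas.

```python
from typing import Callable, Iterable

# Common-model record: every source is mapped onto these field names
CommonRecord = dict[str, str]
Adapter = Callable[[dict], CommonRecord]


def from_site_a(row: dict) -> CommonRecord:
    """Hypothetical site A stores separate 'pid' and 'study' columns."""
    return {"protocol_id": row["study"], "participant_id": row["pid"]}


def from_site_b(row: dict) -> CommonRecord:
    """Hypothetical site B stores a single 'protocol:participant' key."""
    protocol_id, participant_id = row["key"].split(":", 1)
    return {"protocol_id": protocol_id, "participant_id": participant_id}


def integrate(sources: Iterable[tuple[Iterable[dict], Adapter]]) -> list[CommonRecord]:
    """Translate every source's rows into the common model and concatenate them."""
    records: list[CommonRecord] = []
    for rows, adapter in sources:
        records.extend(adapter(row) for row in rows)
    return records


def query(records: list[CommonRecord], **criteria: str) -> list[CommonRecord]:
    """Single-point query over the integrated records by exact field match."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]


if __name__ == "__main__":
    site_a_rows = [{"pid": "P1", "study": "101"}, {"pid": "P2", "study": "102"}]
    site_b_rows = [{"key": "101:P7"}]
    records = integrate([(site_a_rows, from_site_a), (site_b_rows, from_site_b)])
    print(query(records, protocol_id="101"))
```

Adding a new source then means writing one adapter, without touching the portal or the query code.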
Bioinformatics – EDRN Knowledge Environment
Functional architecture: Services
• Data capture
• Data discovery
• Data access
• Data retrieval
• Data processing
• Data distribution
Biomarkers, studies, participants, organs, and data generated from instruments (e.g., mass spec, arrays)
Bioinformatics – EDRN Knowledge Environment
Information architecture: Data Model across EDRN projects (“universal” data model)
• Representation of information associated with data objects managed within the knowledge system
• Models for:
  • Relationships between and among objects
  • A standard set of metadata elements that can be used for annotating objects
  • Multiple metadata schemata for machine-usable explanations of the metadata descriptions
• Metadata descriptions describe the inception and composition of data
• Common language for describing data and associated attributes: Common Data Elements (CDEs)
• Each CDE has a Uniform Resource Identifier (URI); its URL form points to the CDE definition page and is used in XML standards
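Because each CDE is identified by a URI, a metadata record can use those URIs as its keys, and an XML serialization can carry the URI next to each value so any consumer can look up the definition. A minimal sketch using Python's standard `xml.etree.ElementTree`, assuming a small local registry with hypothetical `example.org` URIs (the NCI repository uses its own identifiers) and a hypothetical `<dataset>`/`<element>` layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical CDE registry: URI -> human-readable definition
CDE_REGISTRY = {
    "https://cde.example.org/cde/organ-site": "Anatomic site of the organ studied",
    "https://cde.example.org/cde/specimen-type": "Type of specimen collected",
}


def unknown_elements(metadata: dict[str, str]) -> list[str]:
    """Metadata keys that are not registered CDE URIs."""
    return sorted(key for key in metadata if key not in CDE_REGISTRY)


def to_xml(dataset_id: str, metadata: dict[str, str]) -> str:
    """Serialize a record, tagging each value with the URI of the CDE that defines it."""
    bad = unknown_elements(metadata)
    if bad:
        raise ValueError(f"unregistered data elements: {bad}")
    root = ET.Element("dataset", {"id": dataset_id})
    for uri, value in metadata.items():
        element = ET.SubElement(root, "element", {"cde": uri})
        element.text = value
    return ET.tostring(root, encoding="unicode")


def definitions(xml_text: str) -> dict[str, str]:
    """Resolve each value in a serialized record back to its CDE definition."""
    root = ET.fromstring(xml_text)
    return {CDE_REGISTRY[e.get("cde")]: e.text for e in root.iter("element")}


if __name__ == "__main__":
    record = {
        "https://cde.example.org/cde/organ-site": "pancreas",
        "https://cde.example.org/cde/specimen-type": "serum",
    }
    xml_text = to_xml("DS-001", record)
    print(xml_text)
    print(definitions(xml_text))
```

Rejecting unregistered keys at serialization time keeps every annotated object tied to a shared definition.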
Bioinformatics – EDRN Knowledge Environment
Information architecture: Data Model
(Diagram: EDRN data model linking the eCAS Science Warehouse (EDRN science data results: local, distributed and of varying degrees of validation), CDE Repository (data elements and their descriptions), ERNE, VSIMS, Participant DB (participants and their characteristics), Protocol DB (protocols and their descriptions), Biomarker_DB (descriptions of biomarkers and their use), Distributed Specimen Databases, bioinformatics tools and the Public Portal; descriptions of EDRN studies, participants and specimen tracking are also shown; the stores are linked by protocol_id and participant_id)
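In the data model diagram, protocol_id and participant_id are the keys that connect the participant, protocol and science-data stores. A minimal sketch of joining two such stores on that composite key with an in-memory hash join; the record contents and every field other than the two keys are hypothetical, and a real deployment would query each database rather than hold lists in memory.

```python
from collections import defaultdict

KEYS = ("protocol_id", "participant_id")


def join_on_keys(left: list[dict], right: list[dict], keys: tuple[str, ...] = KEYS) -> list[dict]:
    """Inner join: pair every right row with each left row sharing the same key values."""
    index: dict[tuple, list[dict]] = defaultdict(list)
    for row in left:
        index[tuple(row[k] for k in keys)].append(row)
    joined = []
    for row in right:
        for match in index.get(tuple(row[k] for k in keys), []):
            joined.append({**match, **row})  # right-hand values win on name clashes
    return joined


if __name__ == "__main__":
    participants = [  # hypothetical Participant DB rows
        {"protocol_id": "101", "participant_id": "P1", "age_group": "60-69"},
        {"protocol_id": "101", "participant_id": "P2", "age_group": "50-59"},
    ]
    results = [  # hypothetical science-data rows
        {"protocol_id": "101", "participant_id": "P1", "data_file": "p1_run1.txt"},
        {"protocol_id": "102", "participant_id": "P1", "data_file": "p1_other.txt"},
    ]
    for row in join_on_keys(participants, results):
        print(row)
```

The second result row does not match even though participant P1 exists, which is why the composite key (protocol plus participant) matters when the same identifier can recur across protocols.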
EDRN Knowledge Environment
• The Biomarker Database holds 850 curated biomarkers, including panels/signatures of biomarkers
• The Biomarker Database is modeled to reflect the data model (activity in multiple organs, protocols, data files) to facilitate single-point data access
• eSIS contains 165 protocols
• eCAS holds 56 data sets, with many files in each set and more added daily; standard metadata describe each set and each product
• Two bioinformatics tools implemented: a proteomics “pipeline” (generating standardized biomarker identification files) and REDCap (standardized data definition and capture at the project level); additional tools are in progress
• Common Data Elements (CDEs) contributed to the NCI repository
• Each CDE has a Uniform Resource Identifier (URI); its URL form points to the CDE definition page and is used in XML standards
• The portal facilitates authorized access to almost 200,000 specimens
• Publications and Resources
EDRN Knowledge Environment – Success?
EDRN Knowledge Environment – Technology
• Iterative development
• Open-source philosophy and tools
• Apache OODT (Object Oriented Data Technology)
Software components are developed independently of any data model, so EDRN’s computing infrastructure can be replicated
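The claim that components independent of any data model make the infrastructure replicable can be illustrated with a catalog whose required metadata fields are supplied as configuration rather than hard-coded. A minimal sketch under that assumption; `Catalog` and the field lists are hypothetical and are not Apache OODT's API, whose components and configuration are considerably richer.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """A data-model-agnostic product catalog (hypothetical): the schema is configuration, not code."""
    required_fields: tuple[str, ...]
    _products: list[dict] = field(default_factory=list, init=False, repr=False)

    def ingest(self, product: dict) -> None:
        """Accept a product only if it carries every metadata field the model requires."""
        missing = [f for f in self.required_fields if f not in product]
        if missing:
            raise ValueError(f"missing metadata fields: {missing}")
        self._products.append(product)

    def query(self, **criteria) -> list[dict]:
        """Return products whose metadata match all criteria exactly."""
        return [p for p in self._products if all(p.get(k) == v for k, v in criteria.items())]


if __name__ == "__main__":
    # The same code configured for two different (hypothetical) data models
    edrn = Catalog(required_fields=("protocol_id", "organ"))
    pprn = Catalog(required_fields=("site_id", "condition"))
    edrn.ingest({"protocol_id": "101", "organ": "pancreas", "file": "a.txt"})
    pprn.ingest({"site_id": "S1", "condition": "crohns", "file": "b.csv"})
    print(len(edrn.query(organ="pancreas")), len(pprn.query(condition="crohns")))  # 1 1
```

Only the configuration changes between the two deployments; the ingest and query code is shared.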
Bioinformatics – Goals
Supporting science-driven research needs: SHARE
Geisel School of Medicine at Dartmouth / UNC Chapel Hill
Supporting science-driven research needs: PCORI PPRN
Opportunity to offer our architecture to PCORnet?
• Synergy in data model
• Query across CCFA PPRN network … network of networks?
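A query across a “network of networks” can be sketched as a recursive aggregation: each site evaluates the query locally and returns only a count, and each network sums the counts of its members, which may themselves be networks. A minimal sketch, assuming count queries and hypothetical site names and record fields; a real PCORnet or CCFA PPRN query would also need authorization, a shared data model and privacy safeguards.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Union

Predicate = Callable[[dict], bool]


@dataclass
class Site:
    """A hypothetical data partner that answers queries locally; raw records never leave it."""
    name: str
    records: list[dict]

    def count(self, predicate: Predicate) -> int:
        return sum(1 for record in self.records if predicate(record))


@dataclass
class Network:
    """A network whose members are sites or other networks."""
    name: str
    members: list[Union[Site, Network]] = field(default_factory=list)

    def count(self, predicate: Predicate) -> int:
        # Recurse into member networks; only aggregate counts travel upward
        return sum(member.count(predicate) for member in self.members)


if __name__ == "__main__":
    site_a = Site("site-a", [{"diagnosis": "crohns"}, {"diagnosis": "ulcerative_colitis"}])
    site_b = Site("site-b", [{"diagnosis": "crohns"}])
    site_c = Site("site-c", [{"diagnosis": "crohns"}, {"diagnosis": "crohns"}])
    pprn = Network("hypothetical-pprn", [site_a, site_b])
    network_of_networks = Network("hypothetical-network-of-networks", [pprn, site_c])
    print(network_of_networks.count(lambda r: r["diagnosis"] == "crohns"))  # 4
```

Because sites and networks expose the same `count` method, a network can join a larger network without any change to its own members.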