Genedata Profiler & iRODS: An Open & Collaborative Enterprise Software Platform for Patient and Compound Profiling. Marc Flesch, Tamas Rujan

Post on 19-Apr-2022

TRANSCRIPT

Page 1: Genedata Profiler & iRODS An Open & Collaborative

Genedata Profiler & iRODS: An Open & Collaborative Enterprise Software Platform for Patient and Compound Profiling

Marc Flesch, Tamas Rujan

Page 2: Genedata Profiler & iRODS An Open & Collaborative

© 2015 Genedata | Confidential and Proprietary

Genedata – Corporate Snapshot

Roots: Established in 1997 | Privately owned | Headquartered in Switzerland

Global Reach: ~200 employees | Offices in Europe (Basel, Munich), North America (Boston, San Francisco) & Asia (Tokyo)

Dedicated to Drug Discovery & Biotechnology: Innovative portfolio of enterprise systems increasing the productivity of data-rich & complex research processes

Domain Expertise: Experienced Ph.D.-level experts coupled with efficient software engineering processes

Marquee Customer Base: Leading pharmaceutical, biotechnology, and other life science organizations

Page 3: Genedata Profiler & iRODS An Open & Collaborative

Customer Base – Pharma

[Map labels: San Francisco, Boston, Munich, Basel, Tokyo]

25 of the top 25 pharma companies, and more …

Page 4: Genedata Profiler & iRODS An Open & Collaborative

Supporting the Patient Profiling Process

[Figure labels: Patient cohorts; NGS; Responder; Non-responder; Patient stratification; Drug response prediction]

Page 5: Genedata Profiler & iRODS An Open & Collaborative

Major Challenges of the Patient Profiling Process

• Efficiently managing, processing, and analyzing data
  – Huge & complex datasets containing patient-related omics data
  – Integrating disease & genomic information from different studies

• Facilitating collaboration within interdisciplinary teams
  – Enabling easy data, method & result sharing
  – Global distribution of data generators & data consumers

• Working with data from human samples in research environments
  – Ensuring privacy of patient information
  – Maintaining chain of custody

Page 6: Genedata Profiler & iRODS An Open & Collaborative

“Using data from clinical samples is challenging, because we need to take patient privacy very seriously.” (Henrik Seidel, Bayer)

Problem Statement

Page 7: Genedata Profiler & iRODS An Open & Collaborative

Data Privacy within a Global Organization

… how to work efficiently with distributed data?

[Figure labels: Illumina Sequencer (×2); HPC Cluster (×2); User Group (×3)]

Page 8: Genedata Profiler & iRODS An Open & Collaborative

At Present…

Common technologies applied include

• UNIX file permissions
• POSIX Access Control Lists (ACLs)
• CIFS shares (Samba)

With the following shortcomings

• UNIX permissions are too simple to model project-centric access patterns
• Paths on UNIX file systems can’t replace data management systems
• Permissions have to be maintained manually, which is extremely cumbersome
• ACLs are hard to manage
• The distributed storage problem stays unresolved
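
The first shortcoming can be made concrete with a small Python sketch. It is not taken from the slides: the names `UnixFile`, `ProjectData`, `unix_can_read`, and `project_can_read`, as well as the users and studies, are hypothetical and only illustrate why a single owner/group pair struggles with data that belongs to several projects.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class UnixFile:
    """A file with classic UNIX permissions: one owner, one group, 'other'."""
    owner: str
    group: str
    mode: int  # permission bits, e.g. 0o640


def unix_can_read(f: UnixFile, user: str, user_groups: set[str]) -> bool:
    """Check the read bit of the first matching class: owner, then group, then other."""
    if user == f.owner:
        return bool(f.mode & 0o400)
    if f.group in user_groups:
        return bool(f.mode & 0o040)
    return bool(f.mode & 0o004)


@dataclass
class ProjectData:
    """Hypothetical project-centric record: a data set may belong to several projects."""
    projects: set[str] = field(default_factory=set)


def project_can_read(d: ProjectData, user_projects: set[str]) -> bool:
    """Grant read access if the user is a member of any project that owns the data."""
    return bool(d.projects & user_projects)


# One raw data file used by two studies (all names are made up).
# UNIX allows only one group per file, so members of the second study
# are locked out unless the permissions are widened for everyone.
f = UnixFile(owner="tina", group="study_a", mode=0o640)
d = ProjectData(projects={"study_a", "study_b"})

bob_unix_groups = {"study_b"}
bob_projects = {"study_b"}

print(unix_can_read(f, "bob", bob_unix_groups))  # UNIX model
print(project_can_read(d, bob_projects))         # project-centric model
```

POSIX ACLs can express the second case, but every additional study then needs its own entry on every file, which is the manual maintenance burden the slide refers to.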

Page 9: Genedata Profiler & iRODS An Open & Collaborative

Our Solution

Page 10: Genedata Profiler & iRODS An Open & Collaborative

Marrying Security with Performance

[Figure labels: HPC; Input Data; Cache Copy; Temp Results; Result Data; Compute Cluster]
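
One possible reading of the labels on this slide is a stage-in / compute / stage-out pattern: input data is copied to a cache close to the compute cluster, temporary results stay on scratch storage, and only the final result is written back. The Python sketch below illustrates that pattern under this assumption. It does not use iRODS or Genedata Profiler; the function `run_with_cache`, the line-count "computation", and the file names are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's content."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def run_with_cache(input_file: Path, result_dir: Path) -> Path:
    """Stage in, compute on scratch space, stage out; scratch is deleted afterwards."""
    with tempfile.TemporaryDirectory() as scratch:
        scratch_dir = Path(scratch)

        # Stage in: cache copy of the input data, verified by checksum.
        cache_copy = scratch_dir / input_file.name
        shutil.copy2(input_file, cache_copy)
        if sha256(cache_copy) != sha256(input_file):
            raise IOError("cache copy differs from input data")

        # Compute: placeholder step that counts lines and writes a temp result.
        with cache_copy.open() as fh:
            n_lines = sum(1 for _ in fh)
        temp_result = scratch_dir / "temp_result.txt"
        temp_result.write_text(f"{n_lines}\n")

        # Stage out: only the final result leaves the scratch area.
        result_dir.mkdir(parents=True, exist_ok=True)
        result = result_dir / f"{input_file.stem}.linecount.txt"
        shutil.copy2(temp_result, result)
    return result
```

In this sketch the cache copy and temporary results never outlive the job, so confidential input data exists on the compute side only for the duration of the computation.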

Page 11: Genedata Profiler & iRODS An Open & Collaborative

RNA-Seq Data-Processing Pipeline

Page 12: Genedata Profiler & iRODS An Open & Collaborative

and Interaction Points with

Page 13: Genedata Profiler & iRODS An Open & Collaborative

Profiler

Chain-of-Custody

[Figure: paired FASTQ files rna1_1.fq / rna1_2.fq through rna4_1.fq / rna4_2.fq; processing steps: sequence alignment, RNA quantification, data export; users: Tina, Alice, Bob, Joe]
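
The slide does not say how the chain of custody is recorded. As an illustration only, the Python sketch below uses a hash-chained log, a generic technique chosen here rather than anything the slides attribute to Genedata Profiler or iRODS. The class `CustodyRecord`, the functions `append` and `verify`, the assignment of users to steps, and the intermediate file names `rna1.bam` and `rna1.counts` are all hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class CustodyRecord:
    step: str        # processing step, e.g. "sequence alignment"
    user: str        # who ran the step
    inputs: tuple    # names of the files the step consumed
    prev_hash: str   # digest of the previous record ("" for the first one)

    def digest(self) -> str:
        """SHA-256 over a canonical JSON form of this record."""
        payload = json.dumps(
            {"step": self.step, "user": self.user,
             "inputs": list(self.inputs), "prev": self.prev_hash},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(chain: list, step: str, user: str, inputs: tuple) -> None:
    """Add a record that points at the digest of the current last record."""
    prev = chain[-1].digest() if chain else ""
    chain.append(CustodyRecord(step, user, inputs, prev))


def verify(chain: list) -> bool:
    """Return True if every record points at the digest of its predecessor."""
    expected = ""
    for rec in chain:
        if rec.prev_hash != expected:
            return False
        expected = rec.digest()
    return True


# Hypothetical history for one sample (user/step pairing is made up).
chain = []
append(chain, "sequence alignment", "Tina", ("rna1_1.fq", "rna1_2.fq"))
append(chain, "RNA quantification", "Bob", ("rna1.bam",))
append(chain, "data export", "Alice", ("rna1.counts",))

print(verify(chain))
```

Changing any record that has a successor breaks the link to the next record, so `verify` returns `False`. Protecting the last record would additionally require storing its digest somewhere outside the chain.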

Page 14: Genedata Profiler & iRODS An Open & Collaborative

Enabling Intuitive Raw Data Management

1. Visualization of clinical sample annotation together with corresponding raw data

2. Flexible search functionalities across the whole database

3. Powerful annotation curation capabilities including bulk editing and annotation information protection

Page 15: Genedata Profiler & iRODS An Open & Collaborative

Marrying Raw Data with Sample Annotation

[Figure labels: Sample Annotation; Raw Data]

Page 16: Genedata Profiler & iRODS An Open & Collaborative

Providing ‘Google-Like’ Search

[Figure labels: search result; complex search]

Page 17: Genedata Profiler & iRODS An Open & Collaborative

Sample Annotation Curation

[Figure labels: locked-down attribute; multiple values including units; browse sequence]
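
iRODS stores metadata as attribute-value-unit (AVU) triples, which fits the "multiple values including units" callout. The Python sketch below models such annotations in memory, together with a lockable attribute and a simple attribute search. It does not call iRODS or Genedata Profiler; the classes `AVU` and `SampleAnnotation`, the exception `LockedAttributeError`, the function `search`, and the example attributes are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class AVU:
    """One attribute-value-unit triple; the unit may be empty."""
    attribute: str
    value: str
    unit: str = ""


class LockedAttributeError(Exception):
    """Raised when a locked-down attribute is edited."""


@dataclass
class SampleAnnotation:
    sample_id: str
    avus: set[AVU] = field(default_factory=set)
    locked: set[str] = field(default_factory=set)

    def add(self, attribute: str, value: str, unit: str = "") -> None:
        """Attach a value; an attribute may carry several values."""
        if attribute in self.locked:
            raise LockedAttributeError(attribute)
        self.avus.add(AVU(attribute, value, unit))

    def lock(self, attribute: str) -> None:
        """Protect an attribute against further edits."""
        self.locked.add(attribute)

    def values(self, attribute: str) -> list[tuple[str, str]]:
        """Return all (value, unit) pairs of an attribute, sorted."""
        return sorted((a.value, a.unit) for a in self.avus
                      if a.attribute == attribute)


def search(samples: list[SampleAnnotation], attribute: str, value: str) -> list[str]:
    """Return the IDs of samples annotated with the given attribute value."""
    return [s.sample_id for s in samples
            if any(a.attribute == attribute and a.value == value for a in s.avus)]


# Hypothetical sample with two dose values and a protected response label.
sample = SampleAnnotation("rna1")
sample.add("dose", "10", "mg")
sample.add("dose", "20", "mg")
sample.add("response", "responder")
sample.lock("response")

print(sample.values("dose"))
print(search([sample], "response", "responder"))
```

After `lock("response")`, a further `sample.add("response", ...)` raises `LockedAttributeError`, while unlocked attributes such as `dose` remain editable.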

Page 18: Genedata Profiler & iRODS An Open & Collaborative

Summary

• The smooth integration of Genedata Profiler with iRODS lets scientists preserve their research ecosystem when working with confidential data

• Genedata Profiler’s data processing and management capabilities, together with iRODS’ metadata and security concepts, form a unique combination for establishing the chain of custody when analyzing personalized medicine data