Big Data for Personalized Health & Genomics
files.meetup.com/19290665/20160719...


Page 1

1 © Cloudera, Inc. All rights reserved.

Big Data for Personalized Health & Genomics

Shawn Dolley| Industry Leader, Health & Life Science

Big Data Healthcare Meetup July 2016

Page 2

http://www.nih.gov/precisionmedicine/infographic-printable.pdf

Page 3

Public domain slide, by Brian Wells, Penn Medicine, http://tinyurl.com/zsayqld

Page 4

Precision Medicine… Why Now?

Page 5

Collecting a patient genome is affordable

“In 10 years we’ve come from a $300M genome to one that’s realistically available at around $3000. That’s a 100,000 fold drop!” - James Hadfield, Next Generation Sequencing, 2014

Source: Nature, 2014
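The 100,000-fold figure in the quote follows directly from the two prices it cites. A quick check, using only the numbers on the slide:

```python
# Per-genome sequencing cost, as quoted on the slide (US dollars)
cost_then = 300_000_000  # "$300M genome"
cost_now = 3_000         # "around $3000"

# Fold drop = earlier cost divided by current cost
fold_drop = cost_then / cost_now
print(f"{fold_drop:,.0f}-fold drop")
```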

Page 6

Patient data moves from paper to digital

Page 7

HIPAA enabled us to break down industry silos

Page 8

A new class of data scientists trained for a decade

Page 9

Figure: growth of the Hadoop ecosystem, 2006 to 2015 (each year's stack adds these components to the previous year's)

• 2006: Core Hadoop (HDFS, MapReduce)
• 2007: Solr, Pig
• 2008: HBase, ZooKeeper
• 2009: Hive, Mahout
• 2010: Sqoop, Avro
• 2011: Flume, Bigtop, Oozie, HCatalog, Hue, YARN
• 2012: Spark, Tez, Impala, Kafka, Drill
• 2013: Parquet, Sentry
• 2014: Knox, Flink
• 2015: Kudu, RecordService, Ibis, Falcon

Big Data finally mature and ready

Page 10

How big is the data?

Page 11


Page 12

Courtesy Cloudian

Page 13

Courtesy Cloudian

Page 14

What are we seeing?

Page 15

Annotation Data

• Each researcher today must consult multiple genomic search engines to find variants, annotations, and other data

• It is much better to have this data in the same integrated repository as your own data: easier and faster

• Cloudera Omics Accelerator’s integration of public databases is designed to save researchers’ time
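The bullets above argue for keeping annotation data next to your own variant calls. A minimal sketch of why that helps, assuming annotations from public databases have already been loaded into the same repository: annotating a call set becomes a local join on a variant key instead of a round of searches across external engines. The `Variant` key, the `annotate` helper and the `GENE_A`/`GENE_B` records are hypothetical illustrations, not the Cloudera Omics Accelerator's actual schema or API.

```python
from typing import NamedTuple, Optional

class Variant(NamedTuple):
    """Hypothetical variant key: chromosome, position, reference and alternate allele."""
    chrom: str
    pos: int
    ref: str
    alt: str

# Hypothetical annotation table standing in for public databases that have
# already been loaded into the same repository as the researcher's own data.
ANNOTATIONS: dict[Variant, dict[str, Optional[str]]] = {
    Variant("17", 100, "G", "A"): {"gene": "GENE_A", "effect": "missense"},
    Variant("13", 200, "C", "T"): {"gene": "GENE_B", "effect": "synonymous"},
}

def annotate(calls: list[Variant],
             annotations: dict[Variant, dict[str, Optional[str]]]) -> list[dict]:
    """Attach annotations to each call via a local lookup (an in-memory hash join)."""
    missing = {"gene": None, "effect": "unannotated"}
    return [{**v._asdict(), **annotations.get(v, missing)} for v in calls]

calls = [Variant("17", 100, "G", "A"), Variant("1", 300, "T", "C")]
for row in annotate(calls, ANNOTATIONS):
    print(row)
```

At repository scale the same idea would more likely run as a SQL or Spark join between a variant table and an annotation table; the dict here only keeps the sketch self-contained.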

Page 16

The Genomic Analytic Pipeline & Cloudera Omics

Figure: the genomic analytic pipeline, from upstream to downstream

• Whole Exome/Genome Sequencing
• Alignment: Velvet, BWA…n
• Genotyping: GATK, Samtools…n
• Annotation: VEP, SnpEff…n
• Analysis

Data from multiple public databases and internal clinical sources joins the pipeline output in an integrated precision medicine repository.
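To make the stage ordering concrete, here is a minimal sketch that models the pipeline as an ordered list of stage functions and then merges the result with clinical and public-database data into one repository entry. Every stage function, sample ID and database name below is a hypothetical placeholder; real stages would invoke tools such as BWA, GATK or VEP, whose interfaces are not shown here.

```python
from typing import Callable

Record = dict
Stage = Callable[[Record], Record]

# Hypothetical placeholder stages. In a real pipeline each would run one of the
# tools named on the slide (e.g. BWA for alignment, GATK for genotyping,
# VEP or SnpEff for annotation); here each only records a placeholder output.
def align_stage(rec: Record) -> Record:
    return {**rec, "aligned_reads": f"{rec['sample']}:aligned"}

def genotype_stage(rec: Record) -> Record:
    return {**rec, "variant_calls": f"{rec['sample']}:variants"}

def annotate_stage(rec: Record) -> Record:
    return {**rec, "annotated_calls": f"{rec['sample']}:annotated"}

# Stages in upstream-to-downstream order; sequencing output is the input.
PIPELINE: list[tuple[str, Stage]] = [
    ("alignment", align_stage),
    ("genotyping", genotype_stage),
    ("annotation", annotate_stage),
]

def run_pipeline(sample: str) -> Record:
    """Run every stage in order, logging which stages have run."""
    rec: Record = {"sample": sample, "stages_run": []}
    for name, stage in PIPELINE:
        rec = stage(rec)
        rec["stages_run"] = rec["stages_run"] + [name]  # new list, no shared mutation
    return rec

def integrate(rec: Record, clinical: dict[str, dict], public_dbs: list[str]) -> Record:
    """Combine pipeline output with internal clinical data and public-database
    references into a single repository entry for downstream analysis."""
    return {**rec,
            "clinical": clinical.get(rec["sample"], {}),
            "public_dbs": sorted(public_dbs)}

clinical = {"S1": {"cohort": "hypothetical"}}
entry = integrate(run_pipeline("S1"), clinical, ["db_b", "db_a"])
print(entry["stages_run"])
```

Keeping the stages as data (a list of name and function pairs) makes it easy to swap one tool for another at a given step without touching the rest of the chain.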

Page 17

Baylor moves to Cloudera Enterprise to embark on their precision medicine journey

Page 18

The next version of the Broad Institute’s industry-standard GATK pipeline will be Spark-based; its more than 20,000 users worldwide may migrate to Spark

Page 19

Cloudera will be driving adoption of big data at precision medicine labs around the US, including custom collaborations

Page 20

The new ’omic ‘apps’ use the big data stack

Page 21

Page 22

Embrace
• Genomics is part of our lives
• Precision medicine is just medicine
• Data sizes of the future are here

Open Your Mind, Open Your Mission
• Decide to help human health

Your Next Step
• Find the practitioners, their data, their tools
• Get started with Hadoop, Spark and more
• Expand their data universe, be a hero

Your next step?

Page 23

Thank you