my coge comparing our genomes. background and introduction decreases in sequencing costs, coupled...

myCoGe Comparing our genomes

Upload: octavia-golden

Post on 14-Dec-2015



Background and Introduction

Decreases in sequencing costs, coupled with increases in speed, have paved the way for “Personal Genomics”.

Companies now providing sequencing or genotyping services include:

23andMe ($99)

AncestryDNA ($99)

Complete Genomics ($5,000)

Counsyl ($1,000)

uBiome ($89-$400)

Genelex

…and more!


This huge body of data holds a lot of promise for researchers: 600k of 23andMe’s 800k customers have consented to the use of their data for research.

Multiple projects now provide a means for individuals to share their genetic data and health histories with researchers, e.g., the Personal Genome Project and Open Humans.

Unfortunately, data from different sources cannot be directly compared, since they come in different file formats and may be based on different reference genomes.


What is myCoGe?

Goal of the myCoGe Data Integration Pipeline

Provide a mechanism for automated retrieval of publicly available genomic experiment datasets for import into CoGe.

Provide the necessary tools for converting raw experiment files to formats accepted by CoGe.

Provide tools for converting experiments to use the same reference genome.

Ultimate Goal of myCoGe

Provide a powerful framework of tools and datasets for analyzing how genetic variation affects function in human genomes.

Provide a useful toolbox for individuals to investigate their own personal genetic data.
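As an illustration of the file-conversion goal, here is a minimal sketch that parses a 23andMe-style raw data export into normalized records. It assumes the usual layout of that export: lines starting with `#` are comments, data rows are tab-separated rsid, chromosome, position and genotype, and `--` marks a no-call. `SnpCall` and `parse_23andme` are hypothetical names, the sample rows are illustrative only, and the step that writes CoGe's input format is left out because the slides do not specify it.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

@dataclass(frozen=True)
class SnpCall:
    rsid: str
    chromosome: str
    position: int
    genotype: str

def parse_23andme(lines: Iterable[str]) -> Iterator[SnpCall]:
    """Yield one SnpCall per genotyped row of a 23andMe-style raw data file."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the commented header block
        rsid, chromosome, position, genotype = line.split("\t")
        if genotype == "--":
            continue  # '--' marks a no-call, so there is nothing to import
        yield SnpCall(rsid, chromosome, int(position), genotype)

# Illustrative input: a header comment, one called SNP and one no-call.
sample = [
    "# rsid\tchromosome\tposition\tgenotype",
    "rs4477212\t1\t82154\tAA",
    "rs3094315\t1\t752566\t--",
]
calls = list(parse_23andme(sample))
print(calls)  # [SnpCall(rsid='rs4477212', chromosome='1', position=82154, genotype='AA')]
```

Because the parser accepts any iterable of lines, a real export can be read by passing an open file handle instead of the sample list.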


myCoGe Data Integration Conceptual Pipeline

Identify → Review → Download → Convert → Load
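The conceptual pipeline can be sketched as a chain in which Identify produces candidate experiment records and each record then passes through Review, Download, Convert and Load, with any stage able to reject it. Every function here (`identify`, `review`, `download`, `convert`, `load`, `run_pipeline`) is a hypothetical placeholder rather than the project's code; the real stages would query PGP, fetch files over HTTP and call the CoGe API.

```python
from typing import Callable, List, Optional

Record = dict
Stage = Callable[[Record], Optional[Record]]

def identify() -> List[Record]:
    # Placeholder for querying a source such as PGP for candidate experiments.
    return [
        {"id": "exp1", "file_type": "23andme"},
        {"id": "exp2", "file_type": "unsupported"},
    ]

def review(r: Record) -> Optional[Record]:
    # Assumed rule: keep only experiments whose file type the converter handles.
    return r if r["file_type"] == "23andme" else None

def download(r: Record) -> Record:
    return {**r, "downloaded": True}  # placeholder for the HTTP fetch

def convert(r: Record) -> Record:
    return {**r, "converted": True}  # placeholder for format conversion

def load(r: Record) -> Record:
    return {**r, "loaded": True}  # placeholder for the CoGe load call

def run_pipeline(candidates: List[Record], stages: List[Stage]) -> List[Record]:
    loaded = []
    for record in candidates:
        for stage in stages:
            record = stage(record)
            if record is None:  # a stage rejected this experiment
                break
        else:  # runs only if no stage rejected the record
            loaded.append(record)
    return loaded

results = run_pipeline(identify(), [review, download, convert, load])
print([r["id"] for r in results])  # ['exp1']
```

Keeping each stage as a separate function makes it easy to rerun only the later stages, for example reconverting already downloaded files.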


Operational File Structure


myCoGe Data Integration Full Pipeline


Fun Facts

Speed Benchmarks

Slowest process: loading the 20 GB reference SNP file (~4 min).

Converting 900,000 SNPs from the reference file: 5-30 seconds.

Lines of Code

Initiate: 123 lines.

myCoGe: 692 lines.

Finalize: 11 lines.

Execute_myCoGe: 3 lines.

SNPScraper: 59 lines.

Total: 888 lines.
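A rough reading of the speed benchmarks, assuming 1 GB = 1000 MB and taking the approximate timings at face value, together with a check of the line-count total:

```latex
\begin{aligned}
\frac{20\ \text{GB}}{4\ \text{min}} &= \frac{20{,}000\ \text{MB}}{240\ \text{s}} \approx 83\ \text{MB/s} \\
\frac{900{,}000\ \text{SNPs}}{30\ \text{s}} &= 30{,}000\ \text{SNPs/s}, \qquad
\frac{900{,}000\ \text{SNPs}}{5\ \text{s}} = 180{,}000\ \text{SNPs/s} \\
123 + 692 + 11 + 3 + 59 &= 888\ \text{lines}
\end{aligned}
```

So conversion runs at roughly 30,000 to 180,000 SNPs per second, while the reference load corresponds to an average read rate of about 83 MB/s.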


Initial Execution

The complete pipeline was executed on Friday, May 1st.

The initial query of PGP returned 579 potential experiments and their associated metadata.

Complications

PGP servers were slow and largely unresponsive.

Over the weekend, just under 100 experiments could be downloaded.

Of these, 79 yielded good results.
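Slow, unresponsive servers are the failure that the increased-stability goal under Future Directions targets. One common approach, sketched here as an assumption rather than the pipeline's actual behavior, is to retry each download with exponential backoff. `fetch_with_retry` and `fetch_url` are hypothetical names; `sleep` is injectable so the retry schedule can be tested without waiting.

```python
import time
import urllib.request
from typing import Callable, TypeVar

T = TypeVar("T")

def fetch_url(url: str, timeout: float = 60.0) -> bytes:
    """Download one file; urllib raises URLError (an OSError) on network failures."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()

def fetch_with_retry(fetch: Callable[[], T], attempts: int = 5,
                     base_delay: float = 2.0,
                     sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fetch(), retrying on OSError (timeouts, refused connections) with
    exponentially growing delays: base_delay, 2*base_delay, 4*base_delay, ..."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fetch()
        except OSError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")

# Usage with a simulated server that times out twice before answering.
calls = {"n": 0}

def flaky_server() -> bytes:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("server did not respond")  # TimeoutError subclasses OSError
    return b"experiment data"

delays = []
result = fetch_with_retry(flaky_server, sleep=delays.append)
print(result, delays)  # b'experiment data' [2.0, 4.0]
```

Against a real server the call would be `fetch_with_retry(lambda: fetch_url(url))`. Note that urllib's `HTTPError` is also an `OSError`, so this sketch retries HTTP error statuses too; a stricter version might give up immediately on a 404.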

CoGe API Load Experiment is not functional: the loading code is complete, but CoGe returns an authentication error.

Reference genome chromosome names are NCBI IDs instead of numbers.
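Assuming the reference genome's NCBI IDs are RefSeq chromosome accessions (for human, NC_000001 to NC_000022 for the autosomes, NC_000023 for X, NC_000024 for Y and NC_012920 for the mitochondrial genome), translating them back to plain chromosome names is a small lookup. `refseq_to_chromosome` is a hypothetical helper; other ID styles, such as unplaced scaffolds, are rejected rather than guessed.

```python
import re

# Autosomes and sex chromosomes: NC_000001 .. NC_000024, optional version suffix.
_CHROM_PATTERN = re.compile(r"^NC_0000(\d{2})(?:\.\d+)?$")
_SEX_CHROMOSOMES = {23: "X", 24: "Y"}
_MITOCHONDRIAL = "NC_012920"

def refseq_to_chromosome(accession: str) -> str:
    """Map a human RefSeq chromosome accession such as 'NC_000001.10' to '1'."""
    if accession.split(".")[0] == _MITOCHONDRIAL:
        return "MT"
    match = _CHROM_PATTERN.match(accession)
    if match is None:
        raise ValueError(f"not a recognized human chromosome accession: {accession}")
    number = int(match.group(1))
    if not 1 <= number <= 24:
        raise ValueError(f"not a recognized human chromosome accession: {accession}")
    return _SEX_CHROMOSOMES.get(number, str(number))

print(refseq_to_chromosome("NC_000001.10"))  # 1
print(refseq_to_chromosome("NC_000023.11"))  # X
```

Stripping the version suffix means the same mapping works whether the reference uses GRCh37 or GRCh38 accession versions.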


Future Directions

myCoGe Data Integration Pipeline

Functional CoGe API loading.

Increased stability in the face of poor connections.

Expanded file types.

Expanded experiment sources.

Automated execution.

myCoGe

Web-based personal data integration.

Integrated comparison tools.

Gene model annotations.

Functional and expression experiments.

Full-genome sequencing.