myCoGe
Comparing our genomes
Background and Introduction
Decreases in sequencing costs, coupled with increases in speed, have paved the way for "Personal Genomics".
Companies now providing sequencing include 23andMe ($99), AncestryDNA ($99), Complete Genomics ($5000), Counsyl ($1000), uBiome ($89-$400), Genelex, and more.
This huge body of data holds great promise for researchers: 600k of 23andMe's 800k customers have consented to their data being used for research.
Multiple sources, e.g. the Personal Genome Project (PGP) and Open Humans, now provide a means for individuals to share their genetic data and health histories with researchers.
Unfortunately, data from different sources cannot be directly compared, because they come in different file formats and may be based on different reference genomes.
Goals of the myCoGe Data Integration Pipeline
Provide a mechanism for automated retrieval of publicly available genomic experiment datasets for import into CoGe.
Provide the necessary tools for converting raw experiment files to formats accepted by CoGe.
Provide tools for converting experiments to use the same reference genome.
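Converting a raw consumer file into something CoGe can ingest starts with parsing it. The following minimal Python sketch assumes a 23andMe-style raw data file (tab-separated rsid, chromosome, position, genotype, with '#' comment lines and '--' marking a no-call) and writes it back out as a tab-delimited table. The output columns are a hypothetical layout chosen for illustration, not a confirmed CoGe import format, and the names `parse_23andme` and `write_tsv` are likewise hypothetical.

```python
import csv
import io
import sys
from dataclasses import dataclass
from typing import Iterable, Iterator, TextIO


@dataclass
class SnpCall:
    rsid: str
    chromosome: str
    position: int
    genotype: str


def parse_23andme(handle: TextIO) -> Iterator[SnpCall]:
    """Yield SNP calls from a 23andMe-style raw data file (assumed layout)."""
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank lines and header comments
            continue
        rsid, chrom, pos, genotype = line.split("\t")
        if genotype == "--":  # '--' marks a no-call; nothing to convert
            continue
        yield SnpCall(rsid, chrom, int(pos), genotype)


def write_tsv(calls: Iterable[SnpCall], out: TextIO) -> int:
    """Write calls in a hypothetical tab-delimited layout; return rows written."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["chr", "start", "end", "id", "genotype"])
    rows = 0
    for call in calls:
        # A SNP occupies a single base, so start and end are the same position.
        writer.writerow([call.chromosome, call.position, call.position,
                         call.rsid, call.genotype])
        rows += 1
    return rows


if __name__ == "__main__":
    raw = "# rsid\tchromosome\tposition\tgenotype\nrs1\t1\t100\tAG\nrs2\t2\t200\t--\n"
    write_tsv(parse_23andme(io.StringIO(raw)), sys.stdout)
```

Keeping parsing and writing separate means other sources (e.g. AncestryDNA files) could reuse the writer by supplying their own parser.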
What is myCoGe?
Ultimate Goals of myCoGe
Provide a powerful framework of tools and datasets that enables analyses of how variation affects function in human genomes.
Provide a useful toolbox for individuals to investigate their own personal genetic data.
myCoGe Data Integration Conceptual Pipeline
Identify → Review → Download → Convert → Load
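The conceptual pipeline can be pictured as a chain of stages, each taking and returning a list of experiment records. This sketch assumes the stages run in the order Identify, Review, Download, Convert, Load; every stage body is a hypothetical stub standing in for the real myCoGe code, which the slides do not show, and the record fields are invented for illustration.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
Stage = Callable[[List[Record]], List[Record]]


def identify(_: List[Record]) -> List[Record]:
    # Stub: a real stage would query a public source (e.g. PGP) for experiments.
    return [{"id": "exp1", "file_type": "23andme"},
            {"id": "exp2", "file_type": "unknown"}]


def review(records: List[Record]) -> List[Record]:
    # Keep only experiments whose file type the converter supports (assumed set).
    supported = {"23andme"}
    return [r for r in records if r["file_type"] in supported]


def download(records: List[Record]) -> List[Record]:
    # Stub: note where each raw file would be saved locally.
    return [{**r, "raw_file": f"{r['id']}.txt"} for r in records]


def convert(records: List[Record]) -> List[Record]:
    # Stub: note the name of the converted output file.
    return [{**r, "converted_file": f"{r['id']}.tsv"} for r in records]


def load(records: List[Record]) -> List[Record]:
    # Stub: a real stage would submit each converted file to CoGe.
    return [{**r, "status": "loaded"} for r in records]


STAGES: List[Tuple[str, Stage]] = [
    ("identify", identify), ("review", review), ("download", download),
    ("convert", convert), ("load", load),
]


def run_pipeline(stages: List[Tuple[str, Stage]]) -> Tuple[List[Record], List[Tuple[str, int]]]:
    """Run stages in order, recording how many records survive each one."""
    records: List[Record] = []
    counts: List[Tuple[str, int]] = []
    for name, stage in stages:
        records = stage(records)
        counts.append((name, len(records)))
    return records, counts
```

Recording a per-stage count makes it easy to report how many candidate experiments survive each step of a run.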
Operational File Structure
myCoGe Data Integration Full Pipeline
Fun Facts
Speed Benchmarks
Slowest process: loading the 20 GB reference SNP file (~4 min)
Converting 900,000 SNPs from the reference file: 5-30 seconds
Lines of Code
Initiate: 123 lines
myCoGe: 692 lines
Finalize: 11 lines
Execute_myCoGe: 3 lines
SNPScraper: 59 lines
Total: 888 lines
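Loading the reference SNP file is the slowest step, while the conversion itself takes only seconds. One plausible design consistent with those numbers (an assumption, since the slides do not describe the implementation) is to pay the loading cost once by building an in-memory hash table keyed by rsID, after which each lookup takes constant time on average. The sketch assumes a tab-separated reference file whose first three columns are rsid, chromosome, and position; `load_reference` and `remap` are hypothetical names.

```python
import io
from typing import Dict, Iterable, List, TextIO, Tuple

Coord = Tuple[str, int]


def load_reference(handle: TextIO) -> Dict[str, Coord]:
    """Build an rsID -> (chromosome, position) table; the slow, one-time step."""
    table: Dict[str, Coord] = {}
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue
        rsid, chrom, pos = line.rstrip("\r\n").split("\t")[:3]
        table[rsid] = (chrom, int(pos))
    return table


def remap(calls: Iterable[Tuple[str, str]],
          table: Dict[str, Coord]) -> Tuple[List[Tuple[str, str, int, str]], List[str]]:
    """Attach reference coordinates to (rsid, genotype) calls via dict lookups."""
    mapped: List[Tuple[str, str, int, str]] = []
    missing: List[str] = []
    for rsid, genotype in calls:
        coord = table.get(rsid)
        if coord is None:
            missing.append(rsid)  # rsID absent from the reference file
        else:
            mapped.append((rsid, coord[0], coord[1], genotype))
    return mapped, missing


if __name__ == "__main__":
    ref = load_reference(io.StringIO("rs1\t1\t1000\nrs2\tX\t500\n"))
    print(remap([("rs1", "AG"), ("rs9", "CC")], ref))
```

Returning the unmatched rsIDs separately, rather than dropping them silently, lets a caller report how much of an experiment could not be placed on the reference.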
Initial Execution
The complete pipeline was executed on Friday, May 1st.
The initial query of PGP returned 579 potential experiments and their associated metadata.
Complications
PGP servers were slow and largely unresponsive.
Over the weekend, just under 100 experiments could be downloaded.
Of these, 79 yielded good results.
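With slow, largely unresponsive servers, one common way to make downloads more robust is to retry with exponential backoff. This sketch wraps any zero-argument fetch function and retries on `OSError`, which covers `ConnectionError`, `TimeoutError`, and `urllib.error.URLError`. The helper name `with_retries`, the attempt count, and the delays are illustrative assumptions, not values from the pipeline.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(fetch: Callable[[], T], attempts: int = 5, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fetch(), retrying on OSError with exponentially growing delays."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fetch()
        except OSError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last network error
            # Wait base_delay, 2*base_delay, 4*base_delay, ... between attempts.
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("unreachable")
```

In use, `fetch` could be `lambda: urllib.request.urlopen(url, timeout=30).read()` for some experiment `url`. Passing `sleep` as a parameter keeps the function easy to test, and random jitter could be added to the delay if many downloads run in parallel.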
CoGe API Load Experiment not functional
The loading code is complete, but CoGe returns an authentication error.
Reference genome chromosome names are NCBI IDs instead of numbers.
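If the reference genome names chromosomes with NCBI RefSeq accessions, a small lookup can translate them back to conventional names. This sketch relies on the RefSeq numbering for the human primary assembly, where NC_000001 through NC_000022 are the autosomes, NC_000023 is X, NC_000024 is Y, and NC_012920 is the mitochondrial genome, ignoring the version suffix that differs between builds. The function name is hypothetical, and any other accession (such as an unplaced scaffold) is returned as None so the caller can decide how to handle it.

```python
import re
from typing import Optional

# RefSeq accessions for human primary-assembly chromosomes; the version suffix varies by build.
_NUCLEAR = re.compile(r"^NC_0000(\d{2})\.\d+$")
_MITOCHONDRIAL = re.compile(r"^NC_012920\.\d+$")
_SEX_CHROMOSOMES = {23: "X", 24: "Y"}


def ncbi_to_chromosome(accession: str) -> Optional[str]:
    """Translate a RefSeq accession (e.g. NC_000001.10) to '1'..'22', 'X', 'Y' or 'MT'."""
    match = _NUCLEAR.match(accession)
    if match:
        number = int(match.group(1))
        if 1 <= number <= 22:
            return str(number)
        return _SEX_CHROMOSOMES.get(number)  # None for numbers outside 1..24
    if _MITOCHONDRIAL.match(accession):
        return "MT"
    return None  # scaffolds, patches, or non-human accessions
```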
Future Directions
myCoGe Data Integration Pipeline
Functional CoGe API loading.
Increased stability in the face of poor connections.
Expanded file types.
Expanded experiment sources.
Automated execution.
myCoGe
Web-based personal data integration
Integrated comparison tools
Gene model annotations
Functional and expression experiments
Full-genome sequencing