nutrigenomics metadata 12_1_16_v4 (1)

Ben Busby, Ph.D., Genomics Outreach Coordinator, NCBI. Recording Informative Metadata for Nutrigenomics! Environmental Variation in the Rising Era of Individual Genome Sequence

Upload: ben-busby

Post on 12-Apr-2017

Page 1: Nutrigenomics metadata 12_1_16_v4 (1)

Ben Busby, Ph.D.

Genomics Outreach Coordinator

Recording Informative Metadata for Nutrigenomics!

Environmental Variation in the Rising Era of Individual Genome Sequence

Page 2: Nutrigenomics metadata 12_1_16_v4 (1)

Review of terminology and concepts: Next Generation Sequencing

Graphic Credit: Spencer Martin, UBC

Page 3: Nutrigenomics metadata 12_1_16_v4 (1)

Review of terminology and concepts: How Genomes are Mapped and Assembled

© Martine Zilversmit 2013

Page 4: Nutrigenomics metadata 12_1_16_v4 (1)

http://1.usa.gov/1J1xmYs

NCBI NGS Online Workshop – Available on the NCBI YouTube Channel!

Review of terminology and concepts: How Genomes are Mapped and Assembled

Page 5: Nutrigenomics metadata 12_1_16_v4 (1)

My View of Data Transfer Principles

• Metadata Search
  • Rapid NoSQL (for now)
  • Integration
  • Non-ambiguous identifiers
• Transferring Small Amounts of Data
  • Data still gets transferred in the cloud
  • Underlying structure
  • Finding specific data from validated formats
• Democratization of Data
  • Rapid comparison by domain experts
• Reporting
  • Metrics to report data upload and [unique IP] download of datasets
  • Post-publication User Review
• The NCBI LinkOut Mechanism as a test suite

Page 6: Nutrigenomics metadata 12_1_16_v4 (1)

BioProject

Page 7: Nutrigenomics metadata 12_1_16_v4 (1)

BioProject

Page 8: Nutrigenomics metadata 12_1_16_v4 (1)

BioProject

Page 9: Nutrigenomics metadata 12_1_16_v4 (1)

BioProject

Page 10: Nutrigenomics metadata 12_1_16_v4 (1)

BioSample

Page 11: Nutrigenomics metadata 12_1_16_v4 (1)

BioSample

Page 12: Nutrigenomics metadata 12_1_16_v4 (1)

BioSample

Page 13: Nutrigenomics metadata 12_1_16_v4 (1)

SRA

Page 14: Nutrigenomics metadata 12_1_16_v4 (1)
Page 15: Nutrigenomics metadata 12_1_16_v4 (1)
Page 16: Nutrigenomics metadata 12_1_16_v4 (1)

dbGaP

Page 17: Nutrigenomics metadata 12_1_16_v4 (1)

dbGaP

Subjects by year (chart data)

Year    Subjects
2007    14,201
2008    53,216
2009    139,311
2010    374,464
2011    485,727
2012    566,181
2013    660,665
2014    876,849
2015    1,002,935

Page 18: Nutrigenomics metadata 12_1_16_v4 (1)
Page 19: Nutrigenomics metadata 12_1_16_v4 (1)

Investigation of NGS: SRA BLAST!

Page 20: Nutrigenomics metadata 12_1_16_v4 (1)

Investigation of NGS: MagicBLAST!

Page 21: Nutrigenomics metadata 12_1_16_v4 (1)

Domain-specific SRA and BioSample Submission Templates

Page 22: Nutrigenomics metadata 12_1_16_v4 (1)

Domain-specific Bulk SRA and BioSample Submission

Page 23: Nutrigenomics metadata 12_1_16_v4 (1)

GenBank and RefSeq!

Page 24: Nutrigenomics metadata 12_1_16_v4 (1)

Submission to GenBank!

Page 25: Nutrigenomics metadata 12_1_16_v4 (1)

Superbankit!

Page 26: Nutrigenomics metadata 12_1_16_v4 (1)

Reporting

Page 27: Nutrigenomics metadata 12_1_16_v4 (1)

Food Borne Pathogens

Page 28: Nutrigenomics metadata 12_1_16_v4 (1)

Food Borne Pathogens

Page 29: Nutrigenomics metadata 12_1_16_v4 (1)

Food Borne Pathogens

Page 30: Nutrigenomics metadata 12_1_16_v4 (1)

Where to Get More Information!

Page 31: Nutrigenomics metadata 12_1_16_v4 (1)

Where to Get More Information!

Page 32: Nutrigenomics metadata 12_1_16_v4 (1)

The Future

Page 33: Nutrigenomics metadata 12_1_16_v4 (1)

The Future (in my opinion)

Page 34: Nutrigenomics metadata 12_1_16_v4 (1)

The Future (in my opinion)…

Is already here

Page 35: Nutrigenomics metadata 12_1_16_v4 (1)

Ontological Standardization

Page 36: Nutrigenomics metadata 12_1_16_v4 (1)

Ontological Standardization

Page 37: Nutrigenomics metadata 12_1_16_v4 (1)

Ontological Standardization

Page 38: Nutrigenomics metadata 12_1_16_v4 (1)

Integration into a Larger Data Discovery Framework

BD2K - bioCADDIE

Page 39: Nutrigenomics metadata 12_1_16_v4 (1)

Integration into a Larger Data Discovery Framework

Page 40: Nutrigenomics metadata 12_1_16_v4 (1)

Integration into a Larger Data Discovery Framework

Example: GOLD (JGI)

Page 41: Nutrigenomics metadata 12_1_16_v4 (1)

E-Utilities (Eutils)

Video available at: http://www.ncbi.nlm.nih.gov/education/webinars/

Page 42: Nutrigenomics metadata 12_1_16_v4 (1)

E-Utilities (Eutils)

Page 43: Nutrigenomics metadata 12_1_16_v4 (1)

Introducing… Entrez Direct

The E-utilities on the UNIX command line

esearch -db gene -query "foxp2[gene] AND human[orgn]" | \
elink -target protein -name gene_protein_refseq | \
efetch -format fasta

ftp.ncbi.nlm.nih.gov/entrez/entrezdirect/
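Entrez Direct wraps the E-utilities web service, so the first step of the pipeline above can also be written as a plain HTTP request. The sketch below builds the ESearch URL for the same query. The endpoint `esearch.fcgi` and its `db` and `term` parameters belong to the public E-utilities service. The helper names `encode_term` and `esearch_url` are made up for this illustration, and the encoder handles only the characters that occur in this one query.

```shell
# Base URL of the NCBI E-utilities web service.
EUTILS_BASE="https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Hypothetical helper: encode only the characters this example needs
# (space, '[' and ']'). A general encoder would have to cover every
# reserved URL character.
encode_term() {
  printf '%s' "$1" | sed -e 's/ /+/g' -e 's/\[/%5B/g' -e 's/]/%5D/g'
}

# Hypothetical helper: print the ESearch URL for a database ($1)
# and an Entrez query ($2).
esearch_url() {
  printf '%s/esearch.fcgi?db=%s&term=%s\n' "$EUTILS_BASE" "$1" "$(encode_term "$2")"
}

# Same query as the esearch step in the pipeline.
esearch_url gene 'foxp2[gene] AND human[orgn]'
```

Passing the printed URL to a client such as `curl -s` should return the matching Gene identifiers as XML. That step needs network access, so it is left out of the sketch.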

Page 44: Nutrigenomics metadata 12_1_16_v4 (1)

Moving from FTP-scraping cron jobs to on-demand APIs

Page 45: Nutrigenomics metadata 12_1_16_v4 (1)

EDirect Cookbook (DRAFT)

Page 46: Nutrigenomics metadata 12_1_16_v4 (1)

Generating apps that work with our APIs and Data Structures, and Improve Metadata:

NCBI Hackathons!

Page 47: Nutrigenomics metadata 12_1_16_v4 (1)

2015-2016, 8 Hackathons

Many Functional Software Products 3 Days

Page 48: Nutrigenomics metadata 12_1_16_v4 (1)

An Educational Resource for RNAseq

Available to anyone on AWS

Page 49: Nutrigenomics metadata 12_1_16_v4 (1)

Part of an Online Workshop

First 5 lectures now available on

Page 50: Nutrigenomics metadata 12_1_16_v4 (1)

Hackathons

January 2016: 6 functional software products, 3 days

Page 51: Nutrigenomics metadata 12_1_16_v4 (1)

Hackathons

Page 52: Nutrigenomics metadata 12_1_16_v4 (1)

Hackathons

Page 53: Nutrigenomics metadata 12_1_16_v4 (1)

In April, July, August and October 2016 we built on those projects.

Page 54: Nutrigenomics metadata 12_1_16_v4 (1)

Finding immunogenic peptides from single RNA-seq samples

Page 55: Nutrigenomics metadata 12_1_16_v4 (1)

DangerTrack

Difficult to assess regions

Combined score is the average of SVs, mappability, GC..

NCBI region list

ENCODE blacklist
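The combined score on this slide is described only as an average of per-region component scores. A minimal sketch of that arithmetic follows, assuming exactly three components (SVs, mappability, GC) that are already scaled to a common range and stored in columns 4 to 6 of a tab-separated, BED-like file. The column layout and the function name `combined_score` are assumptions for illustration and are not taken from DangerTrack itself.

```shell
# Hypothetical helper: read tab-separated lines of the form
#   chrom  start  end  sv_score  mappability_score  gc_score
# and print chrom, start, end and the plain average of the three scores.
combined_score() {
  awk -F '\t' 'BEGIN { OFS = "\t" } { print $1, $2, $3, ($4 + $5 + $6) / 3 }'
}

# One made-up region with component scores 30, 60 and 90.
printf 'chr1\t0\t5000\t30\t60\t90\n' | combined_score
```

If the real track averages more components than the three named on the slide, the divisor and the column list would need to change together.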

Page 56: Nutrigenomics metadata 12_1_16_v4 (1)

Get More Info!

On Twitter: @NCBI, @DCGenomics

Page 57: Nutrigenomics metadata 12_1_16_v4 (1)

In 2017 we will Build on Those Projects!

Biomedical Informatics Hackathon January 9th – 11th NIH Campus, Bethesda!

NCBI Genomics Hackathon March 20-22nd NIH Campus, Bethesda