Genomedata, Segway and Segtools: How to use the Segway pipeline to store and analyze genomics data sets

Max Libbrecht

2016-06-07


Genomedata Segway Segtools


Installing Genomedata

# HDF5
# Ubuntu/Debian:
sudo apt-get install libhdf5-serial-dev hdf5-tools
# CentOS/RHEL/Fedora:
sudo yum -y install hdf5 hdf5-devel
# OpenSUSE:
sudo zypper in hdf5 hdf5-devel libhdf5

# PyTables
pip install numpy
pip install numexpr
pip install cython

# Genomedata
pip install genomedata


Loading data into genomedata

genomedata-load-assembly --sizes my_genomedata hg19.sizes
genomedata-open-data my_genomedata my_trackname
zcat input.bedgraph.gz | genomedata-load-data my_genomedata my_trackname
genomedata-close-data my_genomedata

hg19.sizes:
chr1 249250621
chr2 243199373
chr3 198022430
chr4 191154276
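As a quick sanity check before loading, a sizes file can be read back in Python. The sketch below assumes the whitespace-separated "name length" layout shown above; the function name parse_chrom_sizes and the embedded excerpt are illustrative and are not part of Genomedata.

```python
def parse_chrom_sizes(text):
    """Parse 'name length' lines (as in hg19.sizes) into a dict."""
    sizes = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        sizes[fields[0]] = int(fields[1])
    return sizes


# The four entries shown on the slide (an excerpt, not the full assembly).
HG19_SIZES_EXCERPT = """\
chr1 249250621
chr2 243199373
chr3 198022430
chr4 191154276
"""

sizes = parse_chrom_sizes(HG19_SIZES_EXCERPT)
total = sum(sizes.values())
print(len(sizes), "chromosomes,", total, "bases")
```

For a real file, pass the result of open("hg19.sizes").read() instead of the excerpt.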


Accessing data: command line

$ genomedata-query my_genomedata my_trackname chr1 1000000 1000100
fixedStep chrom=chr1 start=1000000
0.0
0.0
0.0
0.0
0.0
0.0
...


Accessing data: Python

>>> import genomedata
>>> g = genomedata.Genome("my_genomedata")
>>> g["chr1"][1000000:1000100, "my_trackname"]
array([ 17.89999962, 17.89999962, 17.89999962, 17.89999962,
        17.89999962, 17.89999962, 17.89999962, 17.89999962,
        17.89999962, 17.89999962], dtype=float32)
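Once a slice has been retrieved, it is often summarized before further analysis. The sketch below is a minimal pure-Python example that assumes positions without data appear as NaN in the returned values. The helper summarize_signal and the example list are hypothetical stand-ins, not part of the Genomedata API.

```python
import math


def summarize_signal(values):
    """Return (count, mean) over the non-NaN entries of a sequence of floats."""
    present = [v for v in values if not math.isnan(v)]
    if not present:
        return 0, float("nan")
    return len(present), sum(present) / len(present)


# Stand-in for the values a Genomedata slice might return (assumed, not real data).
example = [17.9, 17.9, float("nan"), 18.1]
count, mean = summarize_signal(example)
print(count, round(mean, 3))
```

With NumPy available, numpy.nanmean on the returned array would serve the same purpose.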


Semi-automated genome annotation algorithms partition and label the genome on the basis of functional genomics tracks

[Figure: signal tracks DNase1, H3K36me3 and RNA-seq above an annotation row of segment labels (1 2 3 2 3 2)]

HMMSeg: Day et al. Bioinformatics, 2007
ChromHMM: Ernst, J. and Kellis, M. Nature Biotechnology, 2010
Segway: Hoffman, M. et al. Nature Methods, 2012


Semi-automated genome annotation algorithms use dynamic Bayesian network models

[Figure: dynamic Bayesian network in which the segment label is a hidden random variable and the RNA-seq, H3K27me3 and DNase1 signals are observed random variables]
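To make the hidden/observed distinction concrete, here is a toy sketch using a plain hidden Markov model, the simplest kind of dynamic Bayesian network, decoded with the Viterbi algorithm. It is not Segway's model, which is considerably richer. The two labels, the "low"/"high" discretized signal and all probabilities are invented for illustration.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most probable hidden state path for a discrete HMM (log-space Viterbi)."""
    # best[s] = best log-probability of any path that ends in state s
    best = {s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
            for s in states}
    backpointers = []
    for obs in observations[1:]:
        prev_best, best, pointers = best, {}, {}
        for s in states:
            # choose the predecessor that maximizes the score of reaching s
            prev = max(states,
                       key=lambda p: prev_best[p] + math.log(trans_p[p][s]))
            best[s] = (prev_best[prev] + math.log(trans_p[prev][s])
                       + math.log(emit_p[s][obs]))
            pointers[s] = prev
        backpointers.append(pointers)
    # trace back from the best final state
    state = max(states, key=lambda s: best[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Invented toy parameters: label 1 mostly emits low signal, label 2 mostly high.
states = [1, 2]
start_p = {1: 0.5, 2: 0.5}
trans_p = {1: {1: 0.9, 2: 0.1}, 2: {1: 0.1, 2: 0.9}}
emit_p = {1: {"low": 0.9, "high": 0.1}, 2: {"low": 0.2, "high": 0.8}}

signal = ["low", "low", "high", "high", "high", "low", "low"]
path = viterbi(signal, states, start_p, trans_p, emit_p)
print(path)
```

The observed signal is the input and the label sequence is inferred, which is the same division of roles as in the figure: signal tracks are observed, segment labels are hidden.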


Installing Segway

# GMTK
wget http://melodi.ee.washington.edu/downloads/gmtk/gmtk-1.4.0.tar.gz
tar -xzvf gmtk-1.4.0.tar.gz
cd gmtk-1.4.0
./configure
make
make install
cd ..

# Segway
pip install segway


Running Segway

segway train my_genomedata my_traindir
segway identify my_genomedata my_traindir my_identifydir

output: my_identifydir/segway.bed.gz
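The output can be inspected with a few lines of Python. This sketch assumes the file is a gzipped BED file with the segment label in the fourth column and possibly a "track" header line. The helper name and the tiny example file it writes are illustrative only.

```python
import gzip
import os
import tempfile
from collections import Counter


def count_segments_by_label(bed_gz_path):
    """Count segments per label in a gzipped BED file (label in column 4)."""
    counts = Counter()
    with gzip.open(bed_gz_path, "rt") as handle:
        for line in handle:
            if not line.strip() or line.startswith(("track", "browser", "#")):
                continue  # skip blank lines and header lines
            chrom, start, end, label = line.split()[:4]
            counts[label] += 1
    return counts


# Write a tiny made-up file so the sketch runs without a real Segway run.
example_path = os.path.join(tempfile.mkdtemp(), "segway.bed.gz")
with gzip.open(example_path, "wt") as handle:
    handle.write("track name=example\n")
    handle.write("chr1\t0\t100\t0\n")
    handle.write("chr1\t100\t250\t1\n")
    handle.write("chr1\t250\t300\t0\n")

counts = count_segments_by_label(example_path)
print(dict(counts))
```

To use it on real output, call count_segments_by_label("my_identifydir/segway.bed.gz").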


Model parameters

Number of annotation labels
--num-labels=25

Number of EM initializations
--num-instances=10

Maximum number of EM training iterations
--max-train-rounds=100


Input data

Input tracks
--track=GM12878_H3K27ac --track=GM12878_H3K4me3
OR
--tracks-from=tracks.txt

tracks.txt:
GM12878_H3K27ac
GM12878_H3K4me3

Genome coordinates
--include-coords=coords.bed

coords.bed:
chr1    151158060    151658060
chr10    55483812    55983812

--exclude-coords=blacklist.bed

Training minibatch size
--minibatch-fraction=0.01


Controlling segment lengths

Downsampling resolution
--resolution=10

Long segments prior
--prior-strength=1.0

Weight on transition part of the model
--segtransition-weight-scale=10
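When many runs with different settings are scripted, it can help to generate the option strings from a dictionary rather than typing them by hand. The sketch below only formats "--name=value" strings using the option names from the slides. The helper format_options is hypothetical, and where the options belong on the segway command line should be checked against the Segway documentation for the installed version.

```python
def format_options(scalar_options, tracks=()):
    """Turn settings into '--name=value' strings like those on the slides."""
    args = [f"--{name}={value}" for name, value in scalar_options.items()]
    # --track is given once per input track
    args += [f"--track={track}" for track in tracks]
    return args


options = format_options(
    {
        "num-labels": 25,
        "num-instances": 10,
        "max-train-rounds": 100,
        "resolution": 10,
        "minibatch-fraction": 0.01,
    },
    tracks=["GM12878_H3K27ac", "GM12878_H3K4me3"],
)
print(" ".join(options))
```

Keeping the settings in one dictionary also makes it easy to record which parameters produced a given training directory.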


Installing Segtools

pip install segtools


segtools-signal-distribution measures relationships between annotation labels and signal tracks

[Figure: signal distribution plot; tracks shown: H4K20me1, H3K79me2, H3K36me3, H3K4me3, H3K27ac, H3K4me1, H3K9me1, H3K27me3]


segtools-length-distribution measures segment lengths and genome coverage

segtools-length-distribution segway.bed.gz
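The two quantities named above, segment length and coverage per label, are simple to compute by hand, which helps when reading the Segtools output. The sketch below is not Segtools' implementation. It assumes half-open BED coordinates, and the function name and example segments are made up.

```python
from collections import defaultdict


def length_summary(segments):
    """Per-label segment count, mean length and fraction of annotated bases.

    segments: iterable of (chrom, start, end, label), half-open coordinates.
    """
    lengths = defaultdict(list)
    for chrom, start, end, label in segments:
        lengths[label].append(end - start)
    total = sum(sum(values) for values in lengths.values())
    return {
        label: {
            "segments": len(values),
            "mean_length": sum(values) / len(values),
            "coverage": sum(values) / total,
        }
        for label, values in lengths.items()
    }


# Made-up segments covering 500 bases of chr1.
segments = [
    ("chr1", 0, 100, "0"),
    ("chr1", 100, 250, "1"),
    ("chr1", 250, 300, "0"),
    ("chr1", 300, 500, "2"),
]
summary = length_summary(segments)
for label in sorted(summary):
    print(label, summary[label])
```

Coverage here is relative to the annotated bases only, so the fractions sum to 1 across labels.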


segtools-aggregation measures associations with other genome annotations

segtools-aggregation --normalize --mode=gene segway.bed.gz gencode.gff