![Page 1: DNase I Seq data Analysis Strategy - Fudan Universityadmis.fudan.edu.cn/ds2013/sta/day4-s.pdf · Tips on shell du’–h’file’ du– sh. grepA’ input.fastq’ grep0 input.fastq’](https://reader033.vdocuments.site/reader033/viewer/2022050606/5fadcc84a5c563498e2ab72c/html5/thumbnails/1.jpg)
DNase I Seq data Analysis Strategy
Dragon Star 2013, Qian Qin, Tongji University
Workflow
• QC (qrqc, FastQC)
• Mapping (BWA/Bowtie)
• Reads filtering and format conversion (SAMtools/Picard)
• Peak calling (MACS/hotspot): Peaks BED 1, Peaks BED 2
• Pileup (convert to bigWig)
• 1. Downsample by mappable reads 2. Scale by mappable reads
• Filtering BedGraph and BED (BEDTools, bedClip)
• Correlation
• 1. Data comparison (BEDOPS, BEDTools) 2. Union BED 3. Motif discovery
Warm up
Examples on DHS
He, H. H., Meyer, C. A., Chen, M. W., Jordan, V. C., Brown, M., & Liu, X. S. (2012). Genome research, 22(6), 1015–25. doi:10.1101/gr.133280.111
Neph, S., Vierstra, J., Stergachis, A. B., Reynolds, A. P., Haugen, E., Vernot, B., Thurman, R. E., et al. (2012). Nature, 489(7414), 83–90. doi:10.1038/nature11212
Convert BAM to FASTQ
• Single-end data
    bamToFastq -i path_to_bam -fq output.fastq
• Options
    -i    input BAM file
    -fq   output FASTQ file
    -fq2  output FASTQ file for the second mate (paired-end data)
Format instruction
• FASTQ: http://en.wikipedia.org/wiki/FASTQ_format
• SAM/BAM
• BED, BedGraph, BigBed
• Wiggle, bigWig
• narrowPeak, broadPeak
• bed.starch
https://genome.ucsc.edu/FAQ/FAQformat.html
http://code.google.com/p/bedops/wiki/starchAndUnstarch
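SAM/BAM file instruction
• BAM is compressed (binary) SAM
• FLAGS for SE:
  - 0 for positive strand, 16 for negative strand, 4 for unmapped
• FLAGS for PE:
  - R: mate on reverse strand; r: read on reverse strand
  - 99: pair 1, + strand; 147: pair 2, - strand
  - 83: pair 1, - strand; 163: pair 2, + strand
• Common optional tags (these are tags, not FLAG values):
  - NM: edit distance to the reference (mismatch level)
  - XT: tool-specific tag; tags starting with X are reserved for tool-defined fields (BWA uses XT for the alignment type)
http://genome.sph.umich.edu/wiki/SAM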
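The FLAG values listed above are sums of bit fields, so they can be checked with shell arithmetic. The sketch below is only an illustration: `decode_flag` is a hypothetical helper name, and it looks at just the bits that matter for strand and mate order (0x1 paired, 0x4 unmapped, 0x10 read reverse, 0x40 first in pair, 0x80 second in pair, as defined in the SAM specification).

```shell
# Decode the strand and mate-order bits of a SAM FLAG value.
# decode_flag is a hypothetical helper, not part of samtools.
decode_flag() {
    flag=$1
    if [ $(( $flag & 4 )) -ne 0 ]; then
        out="unmapped"              # 0x4: read is unmapped
    elif [ $(( $flag & 16 )) -ne 0 ]; then
        out="minus"                 # 0x10: read aligned to the reverse strand
    else
        out="plus"
    fi
    if [ $(( $flag & 1 )) -ne 0 ]; then     # 0x1: read is paired
        if [ $(( $flag & 64 )) -ne 0 ]; then out="$out pair1"; fi
        if [ $(( $flag & 128 )) -ne 0 ]; then out="$out pair2"; fi
    fi
    echo "$out"
}

# Print the decoding for the FLAG values mentioned on the slide.
for f in 0 16 4 99 147 83 163; do
    printf '%s\t%s\n' "$f" "$(decode_flag "$f")"
done
```

For example, 99 = 1 + 2 + 32 + 64, so the read is paired, properly paired, its mate is on the reverse strand, and it is the first mate on the + strand.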
Tips on shell
    du -h file
    du -sh .
    grep A input.fastq
    grep 0 input.fastq
    cut -f 5 input.sam
    cut -f 3,4 input.sam | uniq | wc -l
    cut -f 3,4 input.sam | grep chr21 | wc -l
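A small worked example may help when trying these commands. The sketch below builds a toy, headerless SAM-like file (made-up reads, only the first five columns) and counts reads on chr21 and distinct mapping positions. Two assumptions differ from the one-liners above: the chromosome is matched on column 3 with `awk` rather than by grepping the whole line, and `sort -u` is used because `uniq` only collapses adjacent duplicate lines.

```shell
# Toy tab-separated SAM-like records: name, FLAG, chromosome, position, MAPQ.
tmp_sam=$(mktemp)
printf 'r1\t0\tchr21\t100\t37\n'  >  "$tmp_sam"
printf 'r2\t16\tchr21\t100\t0\n'  >> "$tmp_sam"
printf 'r3\t0\tchr1\t500\t25\n'   >> "$tmp_sam"
printf 'r4\t0\tchr21\t100\t37\n'  >> "$tmp_sam"

# Reads placed on chr21: compare column 3 exactly.
chr21_reads=$(awk -F '\t' '$3 == "chr21"' "$tmp_sam" | wc -l | tr -d ' ')

# Distinct (chromosome, position) pairs; sort first so duplicates are adjacent.
distinct_pos=$(cut -f 3,4 "$tmp_sam" | sort -u | wc -l | tr -d ' ')

echo "chr21 reads: $chr21_reads"
echo "distinct positions: $distinct_pos"
rm -f "$tmp_sam"
```

On this toy input, `cut -f 3,4 | uniq | wc -l` would report 3 instead of 2, because the chr1 record sits between two identical chr21 records.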
Task 1: get reads mapping location
Bowtie/Bowtie2
• Index genome
    bowtie-build chr21.fa chr21
    bowtie2-build chr21.fa chr21
• Single End
    bowtie -S chr21 input.fastq output.sam
    bowtie2 -x chr21 -U input.fastq -S output.sam
BWA
• Index genome
    bwa index -a bwtsw chr21.fa
• Mapping
    bwa aln -t 4 chr21.fa input.fastq -f output.sai
    bwa samse -f output.sam chr21.fa output.sai input.fastq
Task 2: Alignment conversion and mapping statistics
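Samtools / Picard for Conversion
Convert BAM to SAM
    samtools view -h input.bam -o output.sam    # keep the header
    samtools view -X input.bam -o output.sam    # FLAG as a string (samtools 0.1.x)
    samtools view -x input.bam -o output.sam    # FLAG in hexadecimal (samtools 0.1.x)
Convert SAM to BAM, then sort and merge
    samtools view -bS input.sam -o output.bam
    samtools sort input.bam output_sorted       # samtools 0.1.x syntax; later releases use: samtools sort -o output_sorted.bam input.bam
    samtools merge merge.bam input1.bam input2.bam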
Samtools / Picard for reads filter and statistics
Mapping statistics
    samtools flagstat input.bam
Get reliable aligned reads
    samtools view -bq 1 input.bam > output.bam
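To see what the `-q 1` filter does, the same rule can be applied to a text SAM stream with `awk`. This is a minimal sketch on made-up records, assuming MAPQ is column 5 as in the SAM format; `filter_mapq` is a hypothetical helper name and is not a replacement for `samtools view`.

```shell
# Keep header lines (starting with "@") and alignments whose MAPQ (column 5)
# is at least the given threshold. filter_mapq is a hypothetical helper.
filter_mapq() {
    min_mapq=$1
    awk -F '\t' -v q="$min_mapq" '/^@/ || $5 >= q'
}

# Toy SAM text: one header line and three reads (first five columns only).
toy_sam=$(printf '@SQ\tSN:chr21\tLN:1000\nr1\t0\tchr21\t100\t37\nr2\t16\tchr21\t200\t0\nr3\t4\t*\t0\t0\n')

kept=$(printf '%s\n' "$toy_sam" | filter_mapq 1)
printf '%s\n' "$kept"
```

Only the header and `r1` survive: `r2` has MAPQ 0 (typically a read that maps equally well to several places) and `r3` is unmapped.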
BEDTOOLS/BEDOPS for reads format conversion
Convert BAM to BED
    bamToBed -i input.bam > input.bed
Merge BED files
    bedops -u input1.bed input2.bed > output.bed
  equals
    cat input1.bed input2.bed | sort-bed - > output.bed
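The union step can be tried without BEDOPS installed. The sketch below uses plain `sort` as a stand-in for `sort-bed`, assuming the intended order is chromosome name (byte-wise) followed by numeric start and end; the intervals are made up. As with `bedops -u`, every input element is kept and overlapping intervals are not merged.

```shell
# Two toy BED files (chromosome, start, end), deliberately unsorted.
bed1=$(mktemp); bed2=$(mktemp)
printf 'chr21\t300\t400\nchr21\t100\t200\n' > "$bed1"
printf 'chr21\t150\t250\nchr1\t50\t80\n'    > "$bed2"

# Concatenate, then sort by chromosome (byte order), start and end (numeric).
merged=$(cat "$bed1" "$bed2" | LC_ALL=C sort -k1,1 -k2,2n -k3,3n)
printf '%s\n' "$merged"
rm -f "$bed1" "$bed2"
```

The output lists chr1 first, then the three chr21 intervals in increasing start order.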
Task 3: Predict open chromatin regions
Peak calling tools
• MACS14/2
  - https://github.com/taoliu/MACS/
  - Built into Cistrome, user-friendly
  - Supports paired-end mode
• Hotspot
  - Needs shell and Linux operation experience
  - Many dependencies
  - http://www.uwencode.org/proj/hotspot/
MACS14
    macs14 -t test.bam -n test
    Rscript test_model.r    ## model image
Keep duplicates or not
    macs14 --keep-dup all -t test.bam -n test
Model failed
    macs14 --keep-dup all -t test.bam -n test --nomodel --shiftsize 73
MACS2
• Peak calling
    macs2 callpeak -t test.sam -n test
    macs2 callpeak --nomodel --shiftsize 73 -t test.sam -n test
    # later MACS2 releases replace --shiftsize with --extsize, which takes the full fragment length
• Down sampling
    macs2 randsample -t test.sam -n 5000 --seed 25 -o test.bed
• Filter duplicates
    macs2 filterdup -i test.bam -o test.bed
• Pileup
    macs2 pileup -i test.bam --extsize 3 -o test.bed
    sort -k1,1 -k2,2n test.bed > sort.bed
Task 4: Replicates consistency
bedtools/bedops for comparison
Get intersection regions
    bedops -i input1.bed input2.bed > output.bed
    bedtools intersect -a input1.bed -b input2.bed > output.bed
Get input1.bed overlapped regions only
    bedops -e input1.bed input2.bed     # -e defaults to 100% overlap; use -e 1 to accept any overlap of at least 1 bp
    intersectBed -u -a input1.bed -b input2.bed
Get input1.bed complementary regions
    bedops -d input1.bed input2.bed     # -d subtracts overlapping bases; bedops -n 1 is closer to intersectBed -v
    intersectBed -v -a input1.bed -b input2.bed
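The difference between "overlapped regions only" and "complementary regions" is easiest to see on a few intervals. The sketch below is a naive `awk` illustration of the whole-interval logic (report an input1 interval if it does, or does not, overlap anything in input2 by at least 1 bp); it is not how bedops or bedtools are implemented, `overlap_filter` is a hypothetical helper, it assumes a non-empty second file, and the intervals are made up.

```shell
# overlap_filter MODE A B
#   MODE=keep : print intervals of A overlapping any interval of B by >= 1 bp
#   MODE=drop : print intervals of A overlapping no interval of B
# BED intervals are 0-based and half-open, so [s1,e1) and [s2,e2) overlap
# when s1 < e2 and s2 < e1. B must not be empty.
overlap_filter() {
    awk -F '\t' -v mode="$1" '
        NR == FNR { n++; chr[n] = $1; s[n] = $2 + 0; e[n] = $3 + 0; next }
        {
            hit = 0
            for (i = 1; i <= n; i++)
                if ($1 == chr[i] && ($2 + 0) < e[i] && s[i] < ($3 + 0)) { hit = 1; break }
            if ((mode == "keep" && hit) || (mode == "drop" && !hit)) print
        }
    ' "$3" "$2"
}

a_bed=$(mktemp); b_bed=$(mktemp)
printf 'chr21\t100\t200\nchr21\t300\t400\nchr21\t500\t600\n' > "$a_bed"
printf 'chr21\t150\t250\nchr21\t600\t700\n'                  > "$b_bed"

kept=$(overlap_filter keep "$a_bed" "$b_bed")
dropped=$(overlap_filter drop "$a_bed" "$b_bed")
printf 'overlapping:\n%s\nnon-overlapping:\n%s\n' "$kept" "$dropped"
rm -f "$a_bed" "$b_bed"
```

Here chr21:500-600 counts as non-overlapping even though chr21:600-700 starts where it ends, because BED end coordinates are exclusive.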
Task 5: Data visualization, annotation and motif discovery
MDSeqPos
Get most accessible chromatin regions
    sort -r -g -k 5,5 peaks.bed > input.bed
Motif analysis
    MDSeqPos.py input.bed -d -m cistrome.xml -p 0.05 hg19 -s hs
Options
    -p  p-value
    -s  species
    -d  de novo or not
    -m  motif databases: transfac.xml, cistrome.xml
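Ranking peaks by score before motif analysis can be tried on a toy file. The sketch below assumes the score sits in column 5 of the peaks BED, as in the sort command above, and uses made-up peaks; `top_n` is an illustrative variable, not an MDSeqPos option. The `-g` (general numeric) key is used so that scores written in scientific notation sort correctly.

```shell
# Toy peaks BED: chromosome, start, end, name, score.
peaks=$(mktemp)
printf 'chr21\t100\t200\tpeak1\t12.5\n' >  "$peaks"
printf 'chr21\t300\t400\tpeak2\t250\n'  >> "$peaks"
printf 'chr21\t500\t600\tpeak3\t3e1\n'  >> "$peaks"

# Sort by column 5, largest score first, and keep the top_n strongest peaks.
top_n=2
top=$(LC_ALL=C sort -k5,5gr "$peaks" | head -n "$top_n")
printf '%s\n' "$top"
rm -f "$peaks"
```

With these scores (250, 3e1 = 30, 12.5) the two peaks kept are peak2 and peak3.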
Data visualization and Cistrome application annotation
IGV
• Set data ranges
• Auto scale
• Find most enriched regions
• Load wiggle and peaks BED
Get open chromatin regions near genes
    RegPotential.py -t test_peaks.bed -g /mnt/Storage/data/sync_cistrome_lib/ceaslib/GeneTable/hg19 -n test -d 10000
Task summary
• Get FASTQ
• Mapping
• Get proper format
• Peak calling
• Comparison of replicate peaks
• Data visualization and motif analysis
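Faculty
• Full-time professors: 曹志伟, 江赐忠, 张勇
• Overseas chair professors: Shirley Liu (Harvard), Zhiping Weng (UMass), Wei Li (Baylor)
• Adjunct professor: 李亦学
• Recruited with assistance: 张帆, 刘雷
• Honors and programs listed on the slide: Thousand Talents Program; 973 Program Chief Scientist; Shanghai Pujiang Talent Program; Shanghai Eastern Scholar Program; Shanghai Shuguang Program; Ministry of Education New Century Excellent Talents Program; Shanghai Science and Technology Commission Rising-Star Program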
Welcome to join us!