BICF Showcase: Single Cell RNA-seq Analysis November 28, 2018 BioHPC Training Session Jeon Lee


  • BICF Showcase: Single Cell RNA-seq Analysis

    November 28, 2018 BioHPC Training Session

    Jeon Lee

  • Agenda

    • Library Analysis with Cell Ranger

    • Secondary Analysis with RStudio

  • Library Analysis with Cell Ranger

  • Cell Ranger Overview

    • Cell Ranger is a set of analysis pipelines that process Chromium single-cell RNA-seq output to align reads, generate feature-barcode matrices, and perform clustering and gene expression analysis.

    • Cell Ranger includes four pipelines:

    – cellranger mkfastq demultiplexes raw base call (BCL) files generated by Illumina sequencers into FASTQ files.

    – cellranger count takes FASTQ files and performs alignment (with STAR), filtering, barcode counting, and UMI counting.

    – cellranger aggr aggregates outputs from multiple runs of cellranger count, normalizing those runs to the same sequencing depth and then recomputing the feature-barcode matrices.

    – cellranger reanalyze takes feature-barcode matrices produced by cellranger count or cellranger aggr and reruns the dimensionality reduction, clustering, and gene expression analysis using tunable parameter settings.
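To make "normalizing those runs to the same sequencing depth" concrete, here is a small arithmetic sketch. If every run is downsampled to the shallowest run, the fraction of reads each run keeps is lowest depth / run depth. The function name subsample_fractions is hypothetical, and this only illustrates the idea; it is not how Cell Ranger implements the normalization internally.

```shell
#!/bin/bash
# Conceptual sketch only (not Cell Ranger's code): reads lines of
# "<run_name> <reads_per_cell>" on stdin and prints, for each run, the
# fraction of reads it keeps when every run is downsampled to the
# shallowest one (kept fraction = lowest depth / run depth).
subsample_fractions() {
    awk '
        NF == 2 && $2 + 0 > 0 {
            n++
            name[n] = $1
            depth[n] = $2 + 0
            # track the lowest depth seen so far
            if (n == 1 || depth[n] < min) min = depth[n]
        }
        END {
            for (i = 1; i <= n; i++)
                printf "%s %.2f\n", name[i], min / depth[i]
        }
    '
}
```

For example, three hypothetical runs at 40,000, 50,000, and 80,000 reads per cell would keep 100%, 80%, and 50% of their reads respectively.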

  • Cell Ranger Workflow

    • The workflow always starts by running cellranger mkfastq on each flowcell. The exact steps that follow depend on how many samples, GEM wells, and flowcells you have.

    • One Sample, One GEM Well, One Flowcell

    – In this example you have one sample that is processed through one GEM well (a set of partitioned cells from a single channel of a 10x Chromium chip) and sequenced on one flowcell.

    – In this case you would generate FASTQs using cellranger mkfastq and then run cellranger count.
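For the single-library case, the two steps can be written out as a dry run. The sketch below (hypothetical helper print_single_library_cmds) only prints the commands; it relies on mkfastq writing its FASTQs to <id>/outs/fastq_path, the layout shown in the "Checking FASTQ output" step of this session. Review the printed lines before pasting them into a batch script.

```shell
#!/bin/bash
# Dry run for the one-sample / one-GEM-well / one-flowcell case:
# prints the mkfastq and count commands instead of running them.
# Usage: print_single_library_cmds RUN_DIR SAMPLESHEET MKFASTQ_ID COUNT_ID TRANSCRIPTOME
print_single_library_cmds() {
    if [ "$#" -ne 5 ]; then
        echo "usage: print_single_library_cmds RUN_DIR SAMPLESHEET MKFASTQ_ID COUNT_ID TRANSCRIPTOME" >&2
        return 1
    fi
    # Step 1: demultiplex the flowcell; FASTQs go to MKFASTQ_ID/outs/fastq_path
    printf 'cellranger mkfastq --id=%s --run=%s --csv=%s\n' "$3" "$1" "$2"
    # Step 2: run count on that FASTQ folder
    printf 'cellranger count --id=%s --transcriptome=%s --fastqs=%s/outs/fastq_path\n' "$4" "$5" "$3"
}
```

For instance, `print_single_library_cmds cellranger-tiny-bcl-1.2.0 cellranger-tiny-bcl-simple-1.2.0.csv tiny-bcl test_run <transcriptome>` prints both commands, where tiny-bcl and test_run are hypothetical run IDs.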

  • Cell Ranger Workflow (Contd.)

    • One Sample, One GEM Well, Multiple Flowcells

    – In this example you have one sample that is processed through one GEM well, producing one library that is sequenced across multiple flowcells. This may be done, for example, to increase sequencing depth.

    – In this case, all of the reads can be combined in a single instance of the cellranger count pipeline.
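As we understand the Cell Ranger documentation, the --fastqs option of cellranger count accepts a comma-separated list of FASTQ folders, which is how one library sequenced on several flowcells goes into a single count run. The sketch below (hypothetical helper join_fastq_dirs) only builds that list; confirm the comma-separated form against the documentation for the Cell Ranger version you load.

```shell
#!/bin/bash
# Hypothetical helper: joins FASTQ folders with commas for --fastqs.
# The function body runs in a subshell, so changing IFS does not leak out.
join_fastq_dirs() (
    IFS=,
    printf '%s\n' "$*"
)
```

Usage, with hypothetical flowcell IDs FC1 and FC2: `cellranger count --id=sample345 --transcriptome=<reference> --fastqs="$(join_fastq_dirs FC1/outs/fastq_path FC2/outs/fastq_path)"`.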

  • Cell Ranger Workflow (Contd.)

    • One Sample, Multiple GEM Wells, One Flowcell

    – If you prepared multiple libraries from the same sample (technical replicates, for example), each one should be run through a separate instance of cellranger count.

    – Once those are completed, you can perform a combined analysis using cellranger aggr.
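Running one count instance per library is a simple loop. This dry-run sketch (hypothetical helper print_count_per_library) prints one cellranger count command per library. It assumes each library ID is also the Sample name used in the mkfastq samplesheet, so that count's --sample option can pick that library's FASTQs out of a shared fastq_path folder.

```shell
#!/bin/bash
# Dry run: prints one "cellranger count" command per library.
# Assumption: LIBRARY_ID equals the Sample name from the mkfastq samplesheet.
# Usage: print_count_per_library TRANSCRIPTOME FASTQ_DIR LIBRARY_ID...
print_count_per_library() {
    if [ "$#" -lt 3 ]; then
        echo "usage: print_count_per_library TRANSCRIPTOME FASTQ_DIR LIBRARY_ID..." >&2
        return 1
    fi
    transcriptome="$1"
    fastq_dir="$2"
    shift 2
    for lib in "$@"; do
        # --sample selects this library's FASTQs from the shared folder
        printf 'cellranger count --id=%s --transcriptome=%s --fastqs=%s --sample=%s\n' \
            "$lib" "$transcriptome" "$fastq_dir" "$lib"
    done
}
```

Each printed command becomes one count run, and the resulting run directories are what cellranger aggr combines.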

  • Cell Ranger Workflow (Contd.)

    • Multiple Samples, Multiple GEM Wells, One Flowcell

    – In this example you have multiple samples that are processed through multiple GEM wells; the resulting libraries are pooled onto one flowcell.

    – In this case, after demultiplexing, you must run cellranger count separately for each GEM well. Then you can aggregate the results with a single instance of cellranger aggr.
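On a SLURM cluster (sbatch is used for job submission in this session), the aggr job can be made to wait for every per-GEM-well count job. SLURM's --dependency=afterok:<id>[:<id>...] does this; the sketch below (hypothetical helper afterok_dependency) only builds that string from job IDs, so it can be tested without a cluster.

```shell
#!/bin/bash
# Hypothetical helper: turns SLURM job IDs into a dependency spec such as
# "afterok:101:102", so a job waits until all listed jobs succeed.
afterok_dependency() {
    if [ "$#" -eq 0 ]; then
        echo "afterok_dependency: need at least one job ID" >&2
        return 1
    fi
    # join the IDs with ":" inside a subshell so IFS is restored afterwards
    (IFS=:; printf 'afterok:%s\n' "$*")
}
```

For example, collect the job ID printed by `sbatch --parsable run_count.sh` for each GEM well into a variable `ids`, then submit the aggregation with `sbatch --dependency="$(afterok_dependency $ids)" run_cellranger_aggr.sh` (both script names are hypothetical).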

  • Cellranger mkfastq

    • It is a 10x-enhanced wrapper around Illumina bcl2fastq, which demultiplexes BCL files from a sequencer into FASTQs for analysis.

    • It recognizes two file formats for describing samples: a simple three-column CSV format and the Illumina Experiment Manager (IEM) sample sheet format used by bcl2fastq.

    – The simple CSV samplesheet is recommended for most sequencing experiments.

    – Its three columns are Lane, Sample, and Index (the 10x sample index set name, e.g. SI-P03-C9).
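When there are many samples, the simple samplesheet can be generated rather than typed. This is a minimal sketch with a hypothetical helper name, write_simple_samplesheet: it writes the Lane,Sample,Index header followed by one row per LANE SAMPLE INDEX triple given on the command line.

```shell
#!/bin/bash
# Writes a three-column simple samplesheet (Lane,Sample,Index) for mkfastq.
# Usage: write_simple_samplesheet OUT_CSV LANE SAMPLE INDEX [LANE SAMPLE INDEX ...]
write_simple_samplesheet() {
    # need an output path plus at least one complete triple
    if [ "$#" -lt 4 ] || [ $(( ($# - 1) % 3 )) -ne 0 ]; then
        echo "write_simple_samplesheet: expected OUT_CSV followed by LANE SAMPLE INDEX triples" >&2
        return 1
    fi
    out="$1"
    shift
    {
        printf 'Lane,Sample,Index\n'
        while [ "$#" -gt 0 ]; do
            printf '%s,%s,%s\n' "$1" "$2" "$3"
            shift 3
        done
    } > "$out"
}
```

For example, `write_simple_samplesheet cellranger-tiny-bcl-simple-1.2.0.csv 1 test_sample SI-P03-C9` reproduces the two-line samplesheet used with the tiny BCL run.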

  • Cellranger mkfastq (Contd.)

    • Submit a cluster batch job using a script file

    – An example of a simple CSV samplesheet (‘cellranger-tiny-bcl-simple-1.2.0.csv’):

    Lane,Sample,Index
    1,test_sample,SI-P03-C9

    – An example of a script file (‘run_cellranger_mkfastq.sh’):

    #!/bin/bash
    module load cellranger/2.0.2
    cellranger mkfastq --run=cellranger-tiny-bcl-1.2.0 --csv=cellranger-tiny-bcl-simple-1.2.0.csv

    – An example of job submission:

    $ sbatch run_cellranger_mkfastq.sh
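A malformed samplesheet is usually only discovered after the job leaves the queue, so a quick local check before sbatch can save time. The sketch below (hypothetical helper check_simple_samplesheet) requires the exact header Lane,Sample,Index and three fields on every data row; it also strips Windows line endings, a common problem with CSVs saved from a spreadsheet. It checks structure only, not whether the index names are valid.

```shell
#!/bin/bash
# Hypothetical pre-flight check: succeeds only if the CSV's header is exactly
# "Lane,Sample,Index" and every non-blank data row has three fields.
check_simple_samplesheet() {
    if [ ! -f "$1" ]; then
        echo "check_simple_samplesheet: no such file: $1" >&2
        return 1
    fi
    awk -F, '
        { sub(/\r$/, "") }                      # tolerate CRLF line endings
        NR == 1 { if ($0 != "Lane,Sample,Index") bad = 1; next }
        NF == 0 { next }                        # ignore blank lines
        NF != 3 { bad = 1 }
        END { if (NR == 0) bad = 1; exit bad }  # empty file is an error
    ' "$1"
}
```

Usage: `check_simple_samplesheet cellranger-tiny-bcl-simple-1.2.0.csv && sbatch run_cellranger_mkfastq.sh`.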

  • Cellranger mkfastq (Contd.)

    • Checking FASTQ output

    – Once the pipeline has completed successfully, the output can be found in a new folder named after the value you provided with the --id option (if not specified, it defaults to the name of the flowcell).

    – Assuming the flowcell name was ‘H35KCBCXY’ (--id was not specified) and ‘Sample’ was set to ‘test_sample’, you can find the FASTQ files as follows:

    $ cd ./H35KCBCXY/outs/fastq_path/H35KCBCXY/test_sample
    $ ls
    test_sample_S1_L001_I1_001.fastq test_sample_S1_L001_R1_001.fastq test_sample_S1_L001_R2_001.fastq
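Before moving on to cellranger count, it can help to confirm that the index read (I1) and both reads (R1, R2) were written. The sketch below (hypothetical helper check_fastq_reads) looks for files following the SAMPLE_S<n>_L<lane>_<read>_001.fastq naming shown in the listing; the trailing glob also accepts gzipped .fastq.gz files.

```shell
#!/bin/bash
# Succeeds if DIR holds at least one I1, R1 and R2 FASTQ for SAMPLE, named
# SAMPLE_S<n>_L<lane>_<read>_001.fastq (optionally gzipped).
# Usage: check_fastq_reads DIR SAMPLE
check_fastq_reads() {
    dir="$1"
    sample="$2"
    for read in I1 R1 R2; do
        found=no
        for f in "$dir/${sample}"_S*_L*_"${read}"_001.fastq*; do
            # an unmatched glob stays literal, so test that the file exists
            if [ -e "$f" ]; then found=yes; break; fi
        done
        if [ "$found" = no ]; then
            echo "missing ${read} FASTQ for ${sample} in ${dir}" >&2
            return 1
        fi
    done
}
```

Usage: `check_fastq_reads ./H35KCBCXY/outs/fastq_path/H35KCBCXY/test_sample test_sample`.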

  • Cellranger count

    • It takes FASTQ files and performs alignment, filtering, and UMI counting. It can take input from multiple sequencing runs on the same library.

    • Assuming that we use a data set generated from mouse cells and that the data set is available at /home2/s16xxxx/SingleCell/MBSK-nLU868/outs/fastq_path, a shell script would look like this:

    #!/bin/bash
    # load Cell Ranger first if it is not already on PATH (module load cellranger/<version>, as in the other scripts)
    cellranger count --id=sample345 \
        --transcriptome=/project/apps_database/cellranger/refdata-cellranger-mm10-1.2.0 \
        --fastqs=/home2/s16xxxx/SingleCell/MBSK-nLU868/outs/fastq_path \
        --expect-cells=3000

    where

    – --id is a unique run ID string;

    – --transcriptome specifies the path to the Cell Ranger-compatible transcriptome reference;

    – --fastqs specifies the path of the FASTQ directory;

    – --expect-cells is an optional flag that specifies the expected number of recovered cells.
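A mistyped reference or FASTQ path in a count script is another error that otherwise surfaces only after the job starts. This is a minimal sketch with a hypothetical helper name, check_count_inputs, that fails early if either directory is missing; it assumes a single FASTQ folder rather than a comma-separated list.

```shell
#!/bin/bash
# Hypothetical pre-flight check for a count script: reports every missing
# input directory and fails if any is absent.
# Usage: check_count_inputs TRANSCRIPTOME FASTQ_DIR
check_count_inputs() {
    status=0
    for dir in "$1" "$2"; do
        if [ ! -d "$dir" ]; then
            echo "not a directory: $dir" >&2
            status=1
        fi
    done
    return "$status"
}
```

Usage: `check_count_inputs /project/apps_database/cellranger/refdata-cellranger-mm10-1.2.0 /home2/s16xxxx/SingleCell/MBSK-nLU868/outs/fastq_path && sbatch <count script>`.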

  • Cellranger count (Contd.)

    • The subdirectory named “outs” contains the main pipeline output files. A detailed description of the output is:
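Two outputs in “outs” matter for the rest of this workflow: molecule_info.h5, which cellranger aggr reads, and web_summary.html, the run's QC summary page. The sketch below (hypothetical helper check_count_outs) confirms both exist and are non-empty for a finished run directory, that is, the folder named by --id. It checks only these two files, not the full output set.

```shell
#!/bin/bash
# Checks that a finished count run has outs/molecule_info.h5 and
# outs/web_summary.html, both non-empty; reports each missing file.
# Usage: check_count_outs RUN_DIR
check_count_outs() {
    missing=0
    for f in molecule_info.h5 web_summary.html; do
        if [ ! -s "$1/outs/$f" ]; then
            echo "missing or empty: $1/outs/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Usage: `check_count_outs sample345`.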

  • Cellranger aggr

    • For large studies involving multiple GEM wells, we run cellranger count on the FASTQ data from each GEM well and then pool the results using cellranger aggr.

    – An example of a shell script:

    #!/bin/bash
    module load cellranger/2.1.1
    cellranger aggr --id=aggregate --csv=aggregate_libraries.csv --normalize=mapped

    • The --normalize option accepts two modes: 1) mapped (default): subsample reads from higher-depth GEM wells until they all have an equal number of confidently mapped reads per cell; 2) none: do not normalize at all.

    – An example of the CSV file (‘aggregate_libraries.csv’):

    library_id,molecule_h5
    BRAH-4f7l7J,/home2/…/BRAH-4f7l7J/outs/molecule_info.h5
    BRAH-kZg4EN,/home2/…/BRAH-kZg4EN/outs/molecule_info.h5
    BRAH-wLga2G,/home2/…/BRAH-wLga2G/outs/molecule_info.h5
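With many GEM wells, the aggregation CSV can be built from the count run directories instead of by hand. This sketch (hypothetical helper write_aggr_csv) writes the library_id,molecule_h5 header and one row per run, using each directory's name as library_id and an absolute path to its outs/molecule_info.h5, matching the layout of the example CSV. Using the directory name as the label is an assumption; edit the rows if you want different library IDs.

```shell
#!/bin/bash
# Writes an aggr CSV (library_id,molecule_h5) from count run directories.
# library_id = the run directory's name; molecule_h5 = absolute path.
# Usage: write_aggr_csv OUT_CSV RUN_DIR...
write_aggr_csv() {
    if [ "$#" -lt 2 ]; then
        echo "usage: write_aggr_csv OUT_CSV RUN_DIR..." >&2
        return 1
    fi
    out="$1"
    shift
    printf 'library_id,molecule_h5\n' > "$out"
    for run in "$@"; do
        if [ ! -f "$run/outs/molecule_info.h5" ]; then
            echo "write_aggr_csv: missing $run/outs/molecule_info.h5" >&2
            return 1
        fi
        # resolve to an absolute path in a subshell (cwd is unchanged)
        outs_abs="$(cd "$run/outs" && pwd)"
        printf '%s,%s\n' "$(basename "$run")" "$outs_abs/molecule_info.h5" >> "$out"
    done
}
```

Usage: `write_aggr_csv aggregate_libraries.csv <run_dir_1> <run_dir_2> <run_dir_3>`, then run the aggr script as above.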

  • Secondary Analysis with RStudio

  • Project Overview

    • Three scRNA-seq experiments at different disease stages of the KIC (p48Cre;LSL-KrasG12D;Cdkn2af/f) mouse model

    – Normal, early_KIC, invasive_KIC

    • Is there any similarity between clusters from different experiments?

    [Figure: the Normal, Early, and Invasive samples are each processed with mkfastq and count and then combined with aggr; the fibroblast clusters shown are labeled Fibroblast-1, Fibroblast-2, Fibroblast-3, Fibroblast-A, and Fibroblast-B.]

  • OnDemand RStudio Demo

    • Example_Cluster_Comparison_final.html

  • BICF HelpDesk

    • We provide free consultations through our Help Desk. We offer three modes of interaction:

    – Email: BICF HelpDesk

    – Drop-in Sessions: E4.350, three drop-in sessions a week:

      • 9 AM - 11 AM, Monday and Friday

      • 1 PM - 3 PM, Wednesday

    – Phone: 214-645-1707

    • Additionally, you can set up an in-person appointment using our email or phone services.

  • Thank you

    Q & A