single-cell rna-seq data analysis using chipster - csc · csc –suomalainen tutkimuksen,...
TRANSCRIPT
![Page 1: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/1.jpg)
CSC – Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus
Single-cell RNA-seq data analysisusing Chipster
24.9.2019
Eija Korpelainen, Maria Lehtivaara
![Page 2: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/2.jpg)
Understanding data analysis - why?
• Bioinformaticians might not always be available when needed
• Biologists know their own experiments best
oBiology involved (e.g. genes, pathways, etc)
oPotential batch effects etc
• Allows you to design experiments better less money wasted
• Allows you to discuss more easily with bioinformaticians
8.10.20192
![Page 3: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/3.jpg)
What will I learn?
• Analysis of single cell RNA-seq data
oFind subpopulations (clusters) of cells and marker genes for them
oCompare two samples (e.g. treatment vs control)
o Identify cell types that are present in both samples
oObtain cell type markers that are conserved in both samples
oCompare the samples to find cell-type specific responses to treatment
• How to operate the Chipster software
8.10.20193
![Page 4: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/4.jpg)
Introduction to Chipster
![Page 5: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/5.jpg)
• Provides an easy access to over 400 analysis tools
o Command line tools
o R/Bioconductor packages
• Free, open source software
• What can I do with Chipster?
o analyze and integrate high-throughput data
o visualize data efficiently
o share analysis sessions
o save and share automatic workflows
Chipster
![Page 6: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/6.jpg)
Technical aspects
• Client-server system
o Enough CPU and memory for large analysis jobsoCentralized maintenance
• Easy to install
oClient uses Java Web Starto Server available as a virtual machine image
![Page 7: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/7.jpg)
![Page 8: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/8.jpg)
![Page 9: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/9.jpg)
Mode of operationSelect: data tool category tool run visualize result
![Page 10: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/10.jpg)
When running analysis tools, pay attention to parameters!
• make sure the input files are correctly assigned if there are
multiple files (see below)
• choose the right reference genome
• check especially the bolded parameters
![Page 11: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/11.jpg)
Job manager
• You can run many analysis jobs
at the same time
• Use Job manager to:
oview status
ocancel jobs
oview time
oview parameters
![Page 12: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/12.jpg)
Analysis history is saved automatically -you can add tool source code to reports if needed
![Page 13: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/13.jpg)
Analysis sessions
• Remember to save the analysis session within 3 days
oSession includes all the files, their relationships and metadata (what tool and parameters were used to produce each file)
oSession is a single .zip file
oNote that you can save two sessions of the same data
o one with raw data (FASTQ files)
o one smaller, working version where the FASTQ files are deleted after alignment
• You can save a session locally (= on your computer)
• …and in the cloud
obut note that the cloud sessions are not stored forever!
o If your analysis job takes a long time, you don’t need to keep Chipster open:
oWait that the data transfer to the server has completed (job status = running)
o Save the session in the cloud and close Chipster
oOpen Chipster within 3 days and save the session containing the results
![Page 14: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/14.jpg)
Workflow panel
• Shows the relationships of the files
• You can move the boxes around, and zoom in and out.
• Several files can be selected by keeping the Ctrl key
down
• Right clicking on the data file allows you to
oSave an individual result file (”Export”)
oDelete
oLink to another data file
oSave workflow
![Page 15: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/15.jpg)
Workflow – reusing and sharing your analysis pipeline
• You can save your analysis steps as a reusable automatic ”macro”
oall the analysis steps and their parameters are saved as a script file
oyou can apply workflow to another dataset
oyou can share workflows with other users
![Page 16: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/16.jpg)
Saving and using workflows
• Select the starting point for your workflow
• Select ”Workflow / Save starting from selected”
• Save the workflow file on your computer with a
meaningful name
oDon’t change the ending (.bsh)!
• To run a workflow, select
oWorkflow -> Open and run
oWorkflow -> Run recent (if you saved the workflow recently).
![Page 17: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/17.jpg)
Analysis tool overview
• 200 NGS tools for
RNA-seq
single cell RNA-seq
small RNA-seq
16S rRNA amplicon seq
exome/genome-seq
ChIP-seq
FAIRE/DNase-seq
CNA-seq
• 140 microarray tools for
gene expression
miRNA expression
protein expression
aCGH
SNP
integration of different data
• 60 tools for sequence analysis
BLAST, EMBOSS, MAFFT
Phylip
![Page 18: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/18.jpg)
Visualizing the data
• Data visualization panel
o Maximize and redraw for better viewing
o Detach = open in a separate window, allows you to view several images at the same time
• Two types of visualizations
1. Interactive visualizations produced by the client program
o Select the visualization method from the pulldown menu
o Save by right clicking on the image
2. Static images produced by analysis tools
o Select from Analysis tools / Visualisation
o View by double clicking on the image file
o Save by right clicking on the file name and choosing ”Export”
![Page 19: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/19.jpg)
Interactive visualizations by the client
• Genome browser
• Spreadsheet
• Histogram
• Venn diagram
• Scatterplot
• 3D scatterplot
• Volcano plot
• Expression profiles
• Clustered profiles
• Hierarchical clustering
• SOM clustering
Available actions:
• Select genes and create a gene list
• Change titles, colors etc
• Zoom in/out
![Page 20: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/20.jpg)
Static images produced by R/Bioconductor
• Dispersion plot
• Heatmap
• tSNE plot
• Violin plot
• PCA plot
• MA plot
• MDS plot
• Box plot
• Histogram
• Dendrogram
• K-means clustering
• Etc…
![Page 21: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/21.jpg)
Options for importing data to Chipster
• Import files/ Import folder
• Import from URLoUtilities / Download file from URL directly to server
• Import from BaseSpace
• Open an analysis sessionoFiles / Open session
• Import from SRA databaseoUtilities / Retrieve FASTQ or BAM files from SRA
• Import from Ensembl databaseoUtilities / Retrieve data for a given organism in Ensembl
• What kind of data files can I use in Chipster?oCompressed files (.gz) and tar packages (.tar) are ok
oFASTQ, BAM, read count files (.tsv)
![Page 22: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/22.jpg)
How to import a large tar package of files from a server and use only some of the files?
• Import the tar package by selecting File / Import from / URL directly to server
• Check what files it contains using the tool Utilities / List contents of a tar file
• Selectively extract the files you want with Utilities / Extract .tar or .tar.gz file
8.10.201922
![Page 23: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/23.jpg)
Problems? Send us a support request -request includes the error message and link to analysis session (optional)
![Page 24: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/24.jpg)
New web based Chipster available for testing!
![Page 25: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/25.jpg)
How to get a user account?
• Finnish academics:
oCOMING SOON: myCSCportal (https://my.csc.fi/welcome)
ouse Scientist’s User Interface service (https://sui.csc.fi) with HAKA credentials
o click on the purple HAKA link
o log in with your HAKA username and password
o fill in the sign up form as shown in https://research.csc.fi/csc-guide-getting-access-to-csc-services#1.2.1
o If you don’t have HAKA credentials, email [email protected]
oUsers who already have a regular CSC username and password can use those for Chipster
• Others:
oSet up a local Chipster server at your institute (free of charge)
oBuy access to CSC’s Chipster server (500 euros / person /year)
oUse EGI’s Chipster server (free of charge)
• See https://chipster.csc.fi/access.shtml for details
![Page 26: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/26.jpg)
More info
• http://chipster.csc.fi
• Chipster tutorials in YouTube
• https://chipster.csc.fi/manual/courses.html
![Page 27: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/27.jpg)
Acknowledgements to Chipster users and contributors
![Page 28: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/28.jpg)
Introduction to single cell RNA-seq
![Page 29: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/29.jpg)
Single cell RNA-seq
• New technology, data analysis methods are being actively developed
• Allows to find cell types and cell-specific changes in transcriptome
8.10.201929
![Page 30: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/30.jpg)
Different technologies for capturing single cell transcriptomes
• Plate-based: SMART-Seq2, STRT-Seq, CEL-Seq
• Droplet-based: 10X Chromium, Drop-Seq, Indrop, Fluidigm C1
• Microwell-based: ICell8, Seq-Well
• Libraries are usually 3’ tagged: only a short sequence at the 3’ end of the
mRNA is sequenced
Slide by Heli Pessa
![Page 31: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/31.jpg)
Different technologies for capturing single-cell transcriptomes
10X
Encapsulation of cells RNA capture
Barcoded single-cell transcriptomes
Cells
OilBarcoded RNA capture beads
Library prep and sequencing
Indrop
Drop-Seq
Seq-Well
Slide by Heli Pessa
![Page 32: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/32.jpg)
RNA capture
• Barcoded RNA capture beads
• Capture oligo preparation
• Ligation of short oligonucleotides (10X and Indrop): known set of barcodes
• Split-and-pool synthesis (Drop-Seq): random barcodes
Slide by Heli Pessa
UMI
Linker-5’-TTTTTTTAAGCAGTGGTATCAACGCAGAGTACJJJJJJJJJJJJNNNNNNNNTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT3 ’
3’AAAAAAAAAAAAAAAAAAAAAAAAAANNNNNNNNNN…5'
PCR primer site Cell barcode
mRNA
polyT
Capture oligo
![Page 33: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/33.jpg)
cDNA synthesisRNA capture
Reverse transcription
Template switch
cDNA
Slide by Heli Pessa
Bead oligo 5’TTTTTTTTTTTTTTTTTTTTT NNNNNNNNNNNNNNNNNCCC3’
mRNA 3’AAAAAAAAAAAAAAAAAAAAANNNNNNNNNNNNNNNNN5’
cDNA
UMI
Linker-5’-TTTTTTTAAGCAGTGGTATCAACGCAGAGTACJJJJJJJJJJJJNNNNNNNNTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT3 ’
3’AAAAAAAAAAAAAAAAAAAAAAAAAANNNNNNNNNN…5'
PCR primer site Cell barcode
mRNA
Bead oligo 5’TTTTTTTTTTTTTTTTTTTTT NNNNNNNNNNNNNNNNN CCCATTCACTCTGCGTTGATACCACTGCTT 3’
mRNA 3’AAAAAAAAAAAAAAAAAAAAANNNNNNNNNNNNNNNNN GGGTAAGTGAGACGCAACTATGGTGACGAA 5’
Linker-TTTTTTTAAGCAGTGGTATCAACGCAGAGTACJJJJJJJJJJJJNNNNNNNNTTTTTTTTTTTNNNNNNNNNNNNNNNNNCCCATTC ACTCTGCGTTGATACCACTGCTT
PCR primer site
PCR primer site
![Page 34: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/34.jpg)
Single cell library preparation
• Preamplification of full length cDNA
• Sequencing adapter ligation or tagmentation
• Library amplification plenty of PCR copies!
• Paired-end sequencing
Read1: cell barcode (CB) and molecular barcode (UMI)
Read2: cDNA sequence
Slide by Heli Pessa
JJJJJJJJJJJJNNNNNNNNTTTTTTTTTTTTTTTTNNNNNNNNNNNNNNNNNNNNNNNNNNN
Read 1: Cell barcode & UMI Read 2: gene sequence
![Page 35: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/35.jpg)
Data preprocessing overview
8.10.201937
![Page 36: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/36.jpg)
What can go wrong?
1. Ideally there is one healthy cell in the droplet. However, sometimes
oThere is no cell in the droplet, just ambient RNA
Detect “empties” based on the small number of genes expressed and remove
oThere are two (or more) cells in a droplet
Detect duplets (and multiplets) based on the large number of genes expressed and remove
o The cell in the droplet is broken/dead
Detect based on high proportion of reads mapping to mitochondrial genome and remove
2. Sometimes barcodes have synthesis errors in them, e.g. one base is missing
Detect by checking the distribution of bases at each position and fix the code or remove the cell
8.10.201938
![Page 37: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/37.jpg)
Single cell RNA-seq data analysis
![Page 38: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/38.jpg)
Single cell RNA-seq data is challenging
• High amount of dropouts
oA gene is expressed but the expression is not detected due to technical limitations the detected expression level for many genes is zero
• Data is noisy. High level of variation between the cells due to
oCapture efficiency (percentage of mRNAs captured)
oReverse transcription efficiency
oAmplification bias (non-uniform amplification of transcripts)
oCell-cycle stage and size
oSignificant differences in sequencing depth (number of UMIs/cell)
• Complex distribution of expression values
oCell heterogeneity and the abundance of zeros give rise to multimodal distributions
Many methods used for bulk RNA-seq data won’t work
8.10.201940
![Page 39: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/39.jpg)
scRNA-seq data analysis steps for producing a digital gene expression matrix (DGE)
8.10.201941
QC of raw reads
Preprocessing
Alignment
Annotation
DGE matrix
• Check the quality of sequencing reads
• Trim if necessary
• Align to reference genome
• Match the alignment positions to known locations of genes
• Count reads/UMIs for each gene
• Store UMI counts in a table
• And now we are ready for the interesting stuff…
![Page 40: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/40.jpg)
Finding clusters of cells and their marker genes
![Page 41: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/41.jpg)
Analysis steps for clustering
1. Check the quality of cells
2. Filter out low quality cells
3. Normalize expression values
4. Identify highly variable genes
5. Scale data, regress out unwanted variation
6. Reduce dimensions using principal component analysis (PCA) on the variable genes
7. Determine significant principal components (PCs)
8. Use the PCs to cluster cells with graph based clustering
9. Use the PCs to visualize clusters with non-linear dimensional reduction (tSNE or UMAP)
10. Detect and visualize marker genes for the clusters8.10.201943
![Page 42: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/42.jpg)
Seurat
• One the most popular R packages for single cell RNA-seq data
analysis
• Provides tools for all the steps mentioned in the previous slide
oAlso tools for integrative analysis
• Note: Chipster contains currently both Seurat v2 and v3
oThe data structure changed: v2 data won’t work with v3 tools and vice versa
oDifferences in naming of variables
• http://satijalab.org/seurat
8.10.201944
![Page 43: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/43.jpg)
Analysis steps for clustering
1. Check the quality of cells
2. Filter out low quality cells
3. Normalize expression values
4. Identify highly variable genes
5. Scale data, regress out unwanted variation
6. Reduce dimensions using principal component analysis (PCA) on the variable genes
7. Determine significant principal components (PCs)
8. Use the PCs to cluster cells with graph based clustering
9. Use the PCs to visualize clusters with non-linear dimensional reduction (tSNE or UMAP)
10. Detect and visualize marker genes for the clusters8.10.201945
![Page 44: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/44.jpg)
Setting up a Seurat object, filtering genes
• Give a name for the project (used in some plots)
• Sample or group name
o if you have two samples that you wish to compare
• Filtering genes
oKeep genes which are expressed (detected) in at least X cells
• Input
oAssign correctly!
8.10.201946
![Page 45: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/45.jpg)
Two input file options
1. 10X Genomics files
othree files needed: barcodes.tsv, genes.tsv and matrix.mtx -NOTE: files need to have these names!
omake a tar package of these files and import it to Chipster
2. DGE matrix from the DropSeq tools
oDGE matrix made in Chipster, or import a ready-made DGE matrix (.tsv file)
• Check that the input file is correctly assigned!
8.10.201947
![Page 46: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/46.jpg)
What do those 10X files contain?
1. matrix.mtx
o Number of UMIs for a given gene in a given cell
o Make sure that you use the filtered feature barcode matrix (contains only those cell barcodes which are present in your data)
o Sparse matrix (only non-zero entries are stored), in MEX format
o Header: third line tells how many genes and cells you have
o Each row: gene index, cell index, number of UMIs
2. barcodes.tsv
oCell barcodes present in your data
3. genes.tsv (features.tsv)
o Identifier, name and type (gene expression)
8.10.201948
![Page 47: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/47.jpg)
Output files
• R object (Robj) that Seurat-based tools use to store data
oContains specific slots for different types of data, you use this file as input for the next analysis tool
oYou cannot view the contents of Robj in Chipster (you can import it to R)
• Pdf file with quality control plots and number of cells
oNote that the vocabulary changed between Seurat v2 and v3:
oNumber of genes in a cell, nGene nFeature_RNA
oNumber of transcripts in a cell, nUMI nCount_RNA
o Percentage of mitochondrial transcripts, percent.mito percent.mt
8.10.201949
![Page 48: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/48.jpg)
How to detect empties, multiplets and broken cells?
• Empties = no cell in droplet: low gene count (nFeature_RNA < 200)
• Multiplet = more than one cell in droplet: large gene count (nFeature_RNA > 2500)
• Broken/dead cell in droplet: lot of mitochondrial transcripts (percent.mt > 5%)
8.10.201950
nGene / nFeature_RNA: # genes in a cell
nUMI / nCount_RNA: # transcripts in a cell
percent.mito / percent.mt: % of mitochondrial transcripts
multiplets
empties
Dead/broken cells
![Page 49: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/49.jpg)
Scatter plots for quality control
• nCount / nUMI vs percent.mito: are there cells with low number of transcripts and high mito%
• nCount / UMI vs nFeature / nGene: these should correlate
8.10.201951
nGene / nFeature_RNA: # genes in a cell
nUMI / nCount_RNA: # transcripts in a cell
percent.mito / percent.mt: % of mitochondrial transcripts
![Page 50: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/50.jpg)
Analysis steps for clustering
1. Check the quality of cells
2. Filter out low quality cells
3. Normalize expression values
4. Identify highly variable genes
5. Scale data, regress out unwanted variation
6. Reduce dimensions using principal component analysis (PCA) on the variable genes
7. Determine significant principal components (PCs)
8. Use the PCs to cluster cells with graph based clustering
9. Use the PCs to visualize clusters with non-linear dimensional reduction (tSNE or UMAP)
10. Detect and visualize marker genes for the clusters8.10.201952
![Page 51: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/51.jpg)
Filter, normalize, regress and detect variable genes (v3)
8.10.201953
![Page 52: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/52.jpg)
Normalizing scRNA-seq gene expression values
• Gene expression values contain a lot of noise caused by
oLow mRNA content per cell
oVariable mRNA capture
oVariable sequencing depth
• Variance of gene expression values should reflect biological variation across cells
oWe need to remove non-biological variation
• Normalization methods for bulk RNA-seq data don’t work well for single cell data which
contains a lot of zeros (because of drop-outs)
8.10.201955
![Page 53: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/53.jpg)
Global scaling normalization
• Divide gene’s expression value in a cell by the total number of transcripts in that cell
• Multiply the ratio by a scale factor (10,000 by default)
oThis scales each cell to this total number of transcripts
• Log-transform the result
8.10.201956
![Page 54: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/54.jpg)
Analysis steps for clustering
1. Check the quality of cells
2. Filter out low quality cells
3. Normalize expression values
4. Identify highly variable genes
5. Scale data, regress out unwanted variation
6. Reduce dimensions using principal component analysis (PCA) on the variable genes
7. Determine significant principal components (PCs)
8. Use the PCs to cluster cells with graph based clustering
9. Use the PCs to visualize clusters with non-linear dimensional reduction (tSNE or UMAP)
10. Detect and visualize marker genes for the clusters8.10.201958
![Page 55: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/55.jpg)
Selecting highly variable genes
• In order to perform clustering, we need to find genes whose expression varies across cells
• BUT scRNA-seq data has strong mean-variance relationship
o low expressing genes have higher variance
• We cannot select genes based on their variance variance needs to be stabilized first
8.10.201959
![Page 56: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/56.jpg)
Variance stabilizing transformation (VST)
• Compute the mean and variance for each gene using the unnormalized UMI counts, and
take log10 of both
• Fit a curve to predict the variance of each gene as a function of its mean
• Standardized expression = (expressiongeneXcellY – mean expressiongeneX) / predicted SDgeneX
oclip the standardized values to a max sqrt(total number of cells) to reduce the impact of technical outliers
• For each gene, compute the variance of the standardized values across all cells
We rank the genes based on their standardized variance and use the top 2000 genes for
PCA and clustering
8.10.201960
![Page 57: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/57.jpg)
Detection of highly variable genes: plots
8.10.201961
<- top10 most highly variable genes
![Page 58: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/58.jpg)
Analysis steps for clustering
1. Check the quality of cells
2. Filter out low quality cells
3. Normalize expression values
4. Identify highly variable genes
5. Scale data, regress out unwanted variation
6. Reduce dimensions using principal component analysis (PCA) on the variable genes
7. Determine significant principal components (PCs)
8. Use the PCs to cluster cells with graph based clustering
9. Use the PCs to visualize clusters with non-linear dimensional reduction (tSNE or UMAP)
10. Detect and visualize marker genes for the clusters8.10.201963
![Page 59: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/59.jpg)
Scaling expression values prior to dimensional reduction
• Standardize expression values for each gene across all cells prior to PCA
oThis gives equal weight in downstream analyses, so that highly expressed genes do not dominate
• Z-score normalization in Seurat’s ScaleData function
oShifts the expression of each gene, so that the mean expression across cells is 0
oScales the expression of each gene, so that the variance across cells is 1
• ScaleData has an option to regress out unwanted sources of variation
oE.g. cells might cluster according to their cell cycle state rather than cell type
8.10.201964
![Page 60: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/60.jpg)
Regress out unwanted sources of variation
• Several sources of uninteresting variation
otechnical noise
obatch effects
ocell cycle stage, etc
• Removing this variation improves downstream analysis
• Seurat constructs linear models to predict gene expression based on user-defined variables
onumber of detected molecules per cell, mitochondrial transcript percentage, batch, cell alignment rate,…
oSeurat regresses the given variables individually against each gene, and the resulting residuals are scaled
oscaled z-scored residuals of these models are used for dimensionality reduction and clustering
o In Chipster the following effects are removed:
o number of detected molecules per cell
omitochondrial transcript percentage
o cell cycle stage (optional)
8.10.201965
![Page 61: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/61.jpg)
Mitigating the effects of cell cycle heterogeneity
1. Compute cell cycle phase scores for each cell based on its expression of G2/M and S phase marker genes
1. These markers are well conserved across tissues and species
2. Cells which do not express markers are considered not cycling, G1
2. Model each gene’s relationship between expression and the cell cycle score
3. Two options to regress out the variation caused by different cell cycle stages
1. Remove ALL signals associated with cell cycle stage
2. Remove the DIFFERENCE between the G2M and S phase scores. Thispreserves signals for non-cycling vs cycling cells, only the difference in cell cycle phase amongst the dividing cells are removed. Recommended when studying differentiation processes
8.10.201966
![Page 62: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/62.jpg)
Regressing out the variation caused by different cell cycle stages
PCA on cell cycle genes (dot = cell, colors = phases)
8.10.201967
![Page 63: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/63.jpg)
Sctransform – new approach to normalization etc
1. Check the quality of cells
2. Filter out low quality cells
3. Normalize expression values
4. Identify highly variable genes
5. Scale data, regress out unwanted variation
6. Reduce dimensions using principal component analysis (PCA) on the variable genes
7. Determine significant principal components (PCs)
8. Use the PCs to cluster cells with graph based clustering
9. Use the PCs to visualize clusters with non-linear dimensional reduction (tSNE or UMAP)
10. Detect and visualize marker genes for the clusters8.10.201968
sctransform
![Page 64: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/64.jpg)
Sctransform: modeling framework for normalization and variance stabilization
8.10.201969
• Sequencing depth (number of UMIs per cell) varies significantly between cells
• Normalized expression values of a gene should be independent of sequencing depth
• The default log normalization works ok only for low to medium expressing genes
oThe normalized expression values of high expressing genes correlate with sequencing depth
oCells with low sequencing depth show disproportionally high variance for highly expressed genes
• Sctranform models gene expression as a function of sequencing depth using GLM
oConstrains the model parameters through regularization, by pooling information across genes which are expressed at similar levels
oNormalized expression values = Pearson residuals from regularized negative binomial regression
o Positive residual for a given gene in a given cell indicate that we observed more UMIs than expected given the gene’s average expression in the population and and the cellular sequencing depth
o Pearson residual = response residual devided by the expected standard deviation (effectively VST)
![Page 65: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/65.jpg)
Normalization using Pearson residuals works best
8.10.201970
Hafemeister (2019): Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression
![Page 66: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/66.jpg)
Analysis steps for clustering
1. Check the quality of cells
2. Filter out low quality cells
3. Normalize expression values
4. Identify highly variable genes
5. Scale data, regress out unwanted variation
6. Reduce dimensions using principal component analysis (PCA) on the variable genes
7. Determine significant principal components (PCs)
8. Use the PCs to cluster cells with graph based clustering
9. Use the PCs to visualize clusters with non-linear dimensional reduction (tSNE or UMAP)
10. Detect and visualize marker genes for the clusters8.10.201971
![Page 67: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/67.jpg)
Dimensionality reduction
8.10.201972
• Removes redundancies in the data
oCells are characterized by the expression values of all the genes thousands of dimensions
oThe expression of many genes is correlated, we don’t need so many dimensions to distinguish cell types
• Identifies the most relevant information in order to cluster cells
oOvercomes the extensive technical noise in scRNA-seq data
• Simplifies complexity so that the data becomes easier to work with
oWe have thousands of genes and cells
• Can be linear (e.g. PCA) or non-linear (e.g. tSNE, UMAP)
![Page 68: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/68.jpg)
Principal Component Analysis (PCA)
8.10.201973
• Finds principal components (PCs) of the data
oDirections where the data is most spread out = where there is most variance
oPC1 explains most of the variance in the data, then PC2, PC3, …
• We will select the most important PCs and use them for clustering cells
o Instead of 20 000 genes we have now maybe 10 PCs
oEssentially each PC represents a robust ‘metagene’ that combines information across a correlated gene set
• Prior to PCA we scale the data so that genes have equal weight in downstream
analysis and highly expressed genes don’t dominate
oShift the expression of every gene so that the mean expression across cells is 0 and the variance across cells is 1.
![Page 69: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/69.jpg)
8.10.201974 Slide by Paulo Czarnewski
![Page 70: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/70.jpg)
Visualizing PCA results: loadings
8.10.201975
• Visualize top genes associated with principal components
o= Which genes are important for PC1?
• Is the correlation positive or negative?
![Page 71: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/71.jpg)
Visualizing PCA results: heatmap
8.10.201976
• Which genes correspond to separating cells?
oCheck if there are cell cycle genes
• Both cells and genes are ordered according to
their PCA scores. Plots the extreme cells on both
ends of the spectrum
![Page 72: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/72.jpg)
Visualizing PCA results: PCA plot
8.10.201977
• Gene expression patterns will be
captured by PCs PCA can separate cell
types
• Note that PCA can also capture other
things, like sequencing depth!
![Page 73: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/73.jpg)
Determine the significant principal components
• It is important to select the significant PCs for clustering analysis
• However, estimating the true dimensionality of a dataset is challenging
• Seurat developers:
oTry repeating downstream analyses with a different number of PCs (10, 15, or even 50!).
o The results often do not differ dramatically.
oRather choose higher number.
o For example, choosing 5 PCs does significantly and adversely affect results
• Chipster provides the following plots to guide you selecting the significant PCs:
oElbow plot
oPC heatmaps
8.10.201978
![Page 74: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/74.jpg)
How much variance each PC explains?
Gaps?Plateau?
Elbow plot
• The elbow in the plot tends to reflect a transition from informative PCs to those that
explain comparatively little variance.
8.10.201979
![Page 75: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/75.jpg)
Principal component heatmaps
• Check if there is still a difference
between the extremes
• Exclude also PCs that are driven
primarily by uninteresting genes (cell
cycle, ribosomal or mitochondrial)
8.10.201980
![Page 76: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/76.jpg)
Analysis steps for clustering
1. Check the quality of cells
2. Filter out low quality cells
3. Normalize expression values
4. Identify highly variable genes
5. Scale data, regress out unwanted variation
6. Reduce dimensions using principal component analysis (PCA) on the variable genes
7. Determine significant principal components (PCs)
8. Use the PCs to cluster cells with graph based clustering
9. Use the PCs to visualize clusters with non-linear dimensional reduction (tSNE or UMAP)
10. Detect and visualize marker genes for the clusters8.10.201981
![Page 77: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/77.jpg)
Clustering
• Divides cells into distinct groups based on gene expression
• Our data is big and complex (lot of cells, genes and noise), so we use principal
components instead of genes. We also need a clustering method that can cope
with this.
Graph-based clustering
Shared nearest neighbor approach
Graph cuts by Louvain method
8.10.201982
![Page 78: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/78.jpg)
1. Identify k nearest neighbours of each cell
oEuclidean distance in PC space
2. Rank the neighbours based on distance
3. Build the graph: add an edge between cells if
they have a shared nearest neigbour (SNN)
oGive edge weights based on ranking
4. Cut the graph to subgraphs (clusters) by
optimizing modularity
oLouvain algorithm by default
8.10.201983
Graph based clustering in Seurat
3. Graph (SNN)
1. KNN
k=3(30 in reality)
![Page 79: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/79.jpg)
Clustering parameters
• Number of principal components to use
• Experiment with different values
• If you are not sure, use a higher number
• Resolution for granularity
• Increasing the value leads to more clusters
• Values 0.4 - 1.2 typically return good results for single cell datasets of around 3000 cells
• Higher resolution is often optimal for larger datasets
8.10.201985
![Page 80: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/80.jpg)
tSNE plot for cluster visualization
8.10.201986
• t-distributed Stochastic Neighbor Embedding
• Non-linear dimensional reduction
oDifferent transformations to different regions
• Specialized in local embedding
oDistance between clusters is not meaningful
ohttps://distill.pub/2016/misread-tsne/
• Perplexity = number of neighbors to consider
oDefault 30, lower for small datasets
• Input data: same PCs as for clustering
• tSNE plot is gray by default, we color it by clustering results from the previous step
oCheck how well the groupings found by tSNE match with cluster colors
![Page 81: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/81.jpg)
UMAP plot for cluster visualization
• UMAP = Uniform Manifold Approximation and
Projection
• Non-linear graph-based dimension reduction
method like tSNE
oPreserves more of the global structure than tSNE
• Input data: same PCs as for clustering
• Plot is gray by default, we color it by clustering
oCheck how well the groupings found by UMAP match with cluster colors
• More info: video 6 at bit.ly/scRNA-seq
oDimensionality reduction explained by Paulo Czarnewski
8.10.201987
![Page 82: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/82.jpg)
Analysis steps for clustering
1. Check the quality of cells
2. Filter out low quality cells
3. Normalize expression values
4. Identify highly variable genes
5. Scale data, regress out unwanted variation
6. Reduce dimensions using principal component analysis (PCA) on the variable genes
7. Determine significant principal components (PCs)
8. Use the PCs to cluster cells with graph based clustering
9. Use the PCs to visualize clusters with non-linear dimensional reduction (tSNE or UMAP)
10. Detect and visualize marker genes for the clusters8.10.201988
![Page 83: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/83.jpg)
Marker gene for a cluster
• Differentially expressed between the cluster and all other cells
8.10.201989
![Page 84: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/84.jpg)
Differential expression analysis of scRNA-seq data
• Challenging because the data is noisy
o low amount of mRNA low counts, high dropout rate, amplification biases
ouneven sequencing depth
• Non-parametric tests, e.g. Wilcoxon rank sum test (Mann-Whitney U test)
oCan fail in the presence of many tied values, such as the case for dropouts (zeros) in scRNA-seq
• Methods specific for scRNA-seq, e.g. MAST
oTake advantage of the large number of samples (cells) for each group
oMAST accounts for stochastic dropouts and bimodal expression distribution
• Methods for bulk RNA-seq, e.g. DESeq2
oBased on negative binomial distribution, works ok for UMI data.
oNote: you should not filter genes, because DESeq2 models dispersion by borrowing information from other genes with similar expression level
8.10.201990
![Page 85: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/85.jpg)
8.10.201991Slide by Bishwa Ghimire
![Page 86: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/86.jpg)
Filtering out genes prior to statistical testing – why?
• We test thousands of genes, so it is possible that we get good p-values just by chance
(false positives)
Multiple testing correction of p-values is needed
oThe amount of correction depends on the number of tests (= genes)
oBonferroni correction: adjusted p-value = raw p-value * number of genes tested
o If we test less genes, the correction is less harsh better p-values
• Filtering also speeds up testing
8.10.201992
![Page 87: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/87.jpg)
Gene filtering parameters
• Min fraction: test only genes which are expressed in at least this percentage of cells in
either of the two groups (default 25%)
• Differential expression threshold: test only genes whose log2 fold change of average
expression between the two groups is at least this much (default 25%)
8.10.201993
![Page 88: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/88.jpg)
Marker gene result table
• p_val = p-value
• p_val_adj = p-value adjusted using the Bonferroni method
• avg_logFC = log2 fold change between the groups
• cluster = cluster number
• pct1 = percentage of cells where the gene is detected in the first group
8.10.201994
![Page 89: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/89.jpg)
How to retrieve marker genes for a particular cluster?
• The result table contains marker genes for all the clusters
• You can filter the result table based on the cluster column using the tool
Utilities / Filter table by column value using the following parameters
8.10.201995
![Page 90: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/90.jpg)
Visualize cluster marker genes
• Color dimension reduction plot with marker
gene expression
• Violin plot
8.10.201997
![Page 91: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/91.jpg)
Integrated analysis of two samples
8.10.201998
![Page 92: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/92.jpg)
Goals of integrated analysis
• When comparing two samples, e.g. control and treatment, we want to
o Identify cell types that are present in both samples
oObtain cell type markers that are conserved in both control and treated cells
oFind cell-type specific responses to treatment
8.10.201999
![Page 93: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/93.jpg)
When comparing two samples we need to correct for batch effects
• We need to find corresponding cells in the two samples
oTechnical and biological variability can cause batch effects which make this difficult
• Several batch effect correction methods for single cell RNA-seq data available, e.g.
oSeurat v2: Canonical correlation analysis (CCA) + dynamic time warping
oSeurat v3: CCA + anchors
oMutual nearest neigbors (MNN)
8.10.2019100
![Page 94: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/94.jpg)
Analysis steps for integrated analysis
1. Check the quality of cells, filter out low quality cells
2. Normalize expression values
3. Identify highly variable genes
4. Integrate samples and perform CCA, align samples
5. Scale data, perform PCA
6. Cluster cells, visualize clusters with tSNE or UMAP
7. Find conserved biomarkers for clusters
8. Find differentially expressed genes between samples, within clusters
9. Visualize interesting genes
8.10.2019101
![Page 95: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/95.jpg)
Integrated analysis: Setup, QC, filtering
• Perform the Seurat object setup, QC and filtering steps separately for the two samples
oSame as before, just remember to name the samples, e.g. CTRL and STIM
8.10.2019102
![Page 96: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/96.jpg)
Analysis steps for integrated analysis
1. Check the quality of cells, filter out low quality cells
2. Normalize expression values
3. Identify highly variable genes
4. Integrate samples and perform CCA, align samples
5. Scale data, perform PCA
6. Cluster cells, visualize clusters with tSNE or UMAP
7. Find conserved biomarkers for clusters
8. Find differentially expressed genes between samples, within clusters
9. Visualize interesting genes
8.10.2019104
![Page 97: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/97.jpg)
Canonical correlation analysis (CCA)
• Dimension reduction, like PCA
• Captures common sources of variation
between two datasets
• Produces canonical correlation vectors, CCs
oEffectively capture correlated gene modules that are present in both datasets
oRepresent genes that define a share biological space
• Why not PCA?
o It identifies the sources of variation, even if present only in 1 sample (e.g. technical variation)
oWe want to integrate, so we want to find the similarities
8.10.2019105
![Page 98: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/98.jpg)
Aligning two samples (Seurat v3)
1. Canonical correlation
analysis + L2-
normalisation of CCVs
shared space
2. Identify pairs of mutual
nearest neighbors
(MNN) “anchors”
3. Score anchors (based on
neighborhood, in PC
space)
4. Anchors + scores
correction vectors
8.10.2019106
Anchors = cells in a shared biological
state
![Page 99: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/99.jpg)
Combine two samples –tools (v3)
1. Identify “anchors” for data integration
o Parameter: how many CCs to use in the neighbor search [20]
2. Integrate datasets together
o Parameter: how many PCs to use in the anchor weighting procedure [20]
8.10.2019108
![Page 100: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/100.jpg)
Dimensionality –how many CCs / PCs to choose for downstream analysis?
• In the article* by Seurat developers, they “neglect to finely tune this parameter for
each dataset, but still observe robust performance over diverse use cases”.
oFor all neuronal, bipolar, and pancreatic analyses: dimensionality of 30.
oFor scATAC-seq analyses: 20.
oFor analyses of human bone marrow: 50
oThe integration of mouse cell atlases: 100
• Higher numbers: for significantly larger dataset and increased heterogeneity
8.10.2019109* https://www.biorxiv.org/content/biorxiv/early/2018/11/02/460147.full.pdf
![Page 101: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/101.jpg)
Integrated analysis of two samples –tools (v3)
1. Cluster cells
o As before
2. Visualize clustering
o tSNE or UMAP, as a parameter
8.10.2019110
![Page 102: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/102.jpg)
Analysis steps for integrated analysis
1. Check the quality of cells, filter out low quality cells
2. Normalize expression values
3. Identify highly variable genes
4. Integrate samples and perform CCA, align samples
5. Scale data, perform PCA
6. Cluster cells, visualize clusters with tSNE or UMAP
7. Find conserved biomarkers for clusters
8. Find differentially expressed genes between samples, within clusters
9. Visualize interesting genes
8.10.2019112
![Page 103: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/103.jpg)
Find conserved cluster marker genes in two samples
• Conserved marker gene = marker for a given cluster in both samples
oGive cluster as a parameter
oCompares gene expression in cluster X vs all other cells
• Uses Wilcoxon rank sum test
• Result table contains max_pval (= bigger one of the raw p-values for the two samples)
oWe are interested in adjusted p-values (p_val_adj). Use the tool Utilities / Filter table by column valueto filter based on these
8.10.2019113
![Page 104: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/104.jpg)
Find cell-type specific differentially expressed genes between samples
• We are now looking for differential expression between two samples in one cluster
• Uses Wilcoxon rank sum test
• You can filter the result table with the tool Utilities / Filter table by column value to
find significantly differentially expressed genes (adjusted p_val_adj < 0.05)
8.10.2019114
![Page 105: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/105.jpg)
Analysis steps for integrated analysis
1. Check the quality of cells, filter out low quality cells
2. Normalize expression values
3. Identify highly variable genes
4. Integrate samples and perform CCA, align samples
5. Scale data, perform PCA
6. Cluster cells, visualize clusters with tSNE or UMAP
7. Find conserved biomarkers for clusters
8. Find differentially expressed genes between samples, within clusters
9. Visualize interesting genes
8.10.2019115
![Page 106: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/106.jpg)
Visualize interesting genes in split dot plot
• Size = the percentage of cells in a cluster expressing a given gene
• Brightness = the average expression level in the expressing cells in a cluster
8.10.2019116
![Page 107: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/107.jpg)
Visualize interesting genes in tSNE/UMAP plots
1. No change between the samples: conserved cell type markers
2. Change in all clusters: cell type independent marker for the treatment
3. Change in one/some clusters: cell type dependent behavior to the treatment
8.10.2019117
1.
2.
3.
CTRL TREATED
Different marker genes
![Page 108: Single-cell RNA-seq data analysis using Chipster - CSC · CSC –Suomalainen tutkimuksen, koulutuksen, kulttuurin ja julkishallinnon ICT-osaamiskeskus Single-cell RNA-seq data analysis](https://reader034.vdocuments.site/reader034/viewer/2022042208/5eab883c824d050c7938dd45/html5/thumbnails/108.jpg)
Visualize interesting genes in violin plots
8.10.2019118
1. No change between the samples:
conserved cell type markers
2. Change in all clusters: cell type
independent marker for the
treatment
3. Change in one/some clusters: cell
type dependent behavior to the
treatment