POLITECNICO DI TORINO
Next Generation Sequencing Activities
Meeting Politecnico of Turin - EBRI Foundation, 2 July 2013
PACIELLO Giulia on behalf of FICARRA Elisa
Department of Control and Computer Engineering
Politecnico of Turin, Italy
Department of Control and Computer Engineering (DAUIN)
RNA-seq Workshop, 27-28 March 2012 Francesco Abate - Politecnico di Torino 1
Point of reference in Politecnico di Torino for the area of Information and Communication Technologies (ICT);
Promotes and manages basic and applied research, training, technology transfer and services in the areas of systems and control engineering, computer science and computer engineering and operations research;
14 research laboratories, more than 60 researchers, about 100 PhD students and research collaborators.
EDA Group
Three main research areas:
• Computer-aided design of digital electronic circuits and systems, with particular emphasis on methodologies, algorithms and tools for power estimation and optimization of systems;
• Smart city and smart systems, with particular emphasis on wireless sensor and actuator networks for environment monitoring and control and middleware for network interoperability;
• Bioinformatics (BIO-EDA group), with special emphasis on algorithms and tools for computational biology, next generation sequencing (NGS), molecular dynamics, biomedical signal and image processing and genetic network implementation.
EDA (Electronic Design Automation) Group: 2 full professors, 1 associate professor, 3 assistant professors, 6 post-doctoral researchers, 8 PhD students, 5 research assistants, 3 secretaries.
CONTACTS
Politecnico di Torino, DAUIN, Corso Duca degli Abruzzi 24, 10129 Torino, Italy
Tel. +39 011 090 7042
Fax +39 011 090 7099
Email (secretariat): [email protected]
NGS Team
External Collaborations
• Raul Rabadan (Department of Biomedical Informatics, Columbia University, USA)
• Alberto Ferrarini, Massimo Delledonne (Department of Biotechnology, University of Verona, ITALY)
• Ilaria Iacobucci, Simona Soverini, Giovanni Martinelli (Department of Medical Oncology and Hematology “L. e A. Seràgnoli”, University of Bologna, ITALY)
• Roberto Piva, Giorgio Inghirami (CERMS, Torino, ITALY)
• Alberto Zamò (Department of Pathology and Diagnostics, University of Verona, ITALY)
• Enzo Medico, Claudio Isella, Consalvo Petti (IRCC, Candiolo, ITALY)
• Raffaele Calogero (MBC, University of Torino, ITALY)
NGS Team@POLITO
• Andrea Acquaviva, Elisa Ficarra, Francesco Abate, Giulia Paciello, Gaspare Scherma, Gianvito Urgese
Biological Overview
Fusion transcripts are chimeric RNAs that can be encoded by:
• FUSION GENES: translocation, deletion, chromosomal inversion;
• TRANS SPLICING EVENTS: cis splicing, trans splicing.
CHIMERIC TRANSCRIPTS DETECTION TOOL
Graphic representations
[Figure: Gene A and Gene B joined into a fusion transcript, e.g. BCR-ABL]
Concordant reads: Exon (Gene A) - Intron (Gene A) - Exon (Gene A), i.e. a splicing event
Discordant reads: Exon (Gene A) - Intergenic region - Exon (Gene B), i.e. a gene fusion
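The distinction between concordant and discordant reads can be illustrated with a minimal sketch. It is not the Bellerophontes implementation: the gene coordinates, the `GENES` table and the functions `gene_at` and `classify_pair` are hypothetical names introduced only for this example, which assumes that each mate of a read pair has already been aligned to a single position.

```python
from typing import Optional, Tuple

# Hypothetical annotation: gene name -> (chromosome, start, end), 1-based inclusive.
GENES = {
    "GA": ("chr1", 1_000, 5_000),
    "GB": ("chr2", 20_000, 30_000),
}

def gene_at(chrom: str, pos: int) -> Optional[str]:
    """Return the gene containing the position, or None if it is intergenic."""
    for name, (g_chrom, start, end) in GENES.items():
        if chrom == g_chrom and start <= pos <= end:
            return name
    return None

def classify_pair(mate1: Tuple[str, int], mate2: Tuple[str, int]) -> str:
    """Label a read pair by comparing the genes hit by its two mates."""
    gene1 = gene_at(*mate1)
    gene2 = gene_at(*mate2)
    if gene1 is None or gene2 is None:
        return "unassigned"  # at least one mate falls outside annotated genes
    return "concordant" if gene1 == gene2 else "discordant"

if __name__ == "__main__":
    print(classify_pair(("chr1", 1_200), ("chr1", 4_800)))   # both mates in GA
    print(classify_pair(("chr1", 1_200), ("chr2", 25_000)))  # GA and GB
```

In this simplified view, pairs labelled discordant are the ones that may support a gene fusion and deserve further filtering.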
Fusion Transcripts Detection Tool: Bellerophontes
Bellerophontes Features:
Accurate junction model definition implemented by a set of modular filters;
Splicing-driven alignment and abundance estimation analysis through TopHat and Cufflinks;
Effective junction detection based on alignment of unmapped reads on a virtual reference.
Bellerophontes: an RNA-Seq data analysis framework for chimeric transcripts discovery based on accurate fusion model
Francesco Abate, Andrea Acquaviva, Giulia Paciello, Elisa Ficarra, Alberto Ferrarini, Massimo Delledonne, Simona Soverini, Giovanni Martinelli, and Enrico Macii
Bioinformatics. 2012 Aug 15
Chimeric Transcripts Prioritization Tool: Pegasus
Pegasus: a comprehensive annotation tool for detection of biologically relevant gene fusions in cancer
Francesco Abate, Andrea Acquaviva, Elisa Ficarra, Giorgio Inghirami and Raul Rabadan (under review)
Pegasus performs:
• The creation of a complete Fusion Candidates Database containing the entire set of gene fusion candidates detected by any fusion detection tool;
• The reassembly of the chimeric transcript on the basis of the two genes involved in the fusion, the genomic breakpoint coordinates and the gene annotations;
• The annotation of the assembled fusion sequence, which provides the fusion frame and generates an exhaustive report of the protein domains conserved and lost in the gene fusion and of whether a kinase gene is involved.
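The notion of fusion frame can be made concrete with a small sketch. It is not taken from Pegasus: the function `fusion_frame` and its parameters are hypothetical, and the example assumes that both breakpoints fall inside the coding sequence (CDS) of their genes and that the number of coding bases on each side is already known from the annotation.

```python
def fusion_frame(kept_5p_cds: int, skipped_3p_cds: int) -> str:
    """Tell whether the 3' partner is still read in its original frame.

    kept_5p_cds:    coding bases of the 5' gene retained before the breakpoint.
    skipped_3p_cds: coding bases of the 3' gene lost before the breakpoint.

    The first retained base of the 3' gene had codon position
    skipped_3p_cds % 3 in its own transcript and has codon position
    kept_5p_cds % 3 in the fusion; the frame is preserved when they match.
    """
    if kept_5p_cds < 0 or skipped_3p_cds < 0:
        raise ValueError("base counts must be non-negative")
    if kept_5p_cds % 3 == skipped_3p_cds % 3:
        return "in-frame"
    return "frame-shift"

if __name__ == "__main__":
    print(fusion_frame(300, 150))  # both multiples of 3
    print(fusion_frame(301, 150))  # one extra base on the 5' side
```

A real annotation step would also have to handle breakpoints in introns or untranslated regions, which this sketch deliberately ignores.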
CHIMERIC TRANSCRIPTS PRIORITIZATION TOOL
Interesting Chimeric Transcripts Analysis
Fusions considered significant after Pegasus analysis (on the basis of the fusion frame, the presence of kinases and the domains conserved or lost in the gene fusion) must still be investigated further before PCR validation, in order to avoid experiments on false gene fusions.
Ad hoc analysis pipelines have been developed on the basis of the kind of data provided (read lengths, coverage, data format, pathology).
The developed pipelines are intended to integrate the information provided by biologists, doctors and biotechnologists with the Pegasus outputs.
Biological Overview
The variable regions of the immunoglobulin heavy (IGH) and immunoglobulin light (IGL) chains of the B-cell receptor (BCR) are assembled from germline V, D, J and V, J segments respectively, through a site-specific recombination reaction called V(D)J recombination, which takes place in developing T and B lymphocytes.
[Figure: genes in the heavy chain locus and V(D)J recombination]
The resulting diversity determines the huge variability of possible interactions between antigens and antigen receptors. These cells can expand under specific conditions (e.g. antigen encounter) and form monoclonal populations bearing identically rearranged gene segments. Such clonal populations are usually kept under tight control. However, in some cases they expand to an extent that causes disease, as in autoimmune disorders, leukemias and lymphomas.
VDJ RECOMBINATION DETECTION TOOL
VDJ-Recombination Detection Tool: V(D)J-Seq
VDJ-Seq: In Silico V(D)J Recombination Detection tool
Giulia Paciello, Andrea Acquaviva, Francesco Abate, Chiara Pighi, Alberto Ferrarini, Massimo Delledonne, Alberto Zamò and Elisa Ficarra (under review)
VDJ-Seq workflow:
1) Main clone identification
2) VDJ sequence retrieval
Operations involved: retrieval of VJ-encompassing reads; calculation of the sorted occurrences of VJ couples; identification of D alleles.
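The counting of VJ couples can be sketched as follows. This is only an illustration of the idea, not the VDJ-Seq code: the function `rank_vj_couples` is a hypothetical name, the input values are made up for the example, and it is assumed that each read encompassing a V-J junction has already been assigned one V gene and one J gene.

```python
from collections import Counter
from typing import Iterable, List, Tuple

VJCouple = Tuple[str, str]

def rank_vj_couples(read_hits: Iterable[VJCouple]) -> List[Tuple[VJCouple, int]]:
    """Count the reads supporting each (V gene, J gene) couple.

    Returns the couples sorted by decreasing number of supporting reads,
    so that the first entry is the candidate main clone.
    """
    return Counter(read_hits).most_common()

if __name__ == "__main__":
    # Illustrative input: one (V, J) assignment per VJ-encompassing read.
    hits = [
        ("IGHV3-23", "IGHJ4"),
        ("IGHV1-69", "IGHJ6"),
        ("IGHV3-23", "IGHJ4"),
        ("IGHV3-23", "IGHJ4"),
    ]
    for couple, n_reads in rank_vj_couples(hits):
        print(couple, n_reads)
```

A strongly dominant first entry is what one would expect from a monoclonal population, while a flat ranking suggests a polyclonal sample.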
Single gene analysis
On the basis of the kind of data (read format, coverage, read lengths, …), ad hoc analysis pipelines have been developed to analyze genes considered of remarkable importance in different pathologies.
By means of the aforementioned pipelines it is possible to:
• Detect intron retentions;
• Define isoform transcripts;
• Determine expression levels.
Differential Expression Analysis (1)
Finding genes that are differentially expressed between conditions is an integral part of understanding the molecular basis of phenotypic variation. In the past decades, microarrays have been used extensively to quantify the abundance of mRNA corresponding to different genes, and more recently RNA-seq has emerged as a powerful competitor. As the cost of sequencing decreases, it is conceivable that the use of RNA-seq for differential expression analysis will increase rapidly. The most common use of transcriptome profiling is in the search for differentially expressed (DE) genes, that is, genes that show differences in expression level between conditions or in other ways are associated with given predictors or responses. RNA-seq offers several advantages over microarrays for differential expression analysis:
• An increased dynamic range and a lower background level;
• The ability to detect and quantify the expression of previously unknown transcripts and isoforms.
DIFFERENTIAL EXPRESSION ANALYSIS
Differential Expression Analysis (2)
The analysis of RNA-Seq data is, however, not without difficulties. These difficulties can be inherent to next-generation sequencing procedures (within-sample biases) or not (between-sample biases):
• Variation in nucleotide composition between genomic regions implies that the read coverage may not be uniform along the genome;
• More reads will map to longer genes than to shorter ones with the same expression level;
• The sequencing depths or library sizes (the total number of mapped reads) are typically different for different samples, so counts are not directly comparable between samples.
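The gene-length and library-size effects listed above are commonly addressed by scaling the raw counts. The sketch below shows two widely used measures, counts per million (CPM) and reads per kilobase per million mapped reads (RPKM); it is one possible choice and not necessarily the normalization adopted in the pipelines described here. The function names and the input numbers are illustrative.

```python
def cpm(count: float, library_size: float) -> float:
    """Counts per million: corrects for different sequencing depths."""
    return count * 1e6 / library_size

def rpkm(count: float, gene_length_bp: float, library_size: float) -> float:
    """Reads per kilobase per million mapped reads.

    Corrects for both gene length (in base pairs) and library size
    (total number of mapped reads in the sample).
    """
    return count * 1e9 / (gene_length_bp * library_size)

if __name__ == "__main__":
    library_size = 10_000_000
    # A 2 kb gene collects twice the reads of a 1 kb gene at equal expression.
    long_gene = rpkm(500, 2_000, library_size)
    short_gene = rpkm(250, 1_000, library_size)
    print(long_gene, short_gene)  # the two values coincide after scaling
```

Raw counts of 500 and 250 would suggest a twofold difference, whereas the length-scaled values are equal.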
Ad hoc analysis pipelines, which comprise data normalization, the choice of the most suitable models for differential expression analysis and the correct setting of the thresholds, have been developed on the basis of the kind of data and the conditions to be tested.
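As an illustration of what setting the thresholds can mean in practice, the sketch below flags a gene when both a minimum fold change and a maximum adjusted p-value are met. It is a hypothetical example: the function name, the default cut-offs (`min_abs_log2_fc`, `alpha`) and the pseudocount are assumptions, not the values used in the pipelines, and the adjusted p-value is assumed to come from a separate statistical model.

```python
import math

def is_differentially_expressed(
    mean_a: float,
    mean_b: float,
    adj_p_value: float,
    min_abs_log2_fc: float = 1.0,  # assumed cut-off: at least a twofold change
    alpha: float = 0.05,           # assumed cut-off on the adjusted p-value
    pseudocount: float = 1.0,      # avoids division by zero for unexpressed genes
) -> bool:
    """Apply fold-change and significance thresholds to one gene.

    mean_a and mean_b are normalized expression values in the two conditions.
    """
    log2_fc = math.log2((mean_b + pseudocount) / (mean_a + pseudocount))
    return abs(log2_fc) >= min_abs_log2_fc and adj_p_value <= alpha

if __name__ == "__main__":
    print(is_differentially_expressed(10, 50, 0.01))  # large, significant change
    print(is_differentially_expressed(10, 12, 0.01))  # change too small
    print(is_differentially_expressed(10, 50, 0.20))  # not significant
```

Stricter or looser cut-offs change how many genes are reported, which is why the thresholds have to be chosen for each dataset and comparison.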