next generation sequencing activities - polito


Page 1: Next Generation Sequencing Activities - PoliTO

POLITECNICO DI TORINO

Next Generation Sequencing Activities

Meeting Politecnico of Turin - EBRI Foundation, 2 July 2013

PACIELLO Giulia on behalf of FICARRA Elisa

Department of Control and Computer Engineering

Politecnico of Turin, Italy

Page 2: Next Generation Sequencing Activities - PoliTO


Department of Control and Computer Engineering (DAUIN)

RNA-seq Workshop, 27-28 March 2012 Francesco Abate - Politecnico di Torino 1

Point of reference at Politecnico di Torino for the area of Information and Communication Technologies (ICT);

Promotes and manages basic and applied research, training, technology transfer and services in the areas of systems and control engineering, computer science and computer engineering and operations research;

14 research laboratories, more than 60 researchers, about 100 PhD students and research collaborators.

Page 3: Next Generation Sequencing Activities - PoliTO


EDA Group


Three main research areas:

Computer-aided design of digital electronic circuits and systems, with particular emphasis on methodologies, algorithms and tools for power estimation and optimization of systems;

Smart cities and smart systems, with particular emphasis on wireless sensor and actuator networks for environmental monitoring and control, and on middleware for network interoperability;

Bioinformatics (BIO-EDA group), with special emphasis on algorithms and tools for computational biology, next generation sequencing (NGS), molecular dynamics, biomedical signal and image processing and genetic network implementation.

EDA (Electronic Design Automation) Group: 2 full professors, 1 associate professor, 3 assistant professors, 6 post-doctoral researchers, 8 PhD students, 5 research assistants, 3 secretaries.

CONTACTS: Politecnico di Torino, DAUIN, Corso Duca degli Abruzzi 24, 10129 Torino, Italy. Tel. +39 011 090 7042, Fax +39 011 090 7099, Email (secretariat): [email protected]

Page 4: Next Generation Sequencing Activities - PoliTO


NGS Team


External Collaborations

• Raul Rabadan (Department of Biomedical Informatics, Columbia University, USA)

• Alberto Ferrarini, Massimo Delledonne (Department of Biotechnology, University of Verona, ITALY)

• Ilaria Iacobucci, Simona Soverini, Giovanni Martinelli (Department of Medical Oncology and Hematology “L. e A. Seràgnoli”, University of Bologna, ITALY)

• Roberto Piva, Giorgio Inghirami (CERMS, Torino, ITALY)

• Alberto Zamò (Department of Pathology and Diagnostics, University of Verona, ITALY)

• Enzo Medico, Claudio Isella, Consalvo Petti (IRCC, Candiolo, ITALY)

• Raffaele Calogero (MBC, University of Torino, ITALY)

NGS Team@POLITO

• Andrea Acquaviva, Elisa Ficarra, Francesco Abate, Giulia Paciello, Gaspare Scherma, Gianvito Urgese

Page 5: Next Generation Sequencing Activities - PoliTO


Biological Overview

Fusion transcripts are chimeric RNAs that can be encoded by:

• Fusion genes, arising from chromosomal rearrangements (translocation, deletion, chromosomal inversion);

• Trans-splicing events (figure panels: cis splicing, trans splicing).

CHIMERIC TRANSCRIPTS DETECTION TOOL

Page 6: Next Generation Sequencing Activities - PoliTO


Graphic representations (figure): a fusion transcript, e.g. BCR-ABL, joining Gene A (GA) and Gene B (GB). Panel "Splicing Event": concordant reads mapping to Exon - GA, Intron - GA, Exon - GA. Panel "Gene Fusion": discordant reads mapping to Exon - GA, Intergenic Region, Exon - GB.
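The concordant/discordant distinction in the figure can be illustrated with a minimal Python sketch. It assumes a toy gene annotation (the "GA"/"GB" names follow the figure, the coordinates are invented) and classifies a read pair by the genes its two mates fall in: same gene means concordant, two different genes means discordant, i.e. a candidate fusion. This is a simplified illustration, not the detection logic of any specific tool.

```python
from dataclasses import dataclass

# Hypothetical toy annotation: gene name -> (chromosome, start, end), 0-based half-open.
# "GA" and "GB" mirror the figure labels; the coordinates are invented.
GENES = {
    "GA": ("chr22", 1_000, 5_000),
    "GB": ("chr9", 20_000, 26_000),
}


@dataclass
class Mate:
    chrom: str
    pos: int  # leftmost aligned position of the mate


def gene_of(mate):
    """Return the annotated gene overlapping the mate's position, or None."""
    for name, (chrom, start, end) in GENES.items():
        if mate.chrom == chrom and start <= mate.pos < end:
            return name
    return None


def classify_pair(mate1, mate2):
    """Concordant if both mates fall in the same gene, discordant if they fall in two genes."""
    gene1, gene2 = gene_of(mate1), gene_of(mate2)
    if gene1 is None or gene2 is None:
        return "unassigned"
    return "concordant" if gene1 == gene2 else "discordant"


print(classify_pair(Mate("chr22", 1_200), Mate("chr22", 4_100)))  # concordant
print(classify_pair(Mate("chr22", 4_800), Mate("chr9", 20_300)))  # discordant
```

Real pipelines also check mate orientation and insert size, which this sketch ignores.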

Page 7: Next Generation Sequencing Activities - PoliTO


Fusion Transcripts Detection Tool: Bellerophontes

Bellerophontes Features:

Accurate junction model definition implemented by a set of modular filters;

Splicing-driven alignment and abundance estimation analysis through TopHat and Cufflinks;

Effective junction detection based on alignment of unmapped reads on a virtual reference.


Bellerophontes: an RNA-Seq data analysis framework for chimeric transcripts discovery based on accurate fusion model

Francesco Abate, Andrea Acquaviva, Giulia Paciello, Elisa Ficarra, Alberto Ferrarini, Massimo Delledonne, Simona Soverini, Giovanni Martinelli and Enrico Macii

Bioinformatics. 2012 Aug 15
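The idea of detecting junctions by aligning unmapped reads to a virtual reference can be sketched as follows. This is a toy illustration under stated assumptions, not Bellerophontes' implementation: the exon sequences are invented, the virtual reference is simply the end of an exon of gene A concatenated with the start of an exon of gene B, alignment is exact string matching on the forward strand only, and `min_anchor` (the minimum number of bases a read must cover on each side of the junction) is a hypothetical parameter.

```python
# Invented exon sequences; in practice they would be extracted from the gene annotation.
EXON_END_A = "ATGGCCTTAGCAGGT"    # last bases of an exon of gene A
EXON_START_B = "CCGATTGACGGTAAC"  # first bases of an exon of gene B


def build_virtual_reference(exon_a, exon_b):
    """Concatenate the 5' exon end and the 3' exon start; return the sequence and junction offset."""
    return exon_a + exon_b, len(exon_a)


def spans_junction(read, reference, junction, min_anchor=4):
    """True if the read matches the virtual reference exactly and covers at least
    min_anchor bases on each side of the junction."""
    start = reference.find(read)
    while start != -1:
        end = start + len(read)
        if junction - start >= min_anchor and end - junction >= min_anchor:
            return True
        start = reference.find(read, start + 1)
    return False


ref, junction = build_virtual_reference(EXON_END_A, EXON_START_B)
unmapped = ["TTAGCAGGTCCGAT", "CCGATTGACG", "GGGGGGGGGG"]
hits = [r for r in unmapped if spans_junction(r, ref, junction)]
print(hits)  # ['TTAGCAGGTCCGAT']
```

The second read matches the virtual reference but lies entirely inside gene B, so the anchor requirement rejects it as junction evidence.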


Page 8: Next Generation Sequencing Activities - PoliTO


Chimeric Transcripts Prioritization Tool: Pegasus

Pegasus: a comprehensive annotation tool for detection of biologically relevant gene fusions in cancer

Francesco Abate, Andrea Acquaviva, Elisa Ficarra, Giorgio Inghirami and Raul Rabadan (UNDER REVIEW)

Pegasus performs:

The creation of a complete Fusion Candidates Database containing the entire set of gene fusion candidates detected by any fusion detection tool;

The reassembly of the chimeric transcript on the basis of the two genes involved in the fusion, the genomic breakpoint coordinates and the gene annotations;

The annotation of the assembled fusion sequence, to provide information on the fusion frame and to generate a complete report of the protein domains conserved and lost in the gene fusion, and of whether a kinase gene is involved.
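The reassembly and fusion-frame steps can be illustrated with a small Python sketch. It assumes the genomic breakpoints have already been converted into offsets within each gene's coding sequence (CDS), which the real annotation-based step has to work out; all sequences and function names here are hypothetical. The 3' partner keeps its reading frame when the number of coding bases kept from gene A and the number of coding bases of gene B lost upstream of its breakpoint have the same remainder modulo 3.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reassemble(cds_a, bp_a, cds_b, bp_b):
    """Join the 5' partner's CDS up to its breakpoint with the 3' partner's CDS from its breakpoint.

    bp_a: coding bases of gene A kept in the fusion (0-based offset into cds_a).
    bp_b: coding bases of gene B lost upstream of the breakpoint (0-based offset into cds_b).
    """
    return cds_a[:bp_a] + cds_b[bp_b:]


def is_in_frame(bp_a, bp_b):
    """Gene B keeps its reading frame iff both breakpoints have the same codon phase."""
    return bp_a % 3 == bp_b % 3


def has_premature_stop(cds):
    """True if a stop codon occurs in frame before the last codon."""
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    return any(codon in STOP_CODONS for codon in codons[:-1])


# Toy coding sequences (invented for illustration)
cds_a = "ATGAAACCCGGG"
cds_b = "ATGTTTGGGCCCTAA"
fusion = reassemble(cds_a, 6, cds_b, 6)
print(fusion, is_in_frame(6, 6), has_premature_stop(fusion))  # ATGAAAGGGCCCTAA True False
print(is_in_frame(7, 6))  # False
```

Breakpoints falling in UTRs or introns, and the domain and kinase annotation, are outside the scope of this sketch.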

CHIMERIC TRANSCRIPTS PRIORITIZATION TOOL


Page 9: Next Generation Sequencing Activities - PoliTO


Interesting Chimeric Transcripts Analysis

Fusions considered significant after Pegasus analysis (on the basis of the fusion frame, the presence of kinases and the domains conserved or lost in the gene fusion) must nevertheless be investigated further before PCR validation, to avoid running experiments on false gene fusions.

Ad hoc analysis pipelines have been developed depending on the kind of data provided (read lengths, coverage, data format, pathology).

These pipelines are intended to integrate the information provided by biologists, physicians and biotechnologists with the Pegasus outputs.
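A shortlisting step of this kind might look like the Python sketch below. The record fields, the read-support criteria and the thresholds (`min_split_reads`, `min_spanning_pairs`) are hypothetical choices made for illustration; the slides do not specify which criteria the ad hoc pipelines use beyond fusion frame, kinases and domains.

```python
from dataclasses import dataclass


@dataclass
class FusionCandidate:
    name: str
    in_frame: bool
    kinase_retained: bool  # kinase domain conserved in the fusion product
    split_reads: int       # reads crossing the fusion junction
    spanning_pairs: int    # read pairs with one mate in each partner gene


def shortlist(candidates, min_split_reads=3, min_spanning_pairs=2):
    """Keep in-frame, well-supported candidates; list kinase fusions first, then by support."""
    kept = [
        c for c in candidates
        if c.in_frame
        and c.split_reads >= min_split_reads
        and c.spanning_pairs >= min_spanning_pairs
    ]
    return sorted(kept, key=lambda c: (not c.kinase_retained, -(c.split_reads + c.spanning_pairs)))


# Invented candidates and counts
candidates = [
    FusionCandidate("FUS_1", True, False, 5, 3),
    FusionCandidate("FUS_2", True, True, 12, 8),
    FusionCandidate("FUS_3", False, True, 20, 10),  # out of frame
    FusionCandidate("FUS_4", True, True, 1, 0),     # too little read support
]
print([c.name for c in shortlist(candidates)])  # ['FUS_2', 'FUS_1']
```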

INTERESTING CHIMERIC TRANSCRIPTS ANALYSIS


Page 10: Next Generation Sequencing Activities - PoliTO


Biological Overview


The variable regions of the immunoglobulin heavy (IGH) and light (IGL) chains of the B-cell receptor (BCR) are assembled from germline V, D and J segments (heavy chain) and V and J segments (light chain) through a site-specific recombination reaction called V(D)J recombination, which takes place during the development of T and B lymphocytes.

Figure: genes in the heavy chain locus; VDJ recombination.

The resulting diversity accounts for the huge variability of possible interactions between antigens and antigen receptors. These cells can expand under specific conditions (e.g. antigen encounter) and form monoclonal populations bearing identically rearranged gene segments. Such clonal populations are usually subject to tight control mechanisms; under special circumstances, however, they may expand to an extent that causes disease, as in autoimmune disorders, leukemias and lymphomas.

VDJ RECOMBINATION DETECTION TOOL


Page 11: Next Generation Sequencing Activities - PoliTO

VDJ-Recombination Detection Tool: V(D)J-Seq

VDJ-Seq: In Silico V(D)J Recombination Detection Tool

Giulia Paciello, Andrea Acquaviva, Francesco Abate, Chiara Pighi, Alberto Ferrarini, Massimo Delledonne, Alberto Zamò and Elisa Ficarra (UNDER REVIEW)

VDJ-Seq workflow:

1) MAIN CLONE IDENTIFICATION
2) VDJ SEQUENCE RETRIEVAL

Steps: retrieval of VJ-encompassing reads; calculation of sorted VJ couple occurrences; D allele identification.
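The workflow steps can be sketched in Python under explicit assumptions: each VJ-encompassing read is assumed to have already been assigned a (V allele, J allele) couple, for example by alignment against allele references; the most frequent couple is taken as the main clone; and the D allele is chosen as the one sharing the longest exact substring with the V-J junction sequence. The allele names, sequences and the `min_match` threshold are invented, and the longest-common-substring criterion is an illustrative choice, not necessarily the one used by VDJ-Seq.

```python
from collections import Counter


def sorted_vj_occurrences(vj_assignments):
    """Count each (V, J) couple and return the couples by decreasing occurrence."""
    return Counter(vj_assignments).most_common()


def main_clone(vj_assignments):
    """Take the most frequent (V, J) couple as the main clone, with its read count."""
    ranking = sorted_vj_occurrences(vj_assignments)
    return ranking[0] if ranking else None


def longest_common_substring(a, b):
    """Length of the longest exact substring shared by a and b (dynamic programming)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def identify_d_allele(junction, d_alleles, min_match=5):
    """Return the D allele sharing the longest exact match with the V-J junction, if long enough."""
    best_name, best_len = None, 0
    for name, seq in d_alleles.items():
        length = longest_common_substring(junction, seq)
        if length > best_len:
            best_name, best_len = name, length
    return best_name if best_len >= min_match else None


# Invented example: one (V, J) assignment per VJ-encompassing read
reads = [("V1", "J2")] * 5 + [("V3", "J2")] * 2 + [("V1", "J1")]
print(main_clone(reads))  # (('V1', 'J2'), 5)
d_alleles = {"D_x": "GGTACGATTAC", "D_y": "TTCAGGCCATA"}
print(identify_d_allele("CGCGGTACGATTTT", d_alleles))  # D_x
```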


Page 12: Next Generation Sequencing Activities - PoliTO


Single gene analysis

Depending on the kind of data (read format, coverage, read lengths, …), ad hoc analysis pipelines have been developed to analyze genes of particular importance in different pathologies.

By means of these pipelines it is possible to:

Detect intron retentions; define transcript isoforms; determine expression levels.

SINGLE GENE ANALYSIS
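Intron retention detection, for instance, can be sketched by comparing read depth inside an intron with the depth of its flanking exons. The Python sketch below assumes a per-base coverage list for the gene and 0-based half-open interval coordinates; the ratio definition and the `min_ratio` cut-off are hypothetical choices, not the criteria of the pipelines described above.

```python
def mean(values):
    return sum(values) / len(values) if values else 0.0


def intron_retention_ratio(coverage, exon_up, intron, exon_down):
    """Mean intron coverage divided by the mean coverage of the two flanking exons.

    coverage: per-base read depth along the gene; intervals are 0-based half-open (start, end).
    """
    intron_cov = mean(coverage[intron[0]:intron[1]])
    exon_cov = mean(coverage[exon_up[0]:exon_up[1]] + coverage[exon_down[0]:exon_down[1]])
    return intron_cov / exon_cov if exon_cov > 0 else 0.0


def is_retained(coverage, exon_up, intron, exon_down, min_ratio=0.1):
    """Flag the intron as retained when its relative coverage reaches min_ratio (hypothetical cut-off)."""
    return intron_retention_ratio(coverage, exon_up, intron, exon_down) >= min_ratio


# Toy depth profiles: exon (10 bp), intron (10 bp), exon (10 bp)
exon_up, intron, exon_down = (0, 10), (10, 20), (20, 30)
retained = [20] * 10 + [6] * 10 + [20] * 10
spliced = [20] * 10 + [0] * 10 + [20] * 10
print(intron_retention_ratio(retained, exon_up, intron, exon_down))  # 0.3
print(is_retained(spliced, exon_up, intron, exon_down))  # False
```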


Page 13: Next Generation Sequencing Activities - PoliTO


Differential Expression Analysis (1)

Finding genes that are differentially expressed between conditions is an integral part of understanding the molecular basis of phenotypic variation. Over the past decades, microarrays have been used extensively to quantify the abundance of the mRNA corresponding to different genes, and more recently RNA-seq has emerged as a powerful competitor. As the cost of sequencing decreases, the use of RNA-seq for differential expression analysis is likely to increase rapidly. The most common use of transcriptome profiling is the search for differentially expressed (DE) genes, that is, genes that show differences in expression level between conditions or are otherwise associated with given predictors or responses. RNA-seq offers several advantages over microarrays for differential expression analysis:

An increased dynamic range and a lower background level; the ability to detect and quantify the expression of previously unknown transcripts and isoforms.


DIFFERENTIAL EXPRESSION ANALYSIS

Page 14: Next Generation Sequencing Activities - PoliTO


Differential Expression Analysis (2)


The analysis of RNA-seq data is not without difficulties, however. Some are inherent to next-generation sequencing procedures (within-sample biases), others are not (between-sample biases):

Variation in nucleotide composition between genomic regions implies that the read coverage may not be uniform along the genome; More reads will map to longer genes than to shorter ones with the same expression level; The sequencing depths or library sizes (the total number of mapped reads) are typically different for different samples, so counts are not directly comparable between samples.
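The gene-length and library-size effects listed above are what standard length- and depth-normalized units correct for. The Python sketch below computes RPKM (reads per kilobase per million mapped reads) and TPM (transcripts per million) on invented counts; it shows two genes with the same expression level but different lengths, measured in two samples sequenced at different depths. These units do not address nucleotide-composition bias, which needs separate correction.

```python
def rpkm(counts, lengths_bp, total_mapped):
    """Reads Per Kilobase of gene per Million mapped reads: corrects for gene length and depth."""
    return [c * 1e9 / (length * total_mapped) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts Per Million: length-normalized rates rescaled to sum to one million."""
    rates = [c / (length / 1e3) for c, length in zip(counts, lengths_bp)]
    scale = sum(rates)
    return [r / scale * 1e6 for r in rates]


lengths = [1_000, 2_000]  # the second gene is twice as long
print(rpkm([100, 200], lengths, 1_000_000))  # [100.0, 100.0]
print(rpkm([200, 400], lengths, 2_000_000))  # [100.0, 100.0], same values at twice the depth
print(tpm([100, 200], lengths))              # [500000.0, 500000.0]
```

Raw counts differ by a factor of two both between genes and between samples, yet the normalized values agree.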

Ad hoc analysis pipelines, comprising data normalization, the choice of the most suitable models for differential expression analysis and the correct setting of thresholds, have been developed on the basis of the kind of data and the conditions to be tested.
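A minimal Python sketch of the normalization and thresholding stages, assuming a genes-by-samples count matrix, is shown below. It uses the median-of-ratios size factors popularized by DESeq and a hypothetical log2 fold-change cut-off; the statistical model (for example a negative binomial test) that a real pipeline would apply is deliberately omitted, so this is not a complete differential expression method. The counts are invented: samples 1 and 3 are sequenced twice as deep as samples 0 and 2, and only the third gene changes between conditions.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors (DESeq-style). counts[g][s]: reads of gene g in sample s.

    Genes with a zero count in any sample are skipped, since their geometric mean is zero.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if min(row) == 0:
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for s, c in enumerate(row):
            log_ratios[s].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


def log2_fold_changes(counts, factors, group_a, group_b, pseudocount=1.0):
    """Per gene, log2 of the mean normalized count in group_b over group_a (sample indices)."""
    lfcs = []
    for row in counts:
        norm = [c / f for c, f in zip(row, factors)]
        mean_a = sum(norm[i] for i in group_a) / len(group_a)
        mean_b = sum(norm[i] for i in group_b) / len(group_b)
        lfcs.append(math.log2((mean_b + pseudocount) / (mean_a + pseudocount)))
    return lfcs


def select_de_genes(lfcs, min_abs_lfc=1.0):
    """Indices of genes whose absolute log2 fold change reaches the (hypothetical) threshold."""
    return [g for g, lfc in enumerate(lfcs) if abs(lfc) >= min_abs_lfc]


# Invented counts: samples 0-1 = condition A, samples 2-3 = condition B
counts = [
    [100, 200, 100, 200],
    [50, 100, 50, 100],
    [10, 20, 80, 160],
]
factors = size_factors(counts)
lfcs = log2_fold_changes(counts, factors, group_a=[0, 1], group_b=[2, 3])
print([round(f, 3) for f in factors])  # [0.707, 1.414, 0.707, 1.414]
print(select_de_genes(lfcs))  # [2]
```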
