Annovar Variants Analysis
Source: molsim.sci.univr.it/2014_bioinfo2/genomica/06_Annovar.pdf
http://www.openbioinformatics.org/annovar/
Marin Vargas, Sergio Paul, December 2013

Page 1

Annovar

Variants Analysis
http://www.openbioinformatics.org/annovar/

Marin Vargas, Sergio Paul

December 2013

Page 2

Variants Analysis for the Diagnosis of Genetic Disease

[Workflow diagram]
1. DNA extraction
2. DNA sequencing (genome or exome) on an Illumina HiSeq, producing FASTQ files
3. Variant calling (BWA + GATK) against the genome reference, producing VCF files
4. Variant analysis (several software tools)

Page 3

Annovar description
• Annovar is a program for functional annotation of genetic variants from high-throughput sequencing data.
• It is an efficient tool for the functional annotation of genetic variants from diverse genomes (human, mouse, worm, fly, yeast, etc.).

[Diagram] Inputs to ANNOVAR:
• Genetic variants (VCF format)
• Annotated genomes (GFF3 format) from UCSC, ENSEMBL (human, mouse, cow, etc.)
• Biological knowledge (predictors)
Output of ANNOVAR: the most likely causal variants and their corresponding candidate genes.
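
To make the variant input concrete, the following is a minimal sketch of reading the first five columns of VCF records in Python. It does not use Annovar itself; the names `Variant`, `parse_vcf_line` and `read_vcf` are hypothetical, and the two example records are made up for illustration.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int   # 1-based position, as in VCF
    ref: str
    alt: str


def parse_vcf_line(line: str) -> list[Variant]:
    """Parse one VCF data line; return one Variant per ALT allele."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt = fields[:5]
    return [Variant(chrom, int(pos), ref, a) for a in alt.split(",")]


def read_vcf(lines) -> list[Variant]:
    """Skip header lines (starting with '#') and blank lines."""
    variants = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        variants.extend(parse_vcf_line(line))
    return variants


# Made-up records, only to show the shape of the data.
example_vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t1000\t.\tA\tG\t50\tPASS\t.",
    "2\t2500\t.\tC\tT,G\t30\tPASS\t.",
]
variants = read_vcf(example_vcf)
```

A record with several ALT alleles is split into one variant per allele, so the second record above yields two entries.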

Page 4

Annovar goal
• Variant reduction: through a stepwise procedure, variants that are unlikely to be disease-causal can be excluded, which helps identify the putative genes involved in the disease.
• Filtering out synonymous SNPs before further analysis.
• Different prediction algorithms use different information, so predictions from multiple algorithms are used.
• Querying predictions from different databases for different algorithms is both tedious and time-consuming.
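
The stepwise reduction described above can be sketched as a chain of filters that records how many variants survive each step. This is an illustration only, not Annovar's own procedure: the dictionary keys (`region`, `effect`, `pop_freq`), the order of the steps and the 1% frequency cutoff are all assumptions.

```python
def stepwise_reduction(variants, max_freq=0.01):
    """Apply filters one after another and count survivors per step.

    The field names and the max_freq default are illustrative assumptions.
    """
    steps = [
        ("exonic or splicing", lambda v: v["region"] in {"exonic", "splicing"}),
        ("not synonymous", lambda v: v["effect"] != "synonymous SNV"),
        ("rare in population", lambda v: v["pop_freq"] <= max_freq),
    ]
    remaining = list(variants)
    counts = [("input", len(remaining))]
    for name, keep in steps:
        remaining = [v for v in remaining if keep(v)]
        counts.append((name, len(remaining)))
    return remaining, counts


# Toy annotated variants (made up).
toy_variants = [
    {"gene": "GENE_A", "region": "exonic", "effect": "nonsynonymous SNV", "pop_freq": 0.001},
    {"gene": "GENE_B", "region": "exonic", "effect": "synonymous SNV", "pop_freq": 0.0},
    {"gene": "GENE_C", "region": "intronic", "effect": "", "pop_freq": 0.0},
    {"gene": "GENE_D", "region": "exonic", "effect": "nonsynonymous SNV", "pop_freq": 0.2},
]
candidates, step_counts = stepwise_reduction(toy_variants)
```

Keeping the per-step counts makes it easy to see which filter removes most variants.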

Page 5

Annovar functionality
• The principal functionality consists of three types of functional annotation:
  ◦ Gene based: identify whether a Single Nucleotide Variant (SNV), small Ins/Del or Copy Number Variation (CNV) causes protein-coding changes.
  ◦ Region based: identify variants in specific genomic regions.
  ◦ Filter based: identify variants by filtering against diverse databases.
• Secondary functionality:
  ◦ Retrieve the nucleotide sequence at any user-specified genomic positions in batch.
  ◦ Identify a candidate gene list for Mendelian diseases from exome data.
  ◦ Other utilities.

Page 6

Gene based
• From a whole-genome sequencing experiment on a human subject, given a list of SNVs and indels, it is of interest to identify the genes that are disrupted.
• For intergenic variants, we are interested in knowing the two flanking genes and the distances between the variants and the flanking genes.
• For exonic variants, we are interested in knowing the amino acid changes.
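
As an illustration of the intergenic case, here is a minimal sketch that finds the two flanking genes of a position and the distance to each. It assumes a single chromosome and a list of non-overlapping genes sorted by start coordinate. The function name and the gene coordinates are hypothetical, and this is not how Annovar is implemented internally.

```python
import bisect


def flanking_genes(pos, genes):
    """Locate pos relative to genes on one chromosome.

    genes: list of (start, end, name), sorted by start, non-overlapping
    (an assumption made to keep the sketch short).
    """
    starts = [g[0] for g in genes]
    i = bisect.bisect_right(starts, pos)       # genes[:i] start at or before pos
    left = genes[i - 1] if i > 0 else None
    right = genes[i] if i < len(genes) else None
    if left is not None and left[0] <= pos <= left[1]:
        return {"inside": left[2]}             # variant lies within a gene
    return {
        "left": (left[2], pos - left[1]) if left else None,
        "right": (right[2], right[0] - pos) if right else None,
    }


# Made-up gene coordinates.
toy_genes = [(100, 200, "GENE_A"), (500, 800, "GENE_B"), (1200, 1500, "GENE_C")]
intergenic = flanking_genes(300, toy_genes)
```

For position 300 the sketch reports GENE_A at a distance of 100 on the left and GENE_B at a distance of 200 on the right.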

Page 7

Region based
• Identify variants at conserved genomic regions.
• Identify the subset of variants that either fall within the conserved regions (for SNPs and short indels) or overlap with these conserved regions (for large-scale CNVs).
• Use phastCons program predictions to annotate variants that fall within conserved genomic regions.
• Use the TFBS (Transcription Factor Binding Site) database to annotate the respective region.
• Identify the cytogenetic band for genetic variants.
• Identify variants located in segmental duplications (SegDup).
• Identify previously reported structural variants in DGV (Database of Genomic Variants).
• Identify variants reported in previously published GWAS (genome-wide association studies).
• Identify variants in ENCODE annotated regions.
• Identify non-coding variants that disrupt enhancers, repressors or promoters.
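
The "fall within or overlap" test behind region-based annotation can be sketched with closed, 1-based intervals. The region names and coordinates are made up, and the linear scan is chosen for clarity; a real tool would use an index.

```python
def overlaps(var_start, var_end, reg_start, reg_end):
    """True if two closed intervals share at least one base."""
    return var_start <= reg_end and reg_start <= var_end


def annotate_regions(variant, regions):
    """Return the names of all regions that a variant falls in or overlaps.

    variant: (chrom, start, end); an SNV has start == end.
    regions: list of (chrom, start, end, name).
    """
    chrom, start, end = variant
    return [
        name
        for (r_chrom, r_start, r_end, name) in regions
        if r_chrom == chrom and overlaps(start, end, r_start, r_end)
    ]


# Made-up conserved regions.
toy_regions = [
    ("1", 100, 200, "conserved_1"),
    ("1", 500, 900, "conserved_2"),
    ("2", 100, 200, "conserved_3"),
]
snv_hits = annotate_regions(("1", 150, 150), toy_regions)
cnv_hits = annotate_regions(("1", 180, 600), toy_regions)
```

The same test covers both cases from the slide: an SNV is an interval of length one, while a large CNV can overlap several regions at once.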

Page 8

Filter based predictors 1
• Identify subsets of variants based on comparison to other variant databases, for example dbSNP or the 1000 Genomes Project.
• 1000 Genomes Project: started in January 2008, it is an international research effort to establish by far the most detailed catalogue of human genetic variation. Annovar uses the latest version (April 2012).
• dbSNP: the Single Nucleotide Polymorphism Database is a free public archive for genetic variation within and across different species, developed and hosted by the NCBI.
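
The comparison against a variant database amounts to an exact lookup on chromosome, position, reference allele and alternate allele. The sketch below uses a plain dictionary as a stand-in for a database such as dbSNP or 1000 Genomes; the entries, frequencies and the function name `split_by_database` are hypothetical.

```python
def split_by_database(variants, known):
    """Separate variants into those present in a database and those absent.

    variants: iterable of (chrom, pos, ref, alt) tuples.
    known: dict mapping the same kind of tuple to an allele frequency
           (a toy stand-in for a real variant database).
    """
    found, not_found = [], []
    for v in variants:
        if v in known:
            found.append((v, known[v]))
        else:
            not_found.append(v)
    return found, not_found


# Made-up database content and query variants.
toy_db = {
    ("1", 1000, "A", "G"): 0.12,
    ("2", 2500, "C", "T"): 0.004,
}
queries = [
    ("1", 1000, "A", "G"),   # same site and same allele: found
    ("1", 1000, "A", "T"),   # same site, different allele: not found
    ("3", 42, "G", "C"),
]
found, not_found = split_by_database(queries, toy_db)
```

Matching on the alleles as well as the position matters: a different substitution at a known site is still treated as novel.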

Page 9

Filter based predictors 2
• dbNSFP is a database developed by Liu, Jian and Boerwinkle (provided in Annovar as LJB2, version 2) for the functional prediction and annotation of all potential non-synonymous SNVs in the human genome.
• It compiles prediction scores, along with a conservation score, from several popular algorithms and other related information.
• Thus dbNSFP uses two types of prediction algorithms:
  ◦ Protein variant functional prediction.
  ◦ Variant conservation prediction.

Page 10

Filter based predictors 3
• dbNSFP protein variant functional prediction:
  ◦ SIFT: Sorting Intolerant From Tolerant, predicts whether an amino acid substitution is likely to affect protein function based on sequence homology and the physico-chemical similarity between the alternate amino acids.
  ◦ PolyPhen2: prediction of functional effects of human nsSNPs.
  ◦ LRT: Likelihood Ratio Test, identifies a subset of deleterious mutations that disrupt highly conserved amino acids within protein-coding sequences.
  ◦ MutationTaster: rapid evaluation of the disease-causing potential of DNA sequence alterations.
  ◦ MutationAssessor: predicts the functional impact of amino-acid substitutions in proteins.
  ◦ FATHMM: Functional Analysis Through Hidden Markov Models.
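
Because the predictors listed above rely on different information, one simple way to combine them is a majority vote. The sketch below is an illustration only: the cutoffs (SIFT below 0.05, PolyPhen2 at or above 0.957, LRT category "D") are commonly quoted conventions taken here as assumptions, and the function names and the voting rule are hypothetical rather than part of Annovar or dbNSFP.

```python
# Per-predictor rules: each maps a score or category to True if "deleterious".
# The cutoffs are assumptions for illustration; check each tool's documentation.
DELETERIOUS_RULES = {
    "SIFT": lambda score: score < 0.05,        # lower SIFT score = more damaging
    "PolyPhen2": lambda score: score >= 0.957,  # higher score = more damaging
    "LRT": lambda category: category == "D",    # categorical prediction
}


def count_deleterious_votes(predictions, rules=DELETERIOUS_RULES):
    """Return (votes, available); predictors with no value are skipped."""
    votes = 0
    available = 0
    for name, is_deleterious in rules.items():
        value = predictions.get(name)
        if value is None:
            continue
        available += 1
        if is_deleterious(value):
            votes += 1
    return votes, available


def is_consensus_deleterious(predictions, min_fraction=0.5):
    """True if more than min_fraction of the available predictors agree."""
    votes, available = count_deleterious_votes(predictions)
    return available > 0 and votes / available > min_fraction


# Made-up prediction values for one variant.
example_predictions = {"SIFT": 0.01, "PolyPhen2": 0.99, "LRT": "N"}
consensus = is_consensus_deleterious(example_predictions)
```

Skipping missing predictors instead of counting them as "benign" avoids penalizing variants that a given algorithm simply could not score.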

Page 11

Filter based predictors 4
• dbNSFP variant conservation prediction:
  ◦ PhyloP: assigns conservation p-values; scores reflect either conservation (positive scores) or accelerated evolution (negative scores).
  ◦ GERP++: Genomic Evolutionary Rate Profiling, measures base conservation.
  ◦ SiPhy: models the pattern of substitutions rather than just the rate, so it can detect biased substitutions (e.g. conserved lysine: AAA <-> AAG).

Page 12

Filter based predictors 5
• ESP (Exome Sequencing Project) annotations
  ◦ The ESP is an NHLBI-funded exome sequencing project aiming to identify genetic variants in exonic regions from over 6000 individuals, including healthy ones as well as subjects with different diseases.
• GERP++ (Genomic Evolutionary Rate Profiling) annotations
  ◦ GERP identifies constrained elements in multiple alignments by quantifying substitution deficits.
• CG (Complete Genomics) frequency annotations
  ◦ Each technical platform, such as Complete Genomics and Illumina HiSeq, may generate some platform-specific sequencing artifacts. Complete Genomics provides whole-genome data for a relatively small group of healthy subjects, but this data set can be quite useful to filter out technical artifacts for CG users.
• Population frequency ensemble annotations
  ◦ The database popfreq_all integrates PopFreqMax, 1000G2012APR_ALL, 1000G2012APR_AFR, 1000G2012APR_AMR, 1000G2012APR_ASN, 1000G2012APR_EUR, ESP6500si_AA, ESP6500si_EA, CG46, NCI60, SNP137, COSMIC65, DISEASE.
• Generic mutation annotations
  ◦ Annovar users have the flexibility to supply a custom-made annotation file and let ANNOVAR perform filter-based annotation on this annotation file.
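
A field such as PopFreqMax can be read as the maximum allele frequency observed across the listed populations; that reading is an assumption here, not something the slide states. Under that assumption, a minimal sketch of computing it and applying a rarity cutoff looks as follows. The function names, the 1% threshold and the frequency values are hypothetical; only the column names come from the list above.

```python
def pop_freq_max(freqs):
    """Maximum frequency across populations, ignoring missing values.

    freqs: dict mapping a population column name to a frequency or None.
    Returns None if the variant was not observed in any population.
    """
    observed = [f for f in freqs.values() if f is not None]
    return max(observed) if observed else None


def is_rare(freqs, threshold=0.01):
    """Treat a variant as rare if it is unobserved or below the threshold
    in every population (the threshold is an illustrative assumption)."""
    highest = pop_freq_max(freqs)
    return highest is None or highest < threshold


# Made-up frequencies for one variant.
example_freqs = {
    "1000G2012APR_AFR": 0.002,
    "1000G2012APR_EUR": 0.03,
    "ESP6500si_EA": None,
    "CG46": 0.01,
}
highest_freq = pop_freq_max(example_freqs)
```

Filtering on the maximum rather than on a single population avoids keeping a variant that is rare overall but common in one group.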

Page 13

Annovar result
• Two output files will be generated:
  ◦ The first file contains the annotation for all variants.
  ◦ The second output file contains the amino acid changes that result from the exonic variants.
• Annovar uses standardized nomenclature to annotate non-synonymous SNVs and indels on cDNA or on proteins.

Example: NOD2:NM_022162:exon4:p.R702W
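
The example annotation above is a colon-separated string, so it can be split into its parts with a few lines of Python. This is a minimal sketch with hypothetical function names. It assumes the first three fields are gene, transcript and exon, followed by optional `c.` (cDNA) and `p.` (protein) fields, and it handles only simple single-letter amino acid substitutions.

```python
import re


def parse_aa_change(text):
    """Split an annotation such as 'NOD2:NM_022162:exon4:p.R702W' into fields."""
    gene, transcript, exon, *changes = text.split(":")
    result = {
        "gene": gene,
        "transcript": transcript,
        "exon": exon,
        "cdna": None,
        "protein": None,
    }
    for change in changes:
        if change.startswith("c."):
            result["cdna"] = change
        elif change.startswith("p."):
            result["protein"] = change
    return result


# Single-letter substitution such as p.R702W; '*' stands for a stop codon.
PROTEIN_SUBSTITUTION = re.compile(r"^p\.([A-Z*])(\d+)([A-Z*])$")


def parse_protein_substitution(protein_change):
    """Return (reference_aa, position, alternate_aa), or None if the
    string is not a simple substitution."""
    match = PROTEIN_SUBSTITUTION.match(protein_change)
    if match is None:
        return None
    ref_aa, position, alt_aa = match.groups()
    return ref_aa, int(position), alt_aa


parsed = parse_aa_change("NOD2:NM_022162:exon4:p.R702W")
substitution = parse_protein_substitution(parsed["protein"])
```

Read this way, the example says that in gene NOD2, transcript NM_022162, exon 4, the arginine (R) at protein position 702 is replaced by tryptophan (W).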