The server of the Spanish Population Variability

CIBERER Exome Server (CES) The server of the Spanish Population Variability Joaquín Dopazo, PhD Department of Computational Genomics, CIPF, Valencia Hospital Universitario La Paz, Madrid April 28, 2014

Upload: joaquin-dopazo

Post on 15-Jul-2015


TRANSCRIPT

Page 1: The server of the Spanish Population Variability

CIBERER Exome Server (CES) The server of the Spanish Population Variability

Joaquín Dopazo, PhD Department of Computational Genomics, CIPF, Valencia

Hospital Universitario La Paz, Madrid

April 28, 2014

Page 2: The server of the Spanish Population Variability

Why is it interesting to have a Spanish exome variant repository?

Rationale: Local variability is more important than previously thought. The existence of numerous local rare variants, many of them (apparently) deleterious, hampers the prioritization of disease variants.

Data recycling: CIBERER has accumulated a large number of samples that can be used as (pseudo)controls representing the normal population.

Page 3: The server of the Spanish Population Variability

Pipeline of data analysis

• Primary processing: initial QC (FASTQ file), mapping (BAM file), variant calling (VCF file)

• Secondary analysis (successive filtering): variant annotation, filtering by effect, filtering by MAF (1000 genomes, EVS, local variants), filtering by family segregation

• Primary analysis, gene prioritization: knowledge-based prioritization (proximity to other known disease genes: functional proximity, network proximity), burden tests, other prioritization methods

Page 4: The server of the Spanish Population Variability

Use known variants and their population frequencies for filtering.

• Typically dbSNP, 1000 genomes and the 6515 exomes from the ESP are used as sources of population frequencies.

• We selected 75 local controls to add an extra filtering step to the analysis pipeline.
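The extra filtering step can be pictured with a minimal sketch. It assumes each candidate variant carries a minor allele frequency (MAF) per source, with `None` meaning the variant is absent from that source. The variant names, the frequencies and the 0.01 threshold are hypothetical and are not taken from the slides.

```python
from typing import Dict, Optional


def passes_maf_filter(mafs: Dict[str, Optional[float]], threshold: float = 0.01) -> bool:
    """Keep a variant only if every known MAF is below the threshold."""
    return all(maf is None or maf < threshold for maf in mafs.values())


# Hypothetical candidates; None means "not observed in that source".
candidates = [
    {"id": "var_a", "mafs": {"1000g": None, "esp": None, "local": None}},
    {"id": "var_b", "mafs": {"1000g": 0.20, "esp": 0.18, "local": 0.22}},
    {"id": "var_c", "mafs": {"1000g": None, "esp": None, "local": 0.05}},
]

# Filtering with public sources only (the local column is ignored).
public_only = [
    v["id"]
    for v in candidates
    if passes_maf_filter({k: m for k, m in v["mafs"].items() if k != "local"})
]

# Filtering with public sources plus the local controls.
with_local = [v["id"] for v in candidates if passes_maf_filter(v["mafs"])]

print(public_only)  # ['var_a', 'var_c']
print(with_local)   # ['var_a']
```

In this toy case `var_c` looks rare in the public sources but is a local polymorphism, so only the local controls remove it from the candidate list.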

Novembre et al., 2008. Genes mirror geography within Europe. Nature.

Comparison of Spanish controls to 1000g

How important do you think local information is for detecting disease genes?

Page 5: The server of the Spanish Population Variability

Filtering with or without local variants

Number of genes as a function of the number of individuals in the study of a dominant disease (autosomal dominant retinitis pigmentosa)

The use of local variants makes an enormous difference

Page 6: The server of the Spanish Population Variability

What do we know about the variability of the Spanish population?

Page 7: The server of the Spanish Population Variability

Using CIBERER families to create a first version of the database of local variability of the Spanish population

• In each family we select two unrelated members (preferably the parents)

• If the parents are not available, one of the children (unaffected, if possible) is selected

• A total of 75, out of the 136 samples available among the families analyzed in the BiER, were initially selected.

• Variant files (VCF) were obtained following the same pipeline (with missing values included) and merged.

• Genotype proportions and MAFs were obtained for all the variable positions. ONLY this information is used in the web server.
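As a rough illustration of the last step, the sketch below derives genotype proportions and a MAF for one biallelic position from per-sample genotype calls. It assumes unphased calls written as `0/0`, `0/1` and `1/1`, with `./.` for a missing value; the function name and the handling of missing calls are assumptions, not the pipeline's actual code. The counts 70, 4 and 1 are one row of the frequency table shown on a later slide.

```python
from collections import Counter
from typing import Dict, List


def genotype_summary(genotypes: List[str]) -> Dict[str, float]:
    """Summarize one biallelic position; './.' marks a missing call."""
    counts = Counter(g for g in genotypes if g != "./.")
    called = counts["0/0"] + counts["0/1"] + counts["1/1"]
    if called == 0:
        raise ValueError("no called genotypes at this position")
    # Each heterozygote carries one alternative allele, each 1/1 carries two.
    alt_freq = (counts["0/1"] + 2 * counts["1/1"]) / (2 * called)
    return {
        "p_hom_ref": counts["0/0"] / called,
        "p_het": counts["0/1"] / called,
        "p_hom_alt": counts["1/1"] / called,
        "maf": min(alt_freq, 1 - alt_freq),
    }


# 70 reference homozygotes, 4 heterozygotes, 1 alternative homozygote.
genotypes = ["0/0"] * 70 + ["0/1"] * 4 + ["1/1"] * 1
summary = genotype_summary(genotypes)
print(summary)
```

For this row the alternative allele appears 6 times among 150 chromosomes, so the MAF is 0.04.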

Page 8: The server of the Spanish Population Variability

Samples used

UNIT     n      %
U723    12   16.0
U737    11   14.7
U759     2    2.7
U705    10   13.3
U720    12   16.0
U732     1    1.3
U755     3    4.0
U746     9   12.0
U728     2    2.7
U729     3    4.0
U703     7    9.3
U718     1    1.3
U730     2    2.7
Total   75  100.0

DISEASE                                                 n      %
3-Methylglutaconic aciduria                            11   14.7
Atypical fracture                                       4    5.3
Autosomal DOMINANT non-syndromic hearing loss           1    1.3
Autosomal RECESSIVE non-syndromic hearing loss          1    1.3
BCKDK-deficiency disease                                2    2.7
CMT                                                     1    1.3
Congenital disorder of glycosylation types I and II     8   10.7
CoQ disease                                             3    4.0
CoQ10 deficiency and DNA depletion                      3    4.0
CoQ10 deficiency                                        2    2.7
Inherited Metabolic Disease                             2    2.7
MMD (Multiple deletion of mitochondrial DNA)            4    5.3
MSUD (Maple Syrup Urine Disease)                        1    1.3
Opitz                                                   8   10.7
Pelizaeus-like                                          2    2.7
RCD (Respiratory complexes deficiency)                  8   10.7
Retinitis pigmentosa                                   11   14.7
Usher                                                   3    4.0
Total                                                  75  100.0

Gender: man, woman

Phenotype: affected, healthy

Page 9: The server of the Spanish Population Variability

Variability spectrum of the Spanish population

A total of 131,897 variant positions unique to the Spanish population were detected across the 75 samples. Approximately 90,000 were singletons. Of these unique variants, 51,295 are non-synonymous changes and 18,450 are synonymous changes (a singleton-driven pattern, opposite to that of the variants shared with 1000g and EVS, which come from polymorphic positions).

Page 10: The server of the Spanish Population Variability

The CIBERER Exome Server (CES): the first repository of variability of the Spanish population

http://ciberer.es/bier/exome-server/

Only one other similar initiative exists: the GoNL (http://www.nlgenome.nl/)

Page 11: The server of the Spanish Population Variability

Information provided

Genotypes in the different reference populations

Genomic coordinates, variation, and gene

SNPid, if any

Page 12: The server of the Spanish Population Variability

Information provided

PolyPhen and SIFT pathogenicity indexes

Phenotype, if available

Page 13: The server of the Spanish Population Variability

Variants can also be seen in their genomic context

GenomeMaps viewer (Medina et al., 2013, NAR) embedded in the application.

GenomeMaps is the official genome viewer of the ICGC (http://dcc.icgc.org/)

Page 14: The server of the Spanish Population Variability

Occurrence of pathological variants in the “normal” population

Reference genome is mutated

Nine carriers in 1000 genomes

One affected individual and 73 carriers in EVS

Page 15: The server of the Spanish Population Variability

Current usage options

Query

Configuration of the display

Genomic context

Page 16: The server of the Spanish Population Variability

Spanish variability database. FAQ

What is stored in the database?

ONLY the frequencies of the genotypes observed at the positions in which variants have been found in at least one individual. This information is obtained from unrelated Spanish individuals.

What information is provided by the database?

Aggregated information on the genotype frequencies of the variable positions in the gene(s) requested.

Is it possible to know whether a particular individual is stored in the database?

No, unless you sequence the individual and check whether the genotype frequencies are compatible with the database, but that seems stupid because you would already have the information you were looking for.

Let's imagine that I am stupid and managed to find out that the individual is in the database. Can I retrieve her/his genome?

No, that is impossible from the aggregated information.

Page 17: The server of the Spanish Population Variability

Spanish variability database. FAQ

Who can contribute?

Anyone (especially if you are sequencing with public resources)

What do you need to submit?

Anonymized variant files (VCF: Variant Call Format)

Why VCFs?

Because we need to check that your contribution contains no relatives of the individuals in the database
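The slides do not say how relatives are detected, so the following is only a sketch of one simple heuristic that needs individual genotypes, which is why frequencies alone would not be enough. It counts the sites where two samples are opposite homozygotes (IBS0): a parent and a child always share at least one allele, so their IBS0 rate is close to zero, while unrelated individuals usually show some such sites. The data structure, the function name and the toy genotypes are hypothetical, and a real check would rely on an established kinship estimator over many more sites.

```python
from typing import Dict, Tuple

# (chromosome, position) -> number of alternative alleles (0, 1 or 2)
Genotypes = Dict[Tuple[str, int], int]


def ibs0_rate(a: Genotypes, b: Genotypes) -> float:
    """Fraction of shared sites where two samples are opposite homozygotes."""
    shared = a.keys() & b.keys()
    if not shared:
        raise ValueError("no shared sites to compare")
    ibs0 = sum(1 for site in shared if abs(a[site] - b[site]) == 2)
    return ibs0 / len(shared)


# Toy genotypes, for illustration only.
parent = {("1", 100): 0, ("1", 200): 1, ("1", 300): 2, ("2", 50): 2}
child = {("1", 100): 0, ("1", 200): 0, ("1", 300): 1, ("2", 50): 1}
stranger = {("1", 100): 2, ("1", 200): 1, ("1", 300): 0, ("2", 50): 2}

print(ibs0_rate(parent, child))     # 0.0
print(ibs0_rate(parent, stranger))  # 0.5
```

A rate near zero over a large number of sites would flag a pair for closer inspection. This heuristic is most sensitive to parent and child pairs; more distant relatives need a proper kinship estimate.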

Page 18: The server of the Spanish Population Variability

What’s next?

• Strategic steps:

– Populating the database with contributions from CIBERER and external groups. Future project: SPANEx

– Opening the database

• Technical steps:

– Automatic access to the local variability data via webservices

– Use in gene discovery pipelines

– Use for the interpretation of incidental findings in diagnostic panels

Page 19: The server of the Spanish Population Variability

Table of Spanish Frequencies (TSF)

Chr  Position  Ref  Alt  0/0  0/1  1/1
1    1365313   A    T     75    0    0
1    1484884   G    A     70    4    1
2    326252    T    C     25   35   15

DB of Spanish variants (DBSV)

CES input: VCFs (internal, external)

Checks: Unrelated? (DBSV), Spanish? (TSF), each with a YES or NO outcome

Counts

CES use: regional, other countries
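The flow on this slide can be sketched as follows, under loud assumptions: the two checks ("Unrelated?" and "Spanish?") are represented by plain booleans because the slides do not describe how they are computed, and the function and type names are hypothetical. Only the three table rows come from the slide.

```python
from typing import Dict, List, Tuple

Site = Tuple[str, int, str, str]   # (chromosome, position, ref, alt)
Counts = Dict[Site, List[int]]     # site -> [n of 0/0, n of 0/1, n of 1/1]
GT_INDEX = {"0/0": 0, "0/1": 1, "1/1": 2}


def add_sample(tsf: Counts, sample: Dict[Site, str], unrelated: bool, spanish: bool) -> bool:
    """Add one sample's genotypes to the counts only if both checks pass."""
    if not (unrelated and spanish):
        return False
    for site, gt in sample.items():
        if gt in GT_INDEX:  # missing or unsupported calls are skipped
            tsf.setdefault(site, [0, 0, 0])[GT_INDEX[gt]] += 1
    return True


# The three rows of the table on the slide.
tsf: Counts = {
    ("1", 1365313, "A", "T"): [75, 0, 0],
    ("1", 1484884, "G", "A"): [70, 4, 1],
    ("2", 326252, "T", "C"): [25, 35, 15],
}

# Hypothetical new contribution.
new_sample = {("1", 1484884, "G", "A"): "0/1", ("2", 326252, "T", "C"): "1/1"}

accepted = add_sample(tsf, new_sample, unrelated=True, spanish=True)
rejected = add_sample(tsf, new_sample, unrelated=False, spanish=True)

print(accepted, rejected)                # True False
print(tsf[("1", 1484884, "G", "A")])     # [70, 5, 1]
print(tsf[("2", 326252, "T", "C")])      # [25, 35, 16]
```

Only counts are stored, so nothing that identifies the contributed sample remains in the table. One simplification to keep in mind: a site absent from the contributed sample is left unchanged here rather than being counted as 0/0.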

Page 20: The server of the Spanish Population Variability

Future of the Database of variation in Spanish population

CIBERER contributions

SPANEx contributions

Page 21: The server of the Spanish Population Variability

CIBERER exome server roadmap

Phase I: CIBERER, 76 samples, unaffected

Phase II: CES II, 76+269+X samples, mixed (MGP: 269 samples, healthy controls; more CIBERER samples)

Phase III: CES II, 1000+76+269+X samples, mixed (SPANEX: 1000 exomes)

Timeline: 2014-June 2014 2015

Page 22: The server of the Spanish Population Variability

Future utilization. Access via webservices

Access is limited to aggregated data on variation and genotype frequencies. Therefore, there are no associated confidentiality or privacy issues.

Spanish variation database

CellBase (Bleda et al., 2012, NAR): our data server system, now at the EBI

Page 23: The server of the Spanish Population Variability

NA19660 NA19661

NA19600 NA19685

BiERapp: the interactive filtering tool for easy candidate prioritization

http://bierapp.babelomics.org

Page 24: The server of the Spanish Population Variability

Panel (real or virtual) manager

Tool for defining panels

New filter based on local population variant frequencies

If no diagnostic variants appear, then secondary findings can be studied

Diagnostic mutations

http://team.babelomics.org

Page 25: The server of the Spanish Population Variability

Take home message

• Local variability is critical for distinguishing real pathologic variants from local polymorphisms

• CES will be populated with the SPANEX project (M.A. Moreno talk)

• CES is the starting point of a more ambitious crowdsourcing project that aims at constructing a high-resolution map of the Spanish population variation

• Contributions to CES comply with confidentiality requirements. No patient information is shared, only statistical information.

Page 26: The server of the Spanish Population Variability

The Computational Genomics Department at the Centro de Investigación Príncipe Felipe (CIPF),

Valencia, Spain, and… ...the INB, National Institute of Bioinformatics (Functional Genomics Node) and the CIBERER Network of Centers for Rare Diseases, and…

...the Medical Genome Project (Sevilla)

@xdopazo

@bioinfocipf