gene-gene /snp-snp interaction: biofilter

39
Gene-Gene /SNP-SNP Interaction: BIOFILTER GBIO0002 Archana Bhardwaj University of Liege 12/5/2017 1 AB-ULg

Upload: others

Post on 27-May-2022

33 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Gene-Gene /SNP-SNP Interaction: BIOFILTER

Gene-Gene /SNP-SNP Interaction: BIOFILTER GBIO0002

Archana BhardwajUniversity of Liege

12/5/2017 1AB-ULg

Page 2: Gene-Gene /SNP-SNP Interaction: BIOFILTER

The combinatorial problem of jointly analyzing the millions ofgenetic variations accessible by high-throughput genotypingtechnologies is a difficult challenge.12/5/2017 2AB-ULg

Page 3: Gene-Gene /SNP-SNP Interaction: BIOFILTER

12/5/2017 3AB-ULg

Page 4: Gene-Gene /SNP-SNP Interaction: BIOFILTER

Biofilter uses publicly available databases to establishrelationships between gene-products

12/5/2017 4AB-ULg

Page 5: Gene-Gene /SNP-SNP Interaction: BIOFILTER

LOKI DB : dbSNP

12/5/2017 5AB-ULg

Page 6: Gene-Gene /SNP-SNP Interaction: BIOFILTER

12/5/2017 6AB-ULg

Page 7: Gene-Gene /SNP-SNP Interaction: BIOFILTER

http://www.genome.jp/kegg/pathway.html

LOKI DB : KEGG database

hsa for “human”

12/5/2017 7AB-ULg

Page 8: Gene-Gene /SNP-SNP Interaction: BIOFILTER

12/5/2017 8AB-ULg

Page 9: Gene-Gene /SNP-SNP Interaction: BIOFILTER

LOKI DB : BioGRID Database

12/5/2017 9AB-ULg

Page 10: Gene-Gene /SNP-SNP Interaction: BIOFILTER

12/5/2017 10AB-ULg

Page 11: Gene-Gene /SNP-SNP Interaction: BIOFILTER

We can annotate genomic location or region baseddata, such as results from association studies, or CNVanalyses, with relevant biological knowledge for deeperinterpretation.

We can filter genomic location or region based dataon biological criteria, such as filtering a series SNPs toretain only SNPs present in specific genes withinspecific pathways of interest.

Use of Biofilter software (1)

12/5/2017 11AB-ULg

Page 12: Gene-Gene /SNP-SNP Interaction: BIOFILTER

▪Biofilter allows researchers to annotate and/or filterdata as well generate gene-gene interaction modelsbased on existing biological knowledge.

▪We can generate Predictive Models for gene-gene, SNP-SNP, or CNV-CNV interactions based on biologicalinformation, with priority for models to be tested basedon biological relevance, thus narrowing the search spaceand reducing multiple hypothesis-testing.

12/5/2017 12AB-ULg

Use of Biofilter software (2)

Page 13: Gene-Gene /SNP-SNP Interaction: BIOFILTER

GWAS platform SNPsare mapped to Ensemblgene Ids.

Multi-marker modelsare generated from SNPswithin knowledge-related genes.

Derived models areoverlaid to assess overallmodel implication.

Biofilter : Overview

12/5/2017 13AB-ULg

Page 14: Gene-Gene /SNP-SNP Interaction: BIOFILTER

Biofilter : Three Analysis mode Biofilter has three primary analysis modes and uses theavailable biological knowledge in slightly different ways.

12/5/2017 14AB-ULg

Page 15: Gene-Gene /SNP-SNP Interaction: BIOFILTER

Biofilter Data types

12/5/2017 15AB-ULg

Page 16: Gene-Gene /SNP-SNP Interaction: BIOFILTER

Given any combination of input data, Biofilter cancross-reference the input data using the relationshipsstored in the knowledge database to generate afiltered dataset of any supported type (or types).

For example, a user can provide a list of SNPs (suchas those covered by a genotyping platform) and a listof genes (such as those thought to be related to aparticular phenotype) and request a filtered set ofSNPs. Biofilter will use LOKI’s knowledge of SNPpositions and gene regions to filter the provided

SNP list, removing all those that are not locatedwithin any of the provided genes.

Biofilter : Filtering mode

12/5/2017 16AB-ULg

Page 17: Gene-Gene /SNP-SNP Interaction: BIOFILTER

The annotations are based on the relationshipsstored in the knowledge database; unlike filtering,any data which cannot be annotated as requested(such as a SNP which is not located within any gene)will still be included in the output, with theannotation columns of the output simply left blank.

For example, a list of SNPs can be annotated withpositions to generate a new list of all the same SNPs,but with extra columns containing the chromosomeand genomic position for each SNP (if any). Any SNPwith multiple known positions will be repeated, andany SNP with no known position will haveblanks in the added columns.

Biofilter : Annotation mode

12/5/2017 17AB-ULg

Page 18: Gene-Gene /SNP-SNP Interaction: BIOFILTER

Biofilter : Annotation mode

12/5/2017 18AB-ULg

Page 19: Gene-Gene /SNP-SNP Interaction: BIOFILTER

The last of Biofilter’s primary analysis modes is a littledifferent from filtering and annotation.

In addition to simply cross-referencing any given datawith the other available prior knowledge, Biofilter canalso search for repeated patterns within the priorknowledge which might indicate the potential forimportant interactions between SNPs or genes.

Biofilter : Model analysis mode(1)

12/5/2017 19AB-ULg

Page 20: Gene-Gene /SNP-SNP Interaction: BIOFILTER

12/5/2017 AB-ULg 20

The key idea behind this analysis is that If the sametwo genes appear together in more than one grouping,they’re likely to have an important biologicalrelationship; if they appear in multiple groups fromseveral independent sources, then they’re even morelikely to be biologically related in some way.

Biofilter : Model analysis mode(2)

Page 21: Gene-Gene /SNP-SNP Interaction: BIOFILTER

Biofilter has access to thousands of such groupings and cananalyze all of them to identify the pairs of genes or SNPsappearing together in the greatest number of groupings and thewidest array of original data sources. These pairs can then betested for significance within a research dataset, avoiding theprohibitive computational and multipletesting Burden of anexhaustive pairwise analysis.

Biofilter : Model analysis mode (3)

12/5/2017 21AB-ULg

Page 22: Gene-Gene /SNP-SNP Interaction: BIOFILTER

The LOKI prior knowledge database must be generated beforeBiofilter can be used. This is done with the “loki-build.py” scriptwhich was installed along with Biofilter. There are several options forthis utility which are detailed below, but to get started, you just need“--knowledge” and “--update”:

loki-build.py --verbose --knowledge loki.db –update

This will download and process the bulk data files from allsupported knowledge sources, storing the result in the file “loki.db”(which we recommend naming after the current date, such as“loki20140521.db”).

Compiling Prior Knowledge: Loki.db

12/5/2017 22AB-ULg

Page 23: Gene-Gene /SNP-SNP Interaction: BIOFILTER

--update Arguments: [source] […] Default: allInstructs the build script to process the bulk data from the specifiedsources and update their representation in the knowledge database. Ifno sources are specified, all supported sources will be updated.

--update-except Arguments: [source] […] Default: noneSimilar to “--update” but with the opposite meaning for the specifiedsources: all supported sources will be updated except for the onesspecified. If no sources are specified, none are excluded, and allsupported sources are updated.

--option Arguments: <source> <options> Default: nonePasses additional options to the specified source loader module. Theoptions string must be of the form “option1=value,option2=value” forany number of options and values. Supported options and values foreach source can be shown with “--list-sources”.

Updating Prior Knowledge: Loki.db (1)

12/5/2017 23AB-ULg

Page 24: Gene-Gene /SNP-SNP Interaction: BIOFILTER

--force-update Argument: none

The build script will normally only update from a sources if it detects that an update is necessary, either because new data files have been downloaded from the source or because the source’s loader module code has been updated. With this option, the build script will update all specified sources, even if it believes no update is necessary.

Updating Prior Knowledge : Loki.db (2)

12/5/2017 24AB-ULg

Page 25: Gene-Gene /SNP-SNP Interaction: BIOFILTER

Biofilter and LOKI allow for gene regions to beadjusted by the linkage disequilibrium (LD) patterns in agiven population.

When comparing a known gene region to any otherregion or position (such as CNVs or SNPs), areas in highLD with a gene can be considered part of the gene, evenif the region lies outside of the gene’s canonicalboundaries.

This step require use of additional tool

LD Profiles : GWAS information

12/5/2017 25AB-ULg

Page 26: Gene-Gene /SNP-SNP Interaction: BIOFILTER

Biofilter can be run from a command-line terminal by executing

biofilter.py or python biofilter.py

All options can either be provided directly on the command line

biofilter.py --option-name

configuration files could be given as input such as

biofilter.py analysis.config

Biofilter : Command lines vs Configuration

12/5/2017 26AB-ULg

Page 27: Gene-Gene /SNP-SNP Interaction: BIOFILTER

Biofilter : Configuration file

▪biofilter.py test.config

12/5/2017 27AB-ULg

Page 28: Gene-Gene /SNP-SNP Interaction: BIOFILTER

Options on the command line are lower-case, start with twodashes and may contain single dashes to separate words (such as “--snp-file”),

while in a configuration file the same option would be in upper-case, contain no dashes and instead use underscores to separatewords (i.e. “SNP_FILE”).

Many commandline options also have alternative shorthand versions of one or a fewletters, such as “-s” for “--snp-file” and “--aag” for “--allow-ambiguous-genes”.

Biofilter : Command lines vs Configuration

12/5/2017 28AB-ULg

Page 29: Gene-Gene /SNP-SNP Interaction: BIOFILTER

--help / HELPDisplays the program usage and immediately exits.

--version / VERSIONDisplays the software versions and immediately exits. Note thatBiofilter is built upon LOKI and SQLite, each of which will alsoreport their own software versions.

--report-configuration / REPORT_CONFIGURATION

Argument: [yes/no] Default: noGenerates a Biofilter configuration file which specifies the currenteffective value of all program options, including any defaultoptions which were not overridden.

Configuration Options

12/5/2017 29AB-ULg

Page 30: Gene-Gene /SNP-SNP Interaction: BIOFILTER

--knowledge / KNOWLEDGEArgument: <file> Default: none

--report-genome-build / REPORT_GENOME_BUILDArgument: [yes/no] Default: yes

--report-gene-name-stats / REPORT_GENE_NAME_STATSArgument: [yes/no] Default: no

--report-group-name-stats / REPORT_GROUP_NAME_STATSArgument: [yes/no] Default: no

--allow-unvalidated-snp-positions / ALLOW_UNVALIDATED_SNP_POSITIONS Argument: [yes/no] Default: yes

--allow-ambiguous-snps / ALLOW_AMBIGUOUS_SNPS

Prior Knowledge Options

12/5/2017 30AB-ULg

Page 31: Gene-Gene /SNP-SNP Interaction: BIOFILTER

--snp / SNPArguments: <snp> [snp] […] Default: none

--snp-file / SNP_FILEArguments: <file> [file] […] Default: none

--position / POSITIONArguments: <position> [position] […] Default: none

--position-file / POSITION_FILEArguments: <file> [file] […] Default: none

-region / REGIONArguments: <region> [region] […] Default: none

--region-file / REGION_FILEArguments: <file> [file] […] Default: none

Primary Input Data Options

12/5/2017 31AB-ULg

Page 32: Gene-Gene /SNP-SNP Interaction: BIOFILTER

--filter / FILTERArgument: <type> [type] […] Default: none

Perform a filtering analysis which outputs the specified type

--annotate / ANNOTATEArgument: <type> [type] […] [:] <type> [type] […] Default: none

--model / MODELArgument: <type> [type] […] [:] [type] […] Default: none

Output Options : Mode of analysis

12/5/2017 32AB-ULg

Page 33: Gene-Gene /SNP-SNP Interaction: BIOFILTER

input1 #snprs11rs12rs13rs14rs15rs16

KNOWLEDGE test.dbSNP_FILE input1GENE_FILE input2FILTER snp

Test.config

input2#geneACE

run “ biofilter.py Test.config”What is expected output ??? Can you make inference by lookingfigure

Filter mode : search SNPs that correspond to a list of genes

12/5/2017 33AB-ULg

Page 34: Gene-Gene /SNP-SNP Interaction: BIOFILTER

Annotation mode : a SNP with gene region information

KNOWLEDGE test.dbSNP rs11 rs24 rs99ANNOTATE snp region

Test.config

Biofilter.py test.config

Output

12/5/2017 34AB-ULg

Page 35: Gene-Gene /SNP-SNP Interaction: BIOFILTER

Step 1Map the input list of SNPs to genes within Biofilter.

Step 2Connect, pairwise, the genes that contain SNPs in the input list ofSNPs.

Step 3Break down the gene-gene models into all pairwise combinationsof SNPs across the genes within sources

Pair wise Gene-Gene and SNP-SNP interaction

12/5/2017 35AB-ULg

Page 36: Gene-Gene /SNP-SNP Interaction: BIOFILTER

we will use all of the SNPs on the first chromosome.Test.config

KNOWLEDGE test.dbSNP 11 12 13 14 15 16 17 18 19FILTER gene

Step 1 : Pair wise Gene-Gene and SNP-SNP interaction

Output:#geneABCDE

12/5/2017 36AB-ULg

Page 37: Gene-Gene /SNP-SNP Interaction: BIOFILTER

KNOWLEDGE test.dbGENE A B C D EMODEL gene

Test.config

#gene1 #gene2 scoreA C 2-3

biofilter.py test.config

output

Step 2 : Connect, pairwise, the genes that contain SNPs in the input list of SNPs.

12/5/2017 37AB-ULg

Page 38: Gene-Gene /SNP-SNP Interaction: BIOFILTER

Step 3 : Break down the gene-gene models into all pairwise combinations of SNPs

biofilter.py test.config

12/5/2017 38AB-ULg

Page 39: Gene-Gene /SNP-SNP Interaction: BIOFILTER

12/5/2017 AB-ULg 39