Gene-Gene /SNP-SNP Interaction: BIOFILTER GBIO0002
Archana Bhardwaj, University of Liege
12/5/2017 1AB-ULg
The combinatorial problem of jointly analyzing the millions of genetic variations accessible by high-throughput genotyping technologies is a difficult challenge.
Biofilter uses publicly available databases to establish relationships between gene products.
LOKI DB : dbSNP
http://www.genome.jp/kegg/pathway.html
LOKI DB : KEGG database
hsa for “human”
LOKI DB : BioGRID Database
We can annotate genomic location or region based data, such as results from association studies or CNV analyses, with relevant biological knowledge for deeper interpretation.
We can filter genomic location or region based data on biological criteria, such as filtering a series of SNPs to retain only SNPs present in specific genes within specific pathways of interest.
Use of Biofilter software (1)
▪Biofilter allows researchers to annotate and/or filter data as well as generate gene-gene interaction models based on existing biological knowledge.
▪We can generate predictive models for gene-gene, SNP-SNP, or CNV-CNV interactions based on biological information. Models are prioritized for testing by biological relevance, which narrows the search space and reduces the multiple hypothesis-testing burden.
Use of Biofilter software (2)
GWAS platform SNPs are mapped to Ensembl gene IDs.
Multi-marker models are generated from SNPs within knowledge-related genes.
Derived models are overlaid to assess overall model implication.
Biofilter : Overview
Biofilter : Three Analysis modes
Biofilter has three primary analysis modes and uses the available biological knowledge in slightly different ways.
Biofilter Data types
Given any combination of input data, Biofilter can cross-reference the input data using the relationships stored in the knowledge database to generate a filtered dataset of any supported type (or types).
For example, a user can provide a list of SNPs (such as those covered by a genotyping platform) and a list of genes (such as those thought to be related to a particular phenotype) and request a filtered set of SNPs. Biofilter will use LOKI’s knowledge of SNP positions and gene regions to filter the provided SNP list, removing all those that are not located within any of the provided genes.
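The filtering logic described above can be pictured with a small Python sketch. This is not Biofilter's implementation: the dictionaries `SNP_POSITIONS` and `GENE_REGIONS` are hypothetical stand-ins for LOKI's knowledge, and every position in them is invented for illustration.

```python
# Toy stand-in for LOKI's knowledge; all positions and regions are invented.
SNP_POSITIONS = {
    "rs11": ("1", 150),
    "rs12": ("1", 420),
    "rs13": ("1", 900),
}
GENE_REGIONS = {
    "A": ("1", 100, 200),
    "C": ("1", 800, 1000),
}


def filter_snps_by_genes(snps, genes):
    """Keep only the SNPs located inside at least one of the given genes."""
    kept = []
    for snp in snps:
        if snp not in SNP_POSITIONS:
            continue  # no known position: cannot be placed in any gene
        chrom, pos = SNP_POSITIONS[snp]
        for gene in genes:
            if gene not in GENE_REGIONS:
                continue  # unknown gene: nothing to compare against
            g_chrom, start, stop = GENE_REGIONS[gene]
            if chrom == g_chrom and start <= pos <= stop:
                kept.append(snp)
                break  # one matching gene is enough to keep the SNP
    return kept


print(filter_snps_by_genes(["rs11", "rs12", "rs13"], ["A", "C"]))
# prints ['rs11', 'rs13']
```

Here rs12 is dropped because its (invented) position falls in neither gene region.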
Biofilter : Filtering mode
The annotations are based on the relationships stored in the knowledge database; unlike filtering, any data which cannot be annotated as requested (such as a SNP which is not located within any gene) will still be included in the output, with the annotation columns of the output simply left blank.
For example, a list of SNPs can be annotated with positions to generate a new list of all the same SNPs, but with extra columns containing the chromosome and genomic position for each SNP (if any). Any SNP with multiple known positions will be repeated, and any SNP with no known position will have blanks in the added columns.
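The difference from filtering (nothing is dropped, rows may repeat, blanks are allowed) can be sketched as follows. The `SNP_POSITIONS` table is a hypothetical stand-in for the knowledge database with invented positions, and the function is an illustration rather than Biofilter's own code.

```python
# Invented positions; a SNP may have several known positions or none.
SNP_POSITIONS = {
    "rs11": [("1", 150)],
    "rs24": [("2", 300), ("2", 7500)],
}


def annotate_with_positions(snps):
    """Return one row per (SNP, position); unknown SNPs get blank columns."""
    rows = []
    for snp in snps:
        positions = SNP_POSITIONS.get(snp)
        if not positions:
            rows.append((snp, "", ""))  # kept in the output, columns blank
            continue
        for chrom, pos in positions:
            rows.append((snp, chrom, str(pos)))  # repeated once per position
    return rows


for row in annotate_with_positions(["rs11", "rs24", "rs99"]):
    print("\t".join(row))
```

In this toy data rs24 appears twice (two positions) and rs99 appears once with empty annotation columns.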
Biofilter : Annotation mode
The last of Biofilter’s primary analysis modes is a little different from filtering and annotation.
In addition to simply cross-referencing any given data with the other available prior knowledge, Biofilter can also search for repeated patterns within the prior knowledge which might indicate the potential for important interactions between SNPs or genes.
Biofilter : Model analysis mode(1)
The key idea behind this analysis is that if the same two genes appear together in more than one grouping, they’re likely to have an important biological relationship; if they appear in multiple groups from several independent sources, then they’re even more likely to be biologically related in some way.
Biofilter : Model analysis mode(2)
Biofilter has access to thousands of such groupings and can analyze all of them to identify the pairs of genes or SNPs appearing together in the greatest number of groupings and the widest array of original data sources. These pairs can then be tested for significance within a research dataset, avoiding the prohibitive computational and multiple-testing burden of an exhaustive pairwise analysis.
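A minimal sketch of this counting idea, assuming each grouping is known by its source, its name and its member genes. The `GROUPS` list is invented, and reading the score as "number of sources, then number of groups" is an assumption made for this illustration.

```python
from collections import defaultdict
from itertools import combinations

# Invented groupings: (source, group name, member genes).
GROUPS = [
    ("source1", "pathway_x", {"A", "C", "D"}),
    ("source1", "pathway_y", {"A", "C"}),
    ("source2", "interaction_z", {"A", "C", "E"}),
]


def score_gene_pairs(groups):
    """Map each gene pair to (number of sources, number of groups)."""
    seen_sources = defaultdict(set)
    seen_groups = defaultdict(set)
    for source, group, genes in groups:
        for pair in combinations(sorted(genes), 2):
            seen_sources[pair].add(source)
            seen_groups[pair].add((source, group))
    return {p: (len(seen_sources[p]), len(seen_groups[p])) for p in seen_groups}


scores = score_gene_pairs(GROUPS)
# Rank pairs supported by the most sources first, then by the most groups.
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
for (gene1, gene2), (n_sources, n_groups) in ranked:
    print(gene1, gene2, f"{n_sources}-{n_groups}")
```

With this toy data the pair A, C ranks first because it appears in three groups drawn from two sources, while every other pair appears in a single group.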
Biofilter : Model analysis mode (3)
The LOKI prior knowledge database must be generated before Biofilter can be used. This is done with the “loki-build.py” script which was installed along with Biofilter. There are several options for this utility which are detailed below, but to get started, you just need “--knowledge” and “--update”:
loki-build.py --verbose --knowledge loki.db --update
This will download and process the bulk data files from all supported knowledge sources, storing the result in the file “loki.db” (which we recommend naming after the current date, such as “loki20140521.db”).
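The date-based naming recommendation can be automated with a few lines of Python. The helper `loki_build_command` is a hypothetical name introduced here; it only assembles the same command shown above and does not run it.

```python
import datetime


def loki_build_command(today=None):
    """Build the argument list for loki-build.py with a date-stamped file."""
    today = today or datetime.date.today()
    db_name = f"loki{today:%Y%m%d}.db"
    return ["loki-build.py", "--verbose", "--knowledge", db_name, "--update"]


cmd = loki_build_command(datetime.date(2014, 5, 21))
print(" ".join(cmd))
# prints: loki-build.py --verbose --knowledge loki20140521.db --update
# To actually start the build, pass cmd to subprocess.run(cmd, check=True).
```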
Compiling Prior Knowledge: Loki.db
--update
Arguments: [source] […] Default: all
Instructs the build script to process the bulk data from the specified sources and update their representation in the knowledge database. If no sources are specified, all supported sources will be updated.
--update-except
Arguments: [source] […] Default: none
Similar to “--update” but with the opposite meaning for the specified sources: all supported sources will be updated except for the ones specified. If no sources are specified, none are excluded, and all supported sources are updated.
--option
Arguments: <source> <options> Default: none
Passes additional options to the specified source loader module. The options string must be of the form “option1=value,option2=value” for any number of options and values. Supported options and values for each source can be shown with “--list-sources”.
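As a small illustration of the “option1=value,option2=value” form, the sketch below joins a Python dictionary into such a string. The source name `somesource` and the option names `opt_a` and `opt_b` are placeholders, not real loader options; the real ones are listed by “--list-sources”.

```python
def format_loader_options(options):
    """Join a dict into the "option1=value,option2=value" form."""
    return ",".join(f"{key}={value}" for key, value in options.items())


# "somesource", "opt_a" and "opt_b" are placeholders for illustration only.
option_string = format_loader_options({"opt_a": "yes", "opt_b": "3"})
args = ["--option", "somesource", option_string]
print(" ".join(args))
# prints: --option somesource opt_a=yes,opt_b=3
```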
Updating Prior Knowledge: Loki.db (1)
--force-update
Argument: none
The build script will normally only update from a source if it detects that an update is necessary, either because new data files have been downloaded from the source or because the source’s loader module code has been updated. With this option, the build script will update all specified sources, even if it believes no update is necessary.
Updating Prior Knowledge : Loki.db (2)
Biofilter and LOKI allow for gene regions to be adjusted by the linkage disequilibrium (LD) patterns in a given population.
When comparing a known gene region to any other region or position (such as CNVs or SNPs), areas in high LD with a gene can be considered part of the gene, even if the region lies outside of the gene’s canonical boundaries.
This step requires the use of an additional tool.
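The effect of LD-adjusted boundaries can be pictured with a toy comparison. All coordinates below are invented and are not the output of any LD tool; `CANONICAL` and `LD_ADJUSTED` are hypothetical tables used only to show how the same position can fall outside one region and inside the other.

```python
# Invented coordinates: canonical gene boundaries and LD-adjusted ones.
CANONICAL = {"A": ("1", 1000, 2000)}
LD_ADJUSTED = {"A": ("1", 800, 2300)}


def in_gene(chrom, pos, gene, regions):
    """Return True if the position lies inside the gene's region."""
    g_chrom, start, stop = regions[gene]
    return chrom == g_chrom and start <= pos <= stop


print(in_gene("1", 2200, "A", CANONICAL))    # False: outside canonical bounds
print(in_gene("1", 2200, "A", LD_ADJUSTED))  # True: inside the extended region
```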
LD Profiles : GWAS information
Biofilter can be run from a command-line terminal by executing
biofilter.py or python biofilter.py
All options can be provided directly on the command line:
biofilter.py --option-name
Alternatively, options can be placed in a configuration file that is given as input, such as:
biofilter.py analysis.config
Biofilter : Command lines vs Configuration
Biofilter : Configuration file
▪biofilter.py test.config
Options on the command line are lower-case, start with two dashes and may contain single dashes to separate words (such as “--snp-file”),
while in a configuration file the same option would be in upper-case, contain no dashes and instead use underscores to separate words (i.e. “SNP_FILE”).
Many command-line options also have alternative shorthand versions of one or a few letters, such as “-s” for “--snp-file” and “--aag” for “--allow-ambiguous-genes”.
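The naming rule between the two forms is mechanical, so it can be written as a pair of small helper functions. The function names `to_config_name` and `to_cli_name` are hypothetical and are not part of Biofilter; they only encode the rule stated above for long option names.

```python
def to_config_name(cli_option):
    """Convert a long command-line option to its configuration-file form."""
    if not cli_option.startswith("--"):
        raise ValueError("expected a long option such as --snp-file")
    return cli_option[2:].replace("-", "_").upper()


def to_cli_name(config_option):
    """Convert a configuration-file option back to its command-line form."""
    return "--" + config_option.replace("_", "-").lower()


print(to_config_name("--snp-file"))          # prints SNP_FILE
print(to_cli_name("ALLOW_AMBIGUOUS_GENES"))  # prints --allow-ambiguous-genes
```

Shorthand forms such as “-s” are not covered by this rule and would need a lookup table.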
Biofilter : Command lines vs Configuration
--help / HELP
Displays the program usage and immediately exits.
--version / VERSION
Displays the software versions and immediately exits. Note that Biofilter is built upon LOKI and SQLite, each of which will also report its own software version.
--report-configuration / REPORT_CONFIGURATION
Argument: [yes/no] Default: no
Generates a Biofilter configuration file which specifies the current effective value of all program options, including any default options which were not overridden.
Configuration Options
--knowledge / KNOWLEDGE
Argument: <file> Default: none
--report-genome-build / REPORT_GENOME_BUILD
Argument: [yes/no] Default: yes
--report-gene-name-stats / REPORT_GENE_NAME_STATS
Argument: [yes/no] Default: no
--report-group-name-stats / REPORT_GROUP_NAME_STATS
Argument: [yes/no] Default: no
--allow-unvalidated-snp-positions / ALLOW_UNVALIDATED_SNP_POSITIONS
Argument: [yes/no] Default: yes
--allow-ambiguous-snps / ALLOW_AMBIGUOUS_SNPS
Prior Knowledge Options
--snp / SNP
Arguments: <snp> [snp] […] Default: none
--snp-file / SNP_FILE
Arguments: <file> [file] […] Default: none
--position / POSITION
Arguments: <position> [position] […] Default: none
--position-file / POSITION_FILE
Arguments: <file> [file] […] Default: none
--region / REGION
Arguments: <region> [region] […] Default: none
--region-file / REGION_FILE
Arguments: <file> [file] […] Default: none
Primary Input Data Options
--filter / FILTER
Argument: <type> [type] […] Default: none
Performs a filtering analysis which outputs the specified type(s).
--annotate / ANNOTATE
Argument: <type> [type] […] [:] <type> [type] […] Default: none
--model / MODEL
Argument: <type> [type] […] [:] [type] […] Default: none
Output Options : Mode of analysis
input1:
#snp
rs11
rs12
rs13
rs14
rs15
rs16
Test.config:
KNOWLEDGE test.db
SNP_FILE input1
GENE_FILE input2
FILTER snp
input2:
#gene
A
C
E
Run “biofilter.py Test.config”. What is the expected output? Can you infer it by looking at the figure?
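To try this exercise, the two input files and the configuration file can be written with a short script. This sketch assumes the gene list is the three single-letter test genes A, C and E, and it does not create “test.db”, which must already exist next to the files before Biofilter is run. The helper name `write_example` is introduced here for illustration.

```python
from pathlib import Path
import tempfile

FILES = {
    "input1": "#snp\nrs11\nrs12\nrs13\nrs14\nrs15\nrs16\n",
    # Assumption: the gene list is three single-letter test genes A, C and E.
    "input2": "#gene\nA\nC\nE\n",
    "Test.config": "KNOWLEDGE test.db\nSNP_FILE input1\nGENE_FILE input2\nFILTER snp\n",
}


def write_example(directory):
    """Write the two input files and the configuration file into directory."""
    directory = Path(directory)
    for name, content in FILES.items():
        (directory / name).write_text(content)
    return sorted(p.name for p in directory.iterdir())


workdir = tempfile.mkdtemp()
print(write_example(workdir))
```

After copying “test.db” into the same directory, the exercise is run from there with “biofilter.py Test.config”.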
Filter mode : search SNPs that correspond to a list of genes
Annotation mode : a SNP with gene region information
Test.config:
KNOWLEDGE test.db
SNP rs11 rs24 rs99
ANNOTATE snp region
Command: biofilter.py Test.config
Output
Step 1: Map the input list of SNPs to genes within Biofilter.
Step 2: Connect, pairwise, the genes that contain SNPs in the input list of SNPs.
Step 3: Break down the gene-gene models into all pairwise combinations of SNPs across the genes within sources.
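The three steps can be sketched end to end in Python. The tables `SNP_TO_GENE` and `LINKED_GENES` are invented stand-ins for the knowledge database, and the function is an illustration of the procedure rather than Biofilter's own code.

```python
from itertools import combinations, product

# Invented knowledge: which gene each SNP falls in, and which gene pairs
# are connected by at least one shared group.
SNP_TO_GENE = {"rs11": "A", "rs12": "A", "rs13": "B", "rs15": "C", "rs16": "C"}
LINKED_GENES = {("A", "C")}


def snp_snp_models(snps):
    """Return (gene-gene models, SNP-SNP models) for the input SNPs."""
    # Step 1: map the input SNPs to genes.
    snps_by_gene = {}
    for snp in snps:
        gene = SNP_TO_GENE.get(snp)
        if gene is not None:
            snps_by_gene.setdefault(gene, []).append(snp)
    # Step 2: keep gene pairs that share knowledge and both contain input SNPs.
    gene_models = [
        pair for pair in combinations(sorted(snps_by_gene), 2)
        if pair in LINKED_GENES
    ]
    # Step 3: expand each gene pair into all cross-gene SNP pairs.
    snp_models = []
    for gene1, gene2 in gene_models:
        snp_models.extend(product(snps_by_gene[gene1], snps_by_gene[gene2]))
    return gene_models, snp_models


genes, pairs = snp_snp_models(["rs11", "rs12", "rs13", "rs15", "rs16"])
print(genes)
print(pairs)
```

With this toy data only the gene pair A, C survives step 2, and step 3 expands it into four SNP pairs (two SNPs in A times two SNPs in C), far fewer than the ten pairs an exhaustive analysis of five SNPs would test.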
Pairwise Gene-Gene and SNP-SNP interaction
We will use all of the SNPs on the first chromosome.
Test.config:
KNOWLEDGE test.db
SNP 11 12 13 14 15 16 17 18 19
FILTER gene
Step 1 : Pairwise Gene-Gene and SNP-SNP interaction
Output:
#gene
A
B
C
D
E
Test.config:
KNOWLEDGE test.db
GENE A B C D E
MODEL gene
Command: biofilter.py Test.config
Output:
#gene1 #gene2 score
A C 2-3
Step 2 : Connect, pairwise, the genes that contain SNPs in the input list of SNPs.
Step 3 : Break down the gene-gene models into all pairwise combinations of SNPs
biofilter.py test.config