Disease Gene Prioritization


Post on 23-Feb-2016



Presented by Qian Huang

What is a disease?
A disease is a condition of a living animal or plant, or of one of its parts, that impairs normal function and is typically manifested by distinguishing signs and symptoms. The definition also covers the malfunction of individual cells or cell groups, and many diseases should be defined at the cellular level. Sickle cell disease was first documented in 1904, and in 1949 it became the first disease to be characterized at the molecular level. With that, the first genetic disease was discovered.

Genetic diseases
A genetic disease is any disease caused by an abnormality in an individual's genome. A single gene is rarely responsible for a single function; instead, an assembly of genes constitutes a functional module or a molecular pathway. A molecular pathway leads to a specific end point in cellular functionality through a series of interactions between molecules in the cell. Any change in the normal molecular interactions and pathways may lead to disease, and the specifics of the change determine the severity and the type of the resulting disease.

Genetic diseases
Genetic diseases are inherited from the parents or caused by mutations. There are a number of different types of genetic disease.

Single-gene disorder
 - Mendelian or monogenic inheritance
 - caused by changes or mutations in the DNA sequence of a single gene
 - over 4000 human diseases are caused by single-gene disorders
 - occurs in about 1 out of every 200 births
 - dominant: only one mutated copy of the gene is needed for a person to be affected; with one affected parent, a child has a 50% chance of being affected
 - recessive: two copies of the gene must be mutated for a person to be affected; when two unaffected people each carry one copy of the mutated gene, a child has a 25% chance of being affected

Genetic diseases
Chromosome abnormalities
 - chromosomes are distinct structures made up of DNA and protein
 - caused by abnormalities in chromosome number or structure
 - due to a problem with cell division

Multifactorial gene disorder
 - caused by a combination of environmental factors and mutations in multiple genes
 - examples: heart disease, high blood pressure, cancer, diabetes, obesity
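The 50% and 25% figures quoted for single-gene disorders can be checked by enumerating the allele combinations a child may inherit (a Punnett square). The following is a minimal sketch, assuming each parent passes on either of its two alleles with equal probability; the function name and the `A`/`a` genotype notation are illustrative and not part of the original slides.

```python
from fractions import Fraction
from itertools import product

def affected_probability(parent1, parent2, dominant):
    """Probability that a child is affected by a single-gene disorder.

    Genotypes are two-character strings in which 'A' is the mutated
    allele and 'a' is the normal allele (notation assumed here).
    """
    # Each child receives one allele from each parent, all pairs equally likely.
    outcomes = list(product(parent1, parent2))

    def affected(child):
        mutated = child.count("A")
        # Dominant: one mutated copy suffices. Recessive: both copies needed.
        return mutated >= 1 if dominant else mutated == 2

    hits = sum(affected(child) for child in outcomes)
    return Fraction(hits, len(outcomes))

# Dominant disorder, one affected (heterozygous) parent and one unaffected parent.
print(affected_probability("Aa", "aa", dominant=True))   # 1/2
# Recessive disorder, two unaffected carrier parents.
print(affected_probability("Aa", "Aa", dominant=False))  # 1/4
```

The dominant case assumes the affected parent is heterozygous, which is the situation the 50% figure describes.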

Genetic diseases
Identifying the relationship between human genetic diseases and their causal genes is important for advancing human medicine. Revealing the genetic basis of human disease is a fundamental aim of human genetic studies. The Human Genome Project started in 1990, and genomic studies have since rapidly accumulated a large amount of genomic data. Many computational methods have been proposed to prioritize candidate causal genes by considering the relationship between the candidate genes for a given phenotype and existing known disease genes.

What is gene prioritization?
Gene prioritization is the process of assigning a likelihood of gene involvement in generating a disease phenotype. It narrows down and orders the set of genes to be tested experimentally. It is based on various kinds of correlative evidence that associate each gene with the given disease and suggest possible causal links. The evidence comes from high-throughput experimentation, including gene expression and function, pathway involvement, and mutation effects.

Why use gene prioritization?
Proving a causal link between a gene and a disease experimentally is expensive and time-consuming. Computational prioritization of candidate genes prior to experimental testing can drastically reduce the associated costs and improve the outcomes of targeted experimental studies. High-throughput experimental techniques have contributed significantly to the identification of disease-associated genes and mutations and have produced a large amount of data. Gene prioritization is a computational way to deal with this quantity of data and to translate the experimental data effectively into legible disease-gene associations.

Identification of disease genes
Disease results from changes in normal function. There are four reasons for changes in pathway function:
(1) changes in gene expression
(2) changes in the structure of the gene product
(3) introduction of new pathway members
(4) environmental disruptions
The goal is to define the molecular pathways whose disrupted functionality is necessary and sufficient to cause the disease. All members of the affected pathways can be construed as disease genes. Identification of disease genes is difficult.

How to identify disease genes
Disease genes are most often identified using:
(1) genome-wide association or linkage analysis studies
(2) similarity to, or linkage or co-expression with, known disease genes
(3) participation in known disease-associated pathways or compartments
These methods rely on direct and indirect evidence.
Direct: evidence coming from one's own experimental work and from the literature
Indirect: genes that are in any way related to already established disease-associated genes

Indirect evidence
Very broadly, gene-disease associations are inferred from five kinds of evidence:
(1) Functional evidence: the suspect gene is a member of the same molecular pathways as other disease genes.
(2) Cross-species evidence: the suspect gene has homologues implicated in generating similar phenotypes in other organisms.

Indirect evidence
(3) Same-compartment evidence: the suspect gene is active in disease-associated pathways (e.g. ion channels), cellular compartments (e.g. cell membrane), and tissues (e.g. liver).
(4) Mutation evidence: the suspect gene is affected by functionally deleterious mutations in genomes.
(5) Text evidence: there is ample co-occurrence of gene and disease terms in scientific texts.
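One simple way to turn the five kinds of indirect evidence into a ranking is a weighted sum of per-evidence scores. The sketch below is only an illustration of that idea: the weights, the score values, and the gene names `GENE_A`, `GENE_B`, and `GENE_C` are all made up, and real prioritization tools may combine evidence quite differently.

```python
# Hypothetical weights for the five indirect evidence types (not from the slides).
WEIGHTS = {
    "functional": 0.30,
    "cross_species": 0.20,
    "compartment": 0.15,
    "mutation": 0.25,
    "text": 0.10,
}

def prioritize(evidence):
    """Rank genes by a weighted sum of evidence scores, each assumed to lie in [0, 1].

    Evidence types missing for a gene count as 0.
    """
    totals = {
        gene: sum(WEIGHTS[kind] * scores.get(kind, 0.0) for kind in WEIGHTS)
        for gene, scores in evidence.items()
    }
    # Highest combined score first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Made-up scores for placeholder genes.
evidence = {
    "GENE_A": {"functional": 0.9, "mutation": 0.8, "text": 0.5},
    "GENE_B": {"cross_species": 0.4, "text": 0.9},
    "GENE_C": {},
}
ranking = prioritize(evidence)
for gene, score in ranking:
    print(f"{gene}: {score:.2f}")
```

A linear combination is used here only because it is easy to read; it treats the evidence types as independent, which need not hold in practice.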

Overview of gene prioritization data flow

Molecular interactions
Many gene prioritization tools use gene-gene (protein-protein) interaction and pathway information to prioritize candidate genes, because genes responsible for similar diseases often participate in the same interaction networks. MC4R is a receptor known to be associated with severe obesity, so the interactors of MC4R may be predicted to be linked to obesity. AgRP and POMC directly bind MC4R for varied purposes in the MC4R pathway. Mutations that negatively affect normal POMC production or processing have been shown to be obesity-associated, and AgRP has been linked to food intake abnormalities.
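The reasoning above ("interactors of a known disease gene are themselves candidates") can be sketched as a direct-neighbor score on an interaction graph. In this minimal example only the MC4R-AgRP and MC4R-POMC edges come from the slide; `GENE_X` and `GENE_Y` and their edges are hypothetical placeholders, and scoring by the fraction of disease-associated neighbors is an assumed choice rather than the method of any particular tool.

```python
from collections import defaultdict

def neighbor_scores(edges, known_disease_genes):
    """Score each candidate by the fraction of its direct interactors
    that are known disease genes."""
    known = set(known_disease_genes)
    graph = defaultdict(set)
    for a, b in edges:
        # Interactions are treated as undirected.
        graph[a].add(b)
        graph[b].add(a)
    scores = {}
    for gene, neighbors in graph.items():
        if gene in known:
            continue  # already established, not a candidate
        scores[gene] = len(neighbors & known) / len(neighbors)
    return scores

edges = [
    ("MC4R", "AgRP"),      # from the slide: AgRP binds MC4R
    ("MC4R", "POMC"),      # from the slide: POMC binds MC4R
    ("POMC", "GENE_X"),    # hypothetical edge
    ("GENE_X", "GENE_Y"),  # hypothetical edge
]
scores = neighbor_scores(edges, known_disease_genes=["MC4R"])
for gene, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(gene, score)
```

With these inputs AgRP scores 1.0 and POMC 0.5, while the placeholder genes, which do not touch MC4R directly, score 0.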

MC4R-centered protein-protein interaction network

Regulatory and genetic linkage
Co-regulation of genes has traditionally been thought to point to the same molecular pathways and to similar diseases. Co-expressed genes often cluster together in different species, which leads to genetic linkage. Genes co-expressed with, or genetically linked to, other disease genes are also likely to be disease-associated. However, co-regulation and linkage also pose a problem.

Problem with regulatory and genetic linkage
A given disease-associated gene may be co-regulated with, or linked to, another disease-associated gene even though the two diseases are not identical. Because of genetic linkage, it is difficult to distinguish the mutations that actually cause a disease from mutations that merely co-occur with the disease mutations.

Similar sequence/structure/function
Prioritization tools often use functional similarity as an input feature. Predictors that rely on functional similarity to determine disease association will link two genes that share the same function. Functionally similar genes are likely to produce similar disease phenotypes, and sequence and structure similarities are indicators of similar disease involvement. Disease genes are also often associated with specific gene and protein features:
 - higher exon number
 - longer gene length

Cross-species evidence
Cross-species evidence comes from comparing orthologues and their associated phenotypes. Finding related phenotypes across species suggests orthologous human candidate genes. MC4R is known to be associated with severe obesity. Polar bears have a V95I mutation in MC4R, linked to their need to increase body fat to adapt to their environment; the mutation may have a similar effect (increased body fat) in humans.

Cross-species evidence
Correlation of gene co-expression across species is also useful for gene prioritization. Genes that are part of the same functional module are generally co-expressed, but functionally unrelated genes can be co-expressed as well. Comparing genes co-expressed in human and in other organisms can be used to infer disease genes. A cluster of functionally unrelated genes co-expressed in human and mouse contained the disease gene KCNIP4. An initial list of 1,762 genes mapped to 850 OMIM (Online Mendelian Inheritance in Man) phenotypes was narrowed to twenty times fewer possible disease-causing genes.
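Conserved co-expression can be illustrated with a Pearson correlation computed separately in each species. This is a rough sketch under stated assumptions: the expression values are invented, the gene names `G1` to `G3` are placeholders, and the 0.8 threshold is arbitrary; it is not the procedure used in the study mentioned above.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def conserved_coexpression(human, mouse, gene_a, gene_b, threshold=0.8):
    """True if the two genes are co-expressed above the threshold in both species."""
    return (
        pearson(human[gene_a], human[gene_b]) >= threshold
        and pearson(mouse[gene_a], mouse[gene_b]) >= threshold
    )

# Invented expression profiles (one value per sample) for orthologous genes.
human = {"G1": [1, 2, 3, 4], "G2": [2, 4, 6, 8.5], "G3": [4, 1, 3, 2]}
mouse = {"G1": [2, 3, 5, 6], "G2": [1, 2, 4, 5], "G3": [5, 5, 1, 3]}

print(conserved_coexpression(human, mouse, "G1", "G2"))  # True
print(conserved_coexpression(human, mouse, "G1", "G3"))  # False
```

Requiring the correlation to hold in both species is what filters out pairs that are co-expressed in only one organism.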

Compartment evidence
Changes in gene expression in disease-affected compartments and tissues are associated with many complex diseases. Suspect genes are therefore predicted from their activity in disease-associated pathways, compartments, and tissues. For example, multiple storage diseases are all caused by impairment of the degradation pathways of intracellular transport.

Mutant evidence
Every genetic disease is associated with some sort of mutation that alters normal functionality. Selection of candidate genes for further analysis is often based on mutations found in diseased individuals. Not all observed mutations are associated with deleterious effects:
(1) some have no effect at all (silent mutations)
(2) some are deleterious with respect to normal function
(3) some are weakly beneficial
Strongly deleterious mutations are relatively rare because they are rapidly removed by selection. A candidate gene carrying a deleterious mutation is more likely to be disease-associated than a gene with another kind of mutation or no mutation at all.
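The last point can be read as a simple ordering rule: candidates with a deleterious mutation go ahead of all others. The sketch below applies that rule to invented data; the gene names and the effect labels are placeholders, and deciding whether a mutation is actually deleterious is a separate problem not addressed here.

```python
def rank_by_mutation(candidates):
    """Place genes carrying a deleterious mutation ahead of all others.

    `candidates` maps a gene name to its observed mutation effect:
    "deleterious", "silent", "beneficial", or None for no mutation.
    sorted() is stable, so the input order is kept within each group.
    """
    # The key is False (sorts first) for deleterious, True for everything else.
    return sorted(candidates, key=lambda gene: candidates[gene] != "deleterious")

# Invented candidate list.
candidates = {
    "GENE_A": "silent",
    "GENE_B": "deleterious",
    "GENE_C": None,
    "GENE_D": "deleterious",
}
print(rank_by_mutation(candidates))  # ['GENE_B', 'GENE_D', 'GENE_A', 'GENE_C']
```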

Mutant evidence
Structural variation (SV): insertions and deletions, inversions, translocations, and so on.

Nucleotide polymorphisms: SNPs (single nucleotide polymorphisms) and MNPs (multi-nucleotide polymorphisms).

90% of human variation exists in the form of nucleotide polymorphisms.

Structural variation
Structural variation (SV) is the least studied of all types of mutations; less than 10% of human geneti
