biomart update

BioMart 2007 Arek Kasprzyk European Bioinformatics Institute BOSC Vienna, July 2007


Page 1: Biomart Update

BioMart 2007

Arek Kasprzyk

European Bioinformatics Institute

BOSC Vienna, July 2007

Page 2: Biomart Update

Data Flow

[Diagram labels: Source data, Mart, Java, Perl, DAS, Web GUI, Command line, Desktop GUI, Web service]

Page 3: Biomart Update

Data Flow

[Diagram labels: Mart, Java, Perl, DAS, Web GUI, Command line, Desktop GUI, Web service]

Page 4: Biomart Update

Admin Tools

Page 5: Biomart Update

Recent developments (0.4-0.6)

• MartBuilder

• MartView

• Web services

• API

• DAS

• Central Server

• More deployers

Page 6: Biomart Update

Data Flow

[Diagram labels: Source data, Mart, Java, Perl, DAS, Web GUI, Command line, Desktop GUI, Web service]

Page 7: Biomart Update

MartBuilder

Page 8: Biomart Update

MartBuilder

Page 9: Biomart Update

MartBuilder

Page 10: Biomart Update

MartView

Page 11: Biomart Update

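API

use strict;
use warnings;
use BioMart::Initializer;
use BioMart::Query;
use BioMart::QueryRunner;

# Path to the registry XML file (placeholder; set this for your installation)
my $confFile = "registry.xml";

# Load the registry and open a query against the central server schema
my $initializer = BioMart::Initializer->new('registryFile' => $confFile);
my $registry = $initializer->getRegistry;
my $query = BioMart::Query->new('registry' => $registry, 'virtualSchemaName' => 'central_server_1');

# First dataset: Ensembl human genes on chromosome 1
$query->setDataset("hsapiens_gene_ensembl");
$query->addFilter("chromosome_name", ["1"]);
$query->addAttribute("ensembl_gene_id");
$query->addAttribute("ensembl_transcript_id");
$query->addAttribute("ensembl_peptide_id");

# Second dataset: MSD structures solved by NMR
$query->setDataset("msd");
$query->addFilter("experiment_type", ["NMR"]);
$query->addAttribute("pdb_id");
$query->addAttribute("resolution");
$query->addAttribute("release_date");
$query->addAttribute("header");

# Run the query and print the results
my $query_runner = BioMart::QueryRunner->new();
$query_runner->execute($query);
$query_runner->printResults();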

Page 12: Biomart Update

Web service

<Query virtualSchemaName="central_server_1">
  <Dataset name="hsapiens_gene_ensembl">
    <Filter name="chromosome_name" value="1"/>
    <Attribute name="ensembl_gene_id"/>
    <Attribute name="ensembl_transcript_id"/>
    <Attribute name="ensembl_peptide_id"/>
  </Dataset>
  <Dataset name="msd">
    <Filter name="experiment_type" value="NMR"/>
    <Attribute name="pdb_id"/>
    <Attribute name="resolution"/>
    <Attribute name="release_date"/>
    <Attribute name="header"/>
  </Dataset>
</Query>
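
The query document on this slide has a regular shape: one Query element, one Dataset element per dataset, and Filter and Attribute children inside each. As a minimal sketch of that shape, the Python below generates an equivalent document with the standard library. The function name `build_query` and its argument layout are assumptions made for illustration; they are not part of BioMart.

```python
import xml.etree.ElementTree as ET


def build_query(virtual_schema, datasets):
    """Build a query document shaped like the one on the slide.

    `datasets` is a list of (name, filters, attributes) tuples, where
    `filters` maps filter name -> value and `attributes` is a list of names.
    This helper is hypothetical; it only mirrors the XML layout shown above.
    """
    query = ET.Element("Query", {"virtualSchemaName": virtual_schema})
    for name, filters, attributes in datasets:
        dataset = ET.SubElement(query, "Dataset", {"name": name})
        for filter_name, value in filters.items():
            ET.SubElement(dataset, "Filter", {"name": filter_name, "value": value})
        for attribute in attributes:
            ET.SubElement(dataset, "Attribute", {"name": attribute})
    # encoding="unicode" returns a str instead of bytes
    return ET.tostring(query, encoding="unicode")


# Same datasets, filters and attributes as on the slide
QUERY_XML = build_query(
    "central_server_1",
    [
        ("hsapiens_gene_ensembl", {"chromosome_name": "1"},
         ["ensembl_gene_id", "ensembl_transcript_id", "ensembl_peptide_id"]),
        ("msd", {"experiment_type": "NMR"},
         ["pdb_id", "resolution", "release_date", "header"]),
    ],
)
print(QUERY_XML)
```

Building the document from data rather than by string concatenation keeps the quoting consistent, which is where hand-typed queries tend to go wrong.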

Page 13: Biomart Update

MartService

• Meta data (GET)

– Marts

– Datasets

– Configuration

• Queries (POST)

Page 14: Biomart Update

Meta data

http://www.mycompany.com/mypath/martservice?

• Marts: type=registry

• Datasets: type=datasets&mart=mymart

• Configuration: type=configuration&dataset=mydataset
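
The three metadata requests differ only in their query string. The sketch below assembles them in Python, assuming the parameter names shown on this slide (`type`, `mart`, `dataset`) and using the slide's placeholder host. The helper `metadata_url` is a hypothetical name, and no request is actually sent.

```python
from urllib.parse import urlencode

# Placeholder endpoint copied from the slide; replace with a real martservice URL
BASE_URL = "http://www.mycompany.com/mypath/martservice"


def metadata_url(base_url, request_type, **params):
    """Return a metadata URL such as ...?type=datasets&mart=mymart."""
    # "type" goes first; extra parameters follow in the order given
    return base_url + "?" + urlencode({"type": request_type, **params})


marts_url = metadata_url(BASE_URL, "registry")
datasets_url = metadata_url(BASE_URL, "datasets", mart="mymart")
configuration_url = metadata_url(BASE_URL, "configuration", dataset="mydataset")

for url in (marts_url, datasets_url, configuration_url):
    print(url)
```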

Page 15: Biomart Update

Query

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName="default" count="" softwareVersion="0.5">
  <Dataset name="hsapiens_gene_ensembl">
    <Attribute name="ensembl_gene_id"/>
    <Attribute name="ensembl_transcript_id"/>
    <Filter name="chromosome_name" value="1"/>
    <Filter name="band_end" value="p36.33"/>
    <Filter name="band_start" value="q44"/>
  </Dataset>
  <Dataset name="msd">
    <Attribute name="pdb_id"/>
    <Attribute name="experiment_type"/>
    <Filter name="experiment_type" value="NMR"/>
  </Dataset>
</Query>

wget -q 'http://www.biomart.org/biomart/martservice?query=

-O 5utr.dat
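
The wget line passes the query document in a `query` parameter, so the XML has to be URL-encoded before it is appended to the martservice address. The sketch below shows one way to do that encoding in Python. The shortened query and the helper name `query_url` are assumptions for illustration, and the code only builds the URL without contacting the server.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint taken from the wget line on the slide
MARTSERVICE = "http://www.biomart.org/biomart/martservice"

# Shortened version of the slide's query, kept small for readability
QUERY_XML = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<!DOCTYPE Query>"
    '<Query virtualSchemaName="default" count="" softwareVersion="0.5">'
    '<Dataset name="hsapiens_gene_ensembl">'
    '<Attribute name="ensembl_gene_id"/>'
    '<Filter name="chromosome_name" value="1"/>'
    "</Dataset>"
    "</Query>"
)


def query_url(base_url, query_xml):
    """Percent-encode the XML and attach it as the `query` parameter."""
    return base_url + "?" + urlencode({"query": query_xml})


url = query_url(MARTSERVICE, QUERY_XML)
print(url)

# Decoding the parameter again returns the original document unchanged
assert parse_qs(urlsplit(url).query)["query"] == [QUERY_XML]
```

The same encoded `query` parameter could also be sent in a POST body, which matches the "Queries (POST)" entry on the MartService slide.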

Page 16: Biomart Update

Results

• Ordered according to:

1. Datasets

2. Attributes

• Default format: TSV

– Can be altered by specifying a formatter
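
Because the default output is tab-separated and the columns follow the order of the query, a client can label each column from the query itself. The sketch below reads this slide as meaning "columns grouped by dataset, then by attribute within each dataset", which is an interpretation rather than something the slide spells out. The sample rows are invented placeholders, not real results.

```python
import csv
import io


def parse_tsv(text, columns):
    """Turn tab-separated result text into a list of dicts keyed by column name."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [dict(zip(columns, row)) for row in reader if row]


# Column names in query order: first dataset's attributes, then the second's
COLUMNS = ["ensembl_gene_id", "ensembl_transcript_id", "pdb_id", "experiment_type"]

# Placeholder rows for illustration only
SAMPLE = (
    "GENE_A\tTRANSCRIPT_A\tPDB_A\tNMR\n"
    "GENE_B\tTRANSCRIPT_B\tPDB_B\tNMR\n"
)

rows = parse_tsv(SAMPLE, COLUMNS)
for row in rows:
    print(row["ensembl_gene_id"], row["pdb_id"])
```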

Page 17: Biomart Update

Genomic data

Page 18: Biomart Update

Uniprot, MSD, ArrayExpress

Page 19: Biomart Update

Model organism databases

Page 20: Biomart Update

Developmental models

Page 21: Biomart Update

Proteomics

Page 22: Biomart Update

Name   Fragment   Position   Alleles   Strand
SNP1   AL139258   1659852    T/A       1
SNP2   NT_25698   2569873    C/T       -1
SNP3   chr13      1125698    C/G       1

Data conversion and integration

Ensembl

HapMap

NCBI

UCSC

Proprietary data

Diabetes-Gene Association DataBase

Combined proprietary and public data

Genetics of Infectious and Autoimmune Diseases, Pasteur Institute, INSERM U730, Paris, France.

Target SNP selection for the study of type 1 diabetes (T1D), malaria and dengue
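
The SNP rows on this slide are whitespace-separated records with a name, a fragment identifier, a position, a pair of alleles and a strand. As a small sketch of a first step before any conversion or integration, the Python below parses them into typed records. The `Snp` class and `parse_snp` function are hypothetical names, and the code does not attempt the coordinate conversion itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snp:
    name: str
    fragment: str    # clone, contig or chromosome identifier as given on the slide
    position: int
    alleles: tuple   # e.g. ("T", "A")
    strand: int      # 1 or -1


def parse_snp(line):
    """Parse one whitespace-separated row: name fragment position alleles strand."""
    name, fragment, position, alleles, strand = line.split()
    return Snp(name, fragment, int(position), tuple(alleles.split("/")), int(strand))


# The three rows shown on the slide
ROWS = [
    "SNP1 AL139258 1659852 T/A 1",
    "SNP2 NT_25698 2569873 C/T -1",
    "SNP3 chr13 1125698 C/G 1",
]

snps = [parse_snp(row) for row in ROWS]
for snp in snps:
    print(snp)
```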

Page 23: Biomart Update

CAPRISA: understanding HIV pathogenesis and epidemiology, as well as HIV/AIDS treatment and prevention

Clinical Data

MID

Cellular Immunity

Humoral Immunity

HLA Typing

Sequence & Sequence Related

Pipeline

Page 24: Biomart Update

Unilever

• Human study to evaluate Omics in assessing safety indicators

• Study of skin inflammation in response to detergent

• Skin samples taken and analyzed with multiple Omics techniques

– Blood

– Skin biopsy

– Microdialysis

Page 25: Biomart Update
Page 26: Biomart Update

Use Example 1: All genes in the human genome up-regulated in Pancreatic Adenocarcinomas (PDACs) vs Normal Pancreas (ND)

1. Filter

2. Attributes

3. Results

Page 27: Biomart Update

Use Example 2: All upstream sequences for all genes on chromosome 1 up-regulated in Pancreatic Adenocarcinomas (PDACs) vs Normal Pancreas (ND)

1. Filter

2. Attributes

3. Results

Page 28: Biomart Update

Use Example 3: Just finished my experiment and would like to get the overlaps between my results and those reported in previous studies!

1. Filter

2. Attributes

3. Results
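
Once the gene lists have been retrieved, the overlap itself is a set intersection. The sketch below shows that step in Python with invented placeholder identifiers. It is not a description of how MartView computes the overlap, and the names `overlap`, `my_genes` and `published` are assumptions for illustration.

```python
def overlap(my_results, previous_studies):
    """Return, for each study, the sorted identifiers shared with my_results."""
    mine = set(my_results)
    return {
        study: sorted(mine & set(genes))
        for study, genes in previous_studies.items()
    }


# Placeholder identifiers, not real gene IDs
my_genes = ["GENE_A", "GENE_B", "GENE_C"]
published = {
    "study_1": ["GENE_B", "GENE_D"],
    "study_2": ["GENE_A", "GENE_B"],
}

shared = overlap(my_genes, published)
for study, genes in shared.items():
    print(study, genes)
```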

Page 29: Biomart Update

Web service

Page 30: Biomart Update

Perl

Page 31: Biomart Update

DAS

Page 32: Biomart Update

Bioconductor package biomaRt

Page 33: Biomart Update

Galaxy

Page 34: Biomart Update

Taverna

Page 35: Biomart Update

Central Server (www.biomart.org)

Page 36: Biomart Update

www.biomart.org/biomart/martservice

Page 37: Biomart Update

Future plans

Page 38: Biomart Update

New configuration system

• Normalized

– Based on a partition table concept

– Unified pointer system -> relational attribute

– Configuration merge - implicit federation

– Write to the db

• Run time slice and dice of a registry object rather than combinatorial pre-compilation

Page 39: Biomart Update

New configuration system

• Scalability

– Updates and maintenance of large configurations

– Run time server scalability (cache and memory)

– Scalable for multiple mart users (single instance - security)

– Scalable for alternative configurations (new MartGUI framework)

Page 40: Biomart Update

New MartGUI framework

• Components

– Alternative DS Configurations

– Alternative GUIs (MView, MQForm, MSForm, etc.)

– Alternative Analyzers/Visualizers (optional install)

• Extensible

– Custom extensions to the components

• Common interface

– Formatters, DAS, Analyzers, Visualizers

– Importable/Exportable pair interface

Page 41: Biomart Update

New GUI framework

• Old ‘GUI unit’:

– Full registry + MartView + default formatters

– Customization limited to colors and headers

• New ‘GUI unit’:

– RegistrySlice + MartGUI + Visualizer/Analyzer

– Combine units into your unique functional environment

– Functional level customization

Page 42: Biomart Update

New GUI framework

Gene Id conversion

Functional annotation

Compare two gene lists

Analyze gene list

Draw distribution

Full search

Draw bla bla chart

Home

Welcome to my data mining website

SITE HEADER

Page 43: Biomart Update

New GUI framework

Gene Id conversion

Functional annotation

Compare two gene lists

Analyze gene list

Genbank

Trembl

Uniprot

Submit

Draw distribution

Full search

paste your ids here

Draw bla bla chart

Hugo

Home

SITE HEADER

Page 44: Biomart Update

New GUI framework

Home

Gene Id converter

Fu

Full search

Welcome to my data mining website

Page 45: Biomart Update

New GUI framework

Hugo Genebank

Uniprot Swissprot

Submit

paste your ids here

Home

Fu

Full search

Gene Id conversion

Page 46: Biomart Update

Cytogenetic distribution of pancreatic cancer genes satisfying my query (histogram)

Page 47: Biomart Update

Cytogenetic distribution of pancreatic cancer genes satisfying my query (ideogram)

Page 48: Biomart Update

Cytogenetic distribution of chromosomal aberrations in pancreatic cancer

Page 49: Biomart Update
Page 50: Biomart Update

New GUI framework

Page 51: Biomart Update

New GUI framework

Page 52: Biomart Update

New configuration tool

• MartConfigurator

– Handles a complete registry object

– Defines GUI units

– Automated service discovery

– Manual link override

– Automated updates for large configurations

– Improved user interaction

Page 53: Biomart Update

Credits

• Martians

– Syed Haider

– Richard Holland

– Damian Smedley

• Contributors

– Steffen Durinck (NCI, NIH)

– Eric Just (Northwestern University)

– Don Gilbert (Indiana University)

– Darin London (Duke University)

– Will Spooner (CSHL)

– Gudmundur Thorisson (CSHL)

– Benoit Ballester (Universite de la Mediterranee)

– James Smith (Ensembl)

– Arne Stabenau (Ensembl)

– Andreas Kahari (Ensembl)

– Craig Melsopp (Ensembl)

– Katerina Tzouvara (EBI)

– Paul Donlon (Unilever)