http://core.rr-research.no/bioinformatics
The Bioinformatics Core Facility in Oslo
Staff:
Core facility leader Eivind Hovig
The biological data explosion
"The greatest challenge facing the molecular biology community today is to make sense of the wealth of data that has been produced by the genome sequencing projects" – EMBL-EBI
Bioinformatics algorithms and computing power are the main bottlenecks for analyzing the huge amounts of data generated by current technologies (Gálvez et al., Bioinformatics 26, 683 (2010)).
The biological data explosion
http://www.genome.gov/sequencingcosts
Storage and processor capacity roughly double every two years (Moore's Law)
New biological data is at present doubling every five months – rate is increasing
New sequencing machines produce billions of bases per experiment
Both users and informatics infrastructure have trouble adapting
Very real and serious challenge for molecular biology, in Norway and elsewhere
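To put the two doubling times above side by side, here is a minimal sketch in Python. It assumes idealized exponential growth with fixed doubling times (24 months for capacity, 5 months for data); the function name `growth_factor` is ours and purely illustrative.

```python
def growth_factor(months: float, doubling_time_months: float) -> float:
    """Multiplicative growth after `months`, given a fixed doubling time."""
    return 2.0 ** (months / doubling_time_months)


# Illustrative horizon of five years (an assumption, not a figure from the slides).
horizon_months = 5 * 12

capacity_growth = growth_factor(horizon_months, 24)  # Moore's Law: doubling every two years
data_growth = growth_factor(horizon_months, 5)       # data: doubling every five months

print(f"Capacity grows {capacity_growth:.1f}-fold")
print(f"Data grows {data_growth:.0f}-fold")
print(f"Gap widens {data_growth / capacity_growth:.0f}-fold")
```

Under these assumptions, capacity grows about 5.7-fold over five years while data grows 4096-fold, which is why analysis capacity, not data production, becomes the limiting factor.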
The advantages of having a bioinformatics core facility
Providing infrastructure
Broad, stable and increased competence
Cost efficient for large-scale projects
Help for projects of all sizes
Increase probability of a grant award
We can guide you to the correct people
Who are we?
Part of ELIXIR Norway – The National Technology Platform for Bioinformatics
Tromsø: Nils Peder Willassen
Bergen: Inge Jonassen
Trondheim: Finn Drabløs
Oslo: Eivind Hovig
Ås: Dag Inge Våge
ELIXIR Norway
Builds on the infrastructure and expertise built up by the FUGE bioinformatics platform (funded by the Research Council of Norway 2003-2012)
ELIXIR Norway is funded by the RC for the period 2013-2017 (2015)
National centre with nodes in Bergen, Oslo, Ås, Trondheim, and Tromsø
Offers state-of-the-art, research-based infrastructure and services to Norwegian users in academia, industry, and government
– Build and offer an e-infrastructure for users within molecular life science
– Work tightly with other technology platforms
– Provide state-of-the-art bioinformatics support
– Ensure that Norwegian data are stored in standardized formats, supporting re-use of data
– Work with the generic e-infrastructure providers to make their resources available towards bioinformatics
Headed by Professor Inge Jonassen, Bergen
ELIXIR
The purpose of ELIXIR is to construct and operate a sustainable infrastructure for biological information in Europe to support life science research and its translation to medicine and the environment, the bio-industries and society
Challenges for life scientists in Norway and all European countries
– maintain open access to biological data to enhance competitiveness and innovation (free access to databases)
– manage the data deluge
– integrate the data to reduce fragmentation of effort and research
– exploit new types of data
These challenges are too vast for any single institution or country, and must therefore be managed by joining forces at European and global levels
ELIXIR
Crosswell and Thornton, Trends Biotechnol. 30, 241 (2012)
The Bioinformatics Core Facility in Oslo is a part of ELIXIR Norway
We are funded by
– RCN – ELIXIR Norway
– Helse Sør-Øst
– User fees (large projects only)
Who are we in Oslo?
Vegard Nygaard – Microarray and sequence data analysis
Morten Johansen – Programming, scripting, web servers
Marit Holden, Norsk Regnesentral – Statistical genomics
Clara-Cecilie Günther, Norsk Regnesentral – Statistical genomics
Eivind Hovig – Manager
Ståle Nygård – Statistical genomics, Oslo and National Helpdesk manager
Jon K. Lærdahl – Protein structure analysis
Merete Molton Worren – High-throughput sequencing analysis
Daniel Vodák – HTS analysis
Torbjørn Rognes – Sequence analysis
Sveinung Gundersen – HyperBrowser
Roughly 6 man-years of bioinformatics expertise
What do we do?
Providing Service and Support
Access to Computing Facilities
Access to Storage
Organize and Contribute to Courses
Organize and Contribute to Conferences and Workshops
What is a bioinformatician?
An intimate knowledge of UNIX-based operating systems
Fluent in scripting languages such as Perl or Python
Understanding programming languages such as C++/C/Java, and ability to develop software encapsulating new analysis methods
Knowledge of network-based data storage
Understanding of relational databases and database architecture
What is a bioinformatician?
Skills in statistical analysis
Understanding of experimental design
Knowledge of mathematical modeling
And of course, we have a general knowledge of molecular biology and genomics
What can a bioinformatician do for you?
Provide the computational/statistical analysis of data derived from your biological/medical experiments
– It is always better to be involved early in the project, before the experiments have been performed
– Contribute to planning and experimental setup
– Not: Here are the results/data, tell us what they mean...!
Our knowledge and experience make us well equipped to seek out and test novel bioinformatics software and analysis pipelines
We can do the programming/scripting for you if you wish to do the main part of the analysis yourself
Bioinformatics Core Facility Helpdesk
High-quality, research-based bioinformatics support
Part of the National ELIXIR Norway helpdesk
Headed by Ståle Nygård in Oslo
Common point of access: [email protected]
Or contact the experts directly – see our web pages
Have supported hundreds of life science research projects over the last decade
– FUGE (2003-12) and Helse Sør-Øst funding
Expertise from the FUGE period is now supplemented with new ELIXIR personnel
– ELIXIR (2013-17) and Helse Sør-Øst funding
Bioinformatics Core Facility Helpdesk
Common point of access: [email protected]
Or contact the experts directly – see our web pages
Services within all major areas in bioinformatics analysis, e.g.
– Sequence analysis
– Analysis of high-throughput sequencing data
– Protein structure analysis
– Analysis of DNA variation
– Genetic linkage studies
– Microarrays
– General gene association studies
– Statistical genomics
– Database construction and access
– Web services
Arranging courses and workshops
Located both at the Norwegian Radium Hospital, Montebello, and on the top floor of Ole-Johan Dahls hus, the new Informatics building at UiO
Close collaboration with other Core Facilities, e.g. Genomics Core Facility (GCF, Radium Hospital) and the Oslo Genotyping Core Facility
Services we provide - examples
Services we provide
Analysis of high-throughput sequencing data
– RNA-seq: expression analysis
– ChIP-seq: finding protein-DNA interactions
– Genome sequencing: finding SNPs, indels, structural variants
– Bisulfite: DNA methylation
– small-RNA-seq: detecting novel noncoding RNAs, expression
– Metagenomics: calculating operational taxonomic units (OTUs), creating phylogenetic trees, assigning species
Services we provide
I want to visualize my data, and see where on the reference genome the (high-throughput) sequencing reads can be aligned
I want to see which genes are affected, and how
I want as much as possible out of my data
Services we provide - examples
Discovery of germline/somatic variants
Detection of structural aberrations
Variant annotation
Services we provide - examples
Differential gene/transcript expression analysis
Discovery of differential splicing
Identification of novel transcripts
Services we provide - examples
Finding miRNAs expressed in colorectal cancer cells
Testing the effects of different normalization methods on the data
Checking for differential expression between groups
PLoS ONE 8, e66165 (2013)
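As a rough illustration of why normalization matters before comparing groups, the sketch below applies counts-per-million (CPM) scaling to two toy samples. CPM is only one common choice and is an assumption here, not necessarily a method tested in the cited study; the function name and the read counts are made up for illustration.

```python
def counts_per_million(counts):
    """Scale the raw read counts of one sample so that they sum to one million."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return [c * 1_000_000 / total for c in counts]


# Two hypothetical samples with identical miRNA proportions but a
# ten-fold difference in sequencing depth.
sample_a = [10, 30, 60]
sample_b = [100, 300, 600]

cpm_a = counts_per_million(sample_a)
cpm_b = counts_per_million(sample_b)

# After scaling, the depth difference is gone and the profiles match.
print(cpm_a == cpm_b)
```

Comparing raw counts would suggest a ten-fold expression difference between these samples; after scaling, the two profiles are identical.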
Services we provide - examples
Detecting miRNAs in the Atlantic salmon
Novel and known expressed miRNAs
Differential expression analysis between different tissues, and between individuals infected with ISA and healthy controls
BMC Genomics 14, 482 (2013)
Services we provide - examples
Annotation and analysis of the recently assembled cod genome
– Cross-species comparison and analysis of data
– Using sequence alignment tools and annotation databases to identify syntenic regions across species
– Using synteny information to infer reliability of missing genes/regions in newly mapped genomes
Nature 477, 207 (2011)
Services we provide – The Genomic HyperBrowser
If you have a genomic track, we can analyse it!
– Research-based service
– Genome-scale analysis made robust and easy
– Analyze your own data, and get custom-developed methodology and tools
– Comes with >100 000 genomic tracks, 76 analyses, 42 tools
– From quick analysis to robust statistical testing, in a simple web interface
The Genomic HyperBrowser
Start in your internet browser
– A public, web-based interface allows sophisticated genome analysis by simple point-and-click
If you have questions, just ask
– If you need input, we can assist you directly from where you left off
If you need even more
– We have a team of programmers and statisticians that can develop novel software and methodology
Nucleic Acids Res 41, W133 (2013)
The Genomic HyperBrowser - examples
Regulation by C-myb transcription factor
– Predict binding based on motifs and ENCODE data
Promoters with methylation pattern
– Statistical evaluation of association with CpG islands
Pathology of multiple sclerosis
– Determine involved cell types from ENCODE data
Map environment to genome
– Vitamin D association to disease through receptor
Hum Mol Genet 21, 3575 (2012)
PLoS One 7, e32281 (2012)
Services we provide
Analysis of gene sets
– Gene-set enrichment analysis
– Finding statistically overrepresented GO terms and pathways
– Finding related genes based on literature
– Grouping together correlated genes
– Mapping gene lists to pathways
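One common way to test whether a GO term is overrepresented in a gene list is a one-sided hypergeometric test. The sketch below is a minimal standard-library version, not the facility's actual pipeline; the function name, parameter names, and the example numbers are all hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, selected, hits):
    """One-sided hypergeometric p-value: P(X >= hits).

    population: number of genes in the background set
    annotated:  background genes carrying the GO term
    selected:   size of the gene list under study
    hits:       genes in the list that carry the GO term
    """
    max_hits = min(annotated, selected)
    total = comb(population, selected)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total


# Hypothetical numbers: 20 000 background genes, 200 annotated with the term,
# and a list of 100 genes of which 8 carry the term.
p = enrichment_p_value(20_000, 200, 100, 8)
print(f"p = {p:.3g}")
```

In practice many terms are tested at once, so the resulting p-values should be corrected for multiple testing before any term is reported as enriched.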
Services we provide
Protein structure/sequence/function analysis
– Investigate splice variants, possible posttranslational modifications, signal peptides, and localization signals by using state-of-the-art bioinformatics tools
– Identify protein domains and disordered regions, and predict secondary structure
– If possible, build 3D models
– Identify orthologs from other species
– Phylogeny
– Identify functionally important domains/segments/residues
– Predict effects of mutations or suggest residues for site-directed mutagenesis
Services we provide - examples
Discovered a new, 5th structural superfamily of DNA glycosylase repair enzymes
Modelled the 3D structure – homology modelling
Located active site and suggested residues for site-directed mutagenesis
Experimental follow-up
[Figure: structural superfamilies of DNA glycosylases – HhH, H2TH, UDG, AAG, and Family 5 (HEAT-like repeats)]
Nucleic Acids Res. 35, 2451 (2007)
Services we provide - examples
Generated 3D model of human bile acid receptor TGR5 (a G-protein coupled receptor)
Predicted effect of mutations/SNPs
Experimental follow-up
PLoS ONE 5, e12403 (2010)
Services we provide - examples
Pidoux & Taskén, J. Mol. Endocrinol. 44, 271 (2010) (not our work!)
Evolution/phylogeny of PKA catalytic subunits Cα, Cβ, and Cγ
K. Søberg, T. Jahnsen, T. Rognes, B.S. Skålhegg & J.K. Lærdahl, PLoS ONE 8, e60935 (2013)
Services we provide
Statistical genomics
– Experimental design
– Pattern discovery
– Clustering
– Multivariate regression
– Hierarchical inference
– Multiple testing
– Event history analysis
– Survival analysis
– Other types of analysis upon request
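To illustrate the multiple testing item in this list: when thousands of genes are tested at once, raw p-values need adjusting. The sketch below implements the Benjamini-Hochberg false discovery rate adjustment as one widely used option; it is an illustrative choice on our part, and the slides do not say which corrections the facility applies.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
for p, q in zip(raw, benjamini_hochberg(raw)):
    print(f"raw p = {p:<6} adjusted = {q:.3f}")
```

Genes whose adjusted value falls below the chosen false discovery rate, for example 0.05, are then reported as significant.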
Services we provide - examples
Integrative analysis of gene dosage, expression and ontology (GO) data
Identified driver genes in the carcinogenesis and chemoradioresistance of cervical cancer
Found overrepresented biological processes including apoptosis and metabolism
PLoS Genet. 5, e1000719 (2009)
Services we provide - examples
Genome-wide gene expression analysis of bone biopsies from patients with postmenopausal osteoporosis and healthy controls, adjusting for age and BMI
Found 256 transcripts confirmed for disease
J. Bone Miner. Res. 26, 1793 (2011)
Services we provide
DNA and protein microarray analysis
– Design of experiment
– Image analysis
– Copy number analysis
– Genotyping analysis
– Differential analysis
– Classification and clustering
– Survival analysis
– Meta analysis
– Quality control
– Public repository submission
Services we provide
Database and web services
Database access
– Provide access to local copies of selected databases
– Provide snapshot access to patient data in defined formats
Web services
– Provide access to local web services
– Maintain pointers to other useful web services
Scripting and software services
– Support on scripting/programming
– Support on use of Linux/Unix tools
– Help with setup of required software and web interfaces
Services we provide
Facilitating use of computational resources together with USIT
– CPU: http://uio.no/hpc/abel
– Disk: https://storebioinfo.norstore.no
Services we provide - examples
End of section
Helpdesk – How it works
Users contact us through e-mail or in person
Simple services ("less than 3 days' work") offered free of charge to Norwegian academic/government users
Always free to ask questions
Larger projects may require user fees or some form of collaborative research, depending on prior agreement. Any user fees will be based on a non-profit model and should be considered reasonable
It is recommended to think about bioinformatics funding already in the planning stages of your research, as this is an important and resource-intensive step
Bioinformatics Core Facility and ELIXIR Norway Helpdesk
Our job is to help you with your bioinformatics needs. Please use us!
Bioinformatics Core Facility and ELIXIR Norway Helpdesk
After this course: Don't go home to your lab and try to do everything yourself!
COLLABORATE!!
ELIXIR Norway Trondheim node – Integrated with the Bioinformatics Core Facility
Located in the Laboratory Centre of St Olavs Hospital and NTNU
Focus on biomedical research, gene regulation and large-scale genomics
Helpdesk services by Bioinformatics Core Facility (BioCore, funded by NTNU)
Close collaboration with Genomics Core Facility (GCF)
Finn Drabløs – Professor, Leader ELIXIR-NTNU
Pål Sætrom – Professor, Head BioCore
Morten Rye – Researcher, ELIXIR-NTNU
Jostein Johansen – Senior Engineer, Manager BioCore
Kjetil Klepper – Staff Engineer, ELIXIR-NTNU
http://www.motiflab.org
ELIXIR Norway Tromsø node - SYSBIO
Located both at the Science Park and NT faculty, UiT
Collaboration between Dept. of Chemistry and Dept. of Informatics, UiT
Focus on marine genomics/metagenomics
Contact persons: Erik Hjerde (Help desk), Nils Peder Willassen (Head)
From left to right: Erik Hjerde, Tim Kalkhe, Nils Peder Willassen, Edvard Pedersen, Peik Haugen, Espen Robertsen and Said Ahmed. Lars Ailo Bongo not present
ELIXIR Norway - Tromsø
ELIXIR Norway - Ås
ELIXIR.NO Ås node.
Located at Centre for Integrative Genetics (CIGENE), Norwegian University of Life Sciences (UMB).
CIGENE has established pipelines for handling large-scale sequencing data, with focus on de novo assembly (e.g. the Atlantic salmon genome) and development of DNA markers (SNPs). The research is aimed at bridging the gap between genotype and phenotype in production biology species.
The primary contribution to ELIXIR Norway will be to make fish genomic resources and tools available to the national and international research community.
Node manager: Dag Inge Våge
ELIXIR.NO Bergen node – the Computational Biology Unit
The University of Bergen coordinates the ELIXIR Norway project and the (aspiring) Norwegian ELIXIR Node
Bioinformatics at UiB is organised in CBU – Computational Biology Unit – including research groups and a service group
ELIXIR Norway personnel include programmers and service scientists
Close coupling with LiceBase within Sea Lice Research Centre (SFI)
Collaboration with Norwegian Genomics Consortium (NGC), PROBE (proteomics), Department of Public Health (biobanks)
Project leader: Inge Jonassen
ELIXIR Norway - Bergen