Post on 04-Jan-2016
http://taverna.org.uk
Analysing African and European cattle with Taverna 2.2
Stuart Owen
Based on the work by:
Professor Andy Brass and Mohammad Khodadadi, University of Manchester, UK
Harry Noyes and Steve Kemp, University of Liverpool, UK
BOSC2010 – Boston.
Analysing African and European cattle with Taverna 2.2
A bioinformatics case study demonstrating the use of the Taverna 2 workflow system
This is a snapshot of some exciting science that is currently in progress
Analysing African and European cattle with Taverna 2.2
• 10,000 years separation
• African livestock adaptations:
– Hardier
– Better disease resistance
• Potential outcomes:
– Food security
– Understanding resistance
– Understanding environmental conditions (drought, parasites)
– Understanding diversity
http://news.bbc.co.uk/1/hi/science_and_environment/10403254.stm
http://www.sciencemag.org/cgi/content/full/328/5986/1640
Workflow and phases
MAP
FILTER
ANALYSIS
Workflow and phases
Input SNP file
Populate DB with start SNPs and resource version numbers
LiftOver: maps between UMD3 and BTA4 cow assemblies
Exon positions from ENSEMBL
Find SNPs in exon regions
PolyPhen to mark “dangerous” SNPs
A little more about the phases …
• The input SNP file is the result of sequencing an entire Boran cow at 15-fold average coverage
– 11.9 million SNPs described.
– Resulting from Next Generation Sequencing.
• All initial data is stored in a database, mapped by a runID to the versions of ENSEMBL, LiftOver and PolyPhen used.
• LiftOver provides a mapping between 2 different reference cow assemblies:
– UMD3 : more accurate assembly
– BTA4 : better annotated and ENSEMBL friendly
– Store BTA4 position, chromosome and allele in the database
– Filter out, but store, results where the base does not match between the two assemblies.
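The bookkeeping described above can be sketched as follows. This is only an illustration using Python's built-in sqlite3 module, not the schema used in the study: the table names, column names, version strings and sample rows are all hypothetical, and "base" is assumed here to mean the reference base reported in each assembly. The sketch records a run together with the resource versions it used, then stores every lifted-over SNP while flagging base mismatches, so that they are excluded from later phases without being lost.

```python
import sqlite3

# Hypothetical schema: the real database layout is not given in the slides.
SCHEMA = """
CREATE TABLE runs (
    run_id           INTEGER PRIMARY KEY,
    ensembl_version  TEXT,
    liftover_version TEXT,
    polyphen_version TEXT
);
CREATE TABLE snps (
    run_id        INTEGER REFERENCES runs(run_id),
    umd3_chrom    TEXT,
    umd3_pos      INTEGER,
    umd3_base     TEXT,
    bta4_chrom    TEXT,
    bta4_pos      INTEGER,
    bta4_base     TEXT,
    allele        TEXT,
    base_mismatch INTEGER  -- 1 if the base differs between the assemblies
);
"""


def store_lifted_snps(conn, run_id, lifted):
    """Store every lifted-over SNP, flagging mismatches instead of dropping them."""
    rows = [
        (run_id, s["umd3_chrom"], s["umd3_pos"], s["umd3_base"],
         s["bta4_chrom"], s["bta4_pos"], s["bta4_base"], s["allele"],
         int(s["umd3_base"] != s["bta4_base"]))
        for s in lifted
    ]
    conn.executemany("INSERT INTO snps VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()


def matching_snps(conn, run_id):
    """Return (chromosome, position) on BTA4 for SNPs that pass the filter."""
    cur = conn.execute(
        "SELECT bta4_chrom, bta4_pos FROM snps "
        "WHERE run_id = ? AND base_mismatch = 0 "
        "ORDER BY bta4_chrom, bta4_pos",
        (run_id,),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Placeholder version strings; a real run would record the actual versions.
conn.execute("INSERT INTO runs VALUES (1, 'ensembl-x', 'liftover-x', 'polyphen-x')")

# Made-up example rows: the second one has a base mismatch.
lifted = [
    {"umd3_chrom": "1", "umd3_pos": 1000, "umd3_base": "A",
     "bta4_chrom": "1", "bta4_pos": 1050, "bta4_base": "A", "allele": "G"},
    {"umd3_chrom": "1", "umd3_pos": 2000, "umd3_base": "C",
     "bta4_chrom": "1", "bta4_pos": 2075, "bta4_base": "T", "allele": "G"},
]
store_lifted_snps(conn, 1, lifted)
print(matching_snps(conn, 1))
```

Keeping the mismatches in the same table behind a flag, rather than deleting them, is one simple way to get the "filter out, but store" behaviour: later phases select only the unflagged rows, while the full record stays available for inspection.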
… a little more about the phases
• ENSEMBL is used to retrieve annotations about the SNPs : http://www.ensembl.org/
– For all the SNPs that have the same base, we go over all the cow exons in ENSEMBL and try to match each SNP to an exon (exon start < SNP position < exon end). For each match we also store the geneID, allele, associated gene names and biotype.
– Filter out, but store, ENSEMBL/BTA4 mismatches.
– A second phase fetches the consequence according to the BTA4 positions.
– From this information a file is generated for PolyPhen, containing all SNPs whose consequence is non-synonymous.
• A local instance of PolyPhen is queried using the file generated from the ENSEMBL annotations, producing an indication of how much each SNP changes the protein.
• The outcome is an annotated database of ~20,000 “interesting” SNPs
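The exon matching and the selection of SNPs for PolyPhen can be sketched roughly as below. This is a simplified illustration, not the workflow's actual code: the `Exon` and `Snp` types, the function names, the consequence label string and the sample data are all hypothetical. The exon and consequence data would in practice come from ENSEMBL, and the real PolyPhen input file format is not reproduced here. The test `exon.start < snp.pos < exon.end` follows the strict inequality given on the slide.

```python
from collections import defaultdict
from typing import NamedTuple


class Exon(NamedTuple):
    chrom: str
    start: int
    end: int
    gene_id: str


class Snp(NamedTuple):
    chrom: str
    pos: int      # position on the BTA4 assembly
    allele: str


def snps_in_exons(snps, exons):
    """Pair each SNP with every exon that strictly contains its position."""
    by_chrom = defaultdict(list)
    for exon in exons:
        by_chrom[exon.chrom].append(exon)

    hits = []
    for snp in snps:
        # Simple linear scan per chromosome; fine for a sketch, too slow
        # for millions of SNPs without an interval index.
        for exon in by_chrom.get(snp.chrom, []):
            if exon.start < snp.pos < exon.end:
                hits.append((snp, exon))
    return hits


def select_for_polyphen(hits, consequences):
    """Keep only hits whose consequence is labelled non-synonymous.

    `consequences` maps (chrom, pos) to a label; the label text used
    here is an assumption made for this example.
    """
    return [
        (snp, exon)
        for snp, exon in hits
        if consequences.get((snp.chrom, snp.pos)) == "non-synonymous"
    ]


# Made-up example data.
exons = [Exon("1", 100, 200, "GENE_A"), Exon("2", 500, 600, "GENE_B")]
snps = [Snp("1", 150, "A"), Snp("1", 200, "G"), Snp("2", 550, "T"), Snp("3", 10, "C")]
consequences = {("1", 150): "non-synonymous", ("2", 550): "synonymous"}

hits = snps_in_exons(snps, exons)
selected = select_for_polyphen(hits, consequences)
print(selected)
```

Note that the SNP at position 200 is not matched, because the boundary positions are excluded by the strict comparison.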
Packaged as a sharable virtual machine image
[Diagram, shown in two builds. Labels: 11.9 million SNPs; LiftOver; LiftOver results; ENSEMBL; PolyPhen. The output box reads "50,000 annotated SNPs" in one build and "20,000 annotated SNPs + provenance" in the other.]
Packaged as a sharable virtual machine image
• LiftOver, Taverna, PolyPhen and the workflow are packaged as a virtual machine image.
– Everything (except ENSEMBL) is run locally
– Full Cow analysis takes 2 days – previous attempts would have taken an estimated 3 months for the PolyPhen phase alone.
• Results and experiment can be distributed and shared as a complete package
– Re-use
– Repeatable
– Reproducible
• Future plans to deploy the image on “The Cloud”
Packaged as a sharable virtual machine image
[Diagram. Labels: ENSEMBL; Boran cow annotated DB; the MAP, FILTER and ANALYSIS phases repeated for further animals: Sheko cow, N’Dama cow, etc.]
Highlights of new Taverna 2.2 features
• Officially released last Wednesday – July 7th 2010
• Loading and sharing of service sets
• Ability to load and edit workflows that contain services that are offline
• Reporting on the state of the workflow
• Tabular representation of a workflow run
• Retrying and parallelization of service calls
• Consistent representation of the intermediate and workflow results
• Pause/resume/cancel of a running workflow
• Command line tool that allows you to execute workflows outside of the workbench.
• Faster, Better, Easier
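To illustrate what retrying and parallelising service calls means in general, here is a small sketch in plain Python. It does not use or represent Taverna's own configuration or API; the function names, parameters and the deliberately flaky example service are all hypothetical. Each call is retried a fixed number of times on failure, and calls for different inputs run concurrently in a thread pool.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def call_with_retry(service, arg, max_retries=3, delay=0.0):
    """Call service(arg), retrying up to max_retries times on any exception."""
    last_error = None
    for attempt in range(1 + max_retries):
        try:
            return service(arg)
        except Exception as err:
            last_error = err
            if attempt < max_retries:
                time.sleep(delay)  # wait before the next attempt
    raise last_error


def run_in_parallel(service, items, workers=4, max_retries=3, delay=0.0):
    """Apply the service to each item concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(
            lambda item: call_with_retry(service, item, max_retries, delay),
            items,
        ))


# Hypothetical service that fails the first time it sees each input.
_seen = set()
_lock = threading.Lock()


def flaky_service(x):
    with _lock:
        first_time = x not in _seen
        _seen.add(x)
    if first_time:
        raise RuntimeError("service temporarily unavailable")
    return x * 2


results = run_in_parallel(flaky_service, [1, 2, 3])
print(results)
```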