The Wold Lab BioHub Cory Tobin

Post on 20-Dec-2015


The Wold Lab

BioHub

Cory Tobin

Collaborators

Brandon King

Joe Roden

Diane Trout

Dr. Barbara

Goal

• Standardize the relationship between biological data

• Integrate all of the data seamlessly

• Provide novel methods to search for and analyze data

Adapted from http://woldlab.caltech.edu/biohub/

My Contribution

Implement a database for homology data

Background

[Figure: genes in Species A and Species B. Genes related within one species are labeled paralogs; genes related across species are labeled orthologs.]

The more general term is “homology”
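The distinction drawn on this slide can be written down as a small sketch. The `Gene` record and the `relationship` function are hypothetical names used only for illustration, and the same-species test is the slide's simplified picture: it assumes the two genes are already known to be homologous.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    """Hypothetical minimal gene record: a name plus its species."""
    name: str
    species: str


def relationship(a: Gene, b: Gene) -> str:
    """Label a pair of genes that are already known to be homologous.

    Simplified rule from the slide: same species means paralogs,
    different species means orthologs.
    """
    return "paralogs" if a.species == b.species else "orthologs"


# Made-up example genes matching the slide's two-species layout.
gene_1 = Gene("gene1", "Species A")
gene_2 = Gene("gene2", "Species A")
gene_3 = Gene("gene3", "Species B")

print(relationship(gene_1, gene_2))
print(relationship(gene_1, gene_3))
```

Either label is a special case of homology, which is why a database built around homology can store both kinds of relationship in one place.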

Requirements

• Be more accurate and flexible than HomoloGene

• Work in real time

• Make sense of HomoloGene’s misleading data

Rationale

[Figure: the same set of genes as represented by HomoloGene and by BioHub]

HomoloGene: They are similar

BioHub: They are related like this

Rationale Continued

Human Genome: Seq A, Seq B

Mouse Genome: Seq C

HomoloGene would BLAST seq A against mouse and determine that seq C is an ortholog of seq A.

HomoloGene would also BLAST seq B against mouse and determine that seq C is an ortholog of seq B.

BioHub will BLAST seq A against mouse, find seq C, then BLAST seq C back against human to see whether there are any better matches. It will find that seq B is a better match for seq C than seq A is.
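The back-search described above can be sketched as a reciprocal check. This is not BioHub's code: the score table is made-up data standing in for BLAST output, the function names are hypothetical, and scores are treated as symmetric for simplicity, which real BLAST scores are not.

```python
# Made-up similarity scores standing in for BLAST results.
# Higher means a better match.
SCORES = {
    ("A", "C"): 80,
    ("B", "C"): 95,
}

HUMAN = ["A", "B"]  # sequences in the human genome (toy example)
MOUSE = ["C"]       # sequences in the mouse genome (toy example)


def score(x, y):
    """Symmetric lookup in the toy score table; 0 if the pair is absent."""
    return SCORES.get((x, y), SCORES.get((y, x), 0))


def best_hit(query, genome):
    """Return the sequence in `genome` that scores highest against `query`."""
    return max(genome, key=lambda s: score(query, s))


def one_way_ortholog(query, target_genome):
    """One-directional search: accept the best hit without checking back."""
    return best_hit(query, target_genome)


def reciprocal_ortholog(query, query_genome, target_genome):
    """Accept the forward best hit only if searching back returns the query."""
    hit = best_hit(query, target_genome)
    back = best_hit(hit, query_genome)
    return hit if back == query else None


print(one_way_ortholog("A", MOUSE))
print(one_way_ortholog("B", MOUSE))
print(reciprocal_ortholog("A", HUMAN, MOUSE))
print(reciprocal_ortholog("B", HUMAN, MOUSE))
```

With these toy scores the one-way search pairs both A and B with C, while the reciprocal check keeps only the B and C pairing, because searching C back against human returns B rather than A.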

Methods

• Design data relationships that make sense biologically

• Generate the low-level database interaction code

• Parse and load HomoloGene’s data into our database

• Write biologically useful functions

• Create a web-based interface for easy use
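As a rough illustration of the parse, load, and query steps in this list, the sketch below uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL and hand-written SQL in place of Pymerase-generated code. The table layout, the three-column tab-delimited input, and the gene names are all assumptions made for the example; they are not HomoloGene's actual file format or BioHub's schema.

```python
import csv
import io
import sqlite3

# Made-up records in a hypothetical tab-delimited layout:
# group id, species, gene symbol.
SAMPLE = (
    "1\tHuman\tgeneA\n"
    "1\tHuman\tgeneB\n"
    "1\tMouse\tgeneC\n"
    "2\tHuman\tgeneD\n"
)


def load(conn, text):
    """Create the gene table and insert one row per parsed line."""
    conn.execute(
        "CREATE TABLE gene (group_id INTEGER, species TEXT, symbol TEXT)"
    )
    rows = csv.reader(io.StringIO(text), delimiter="\t")
    conn.executemany("INSERT INTO gene VALUES (?, ?, ?)", rows)


def homologs(conn, symbol):
    """Return (species, symbol) for every other gene in the same group."""
    query = """
        SELECT other.species, other.symbol
        FROM gene AS me
        JOIN gene AS other ON other.group_id = me.group_id
        WHERE me.symbol = ? AND other.symbol != me.symbol
        ORDER BY other.species, other.symbol
    """
    return conn.execute(query, (symbol,)).fetchall()


conn = sqlite3.connect(":memory:")  # in-memory database for the example
load(conn, SAMPLE)
print(homologs(conn, "geneA"))
```

A flat group table like this one only records that genes belong together. Storing how they are related would need an additional relationship table, which is left out of this sketch.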

Materials

• ArgoUML – Design Aid

• Pymerase – Design Implementation

• PostgreSQL – Database

• HomoloGene – Data Source

• Python – Programming Language

Current State

• Design data relationships that make sense biologically

• Generate the low-level database interaction code

• Parse and load HomoloGene’s data into our database

• Write biologically useful functions

• Create a web-based interface for easy use

Example Usage

Sequence of Interest

…GGATACAAAATTCCTC…

Are there any known genes in this sequence?

acetyl-coenzyme A dehydrogenase (Human)

(cont.)

acetyl-coenzyme A dehydrogenase (Human)

Are there any homologs?

Mouse

Rat

Mosquito

Fruit fly

Nematode

(cont.)

How are those genes related?

Where do you want to go?

More Info

BioHub woldlab.caltech.edu/biohub

HomoloGene www.ncbi.nlm.nih.gov

Python python.org

Pymerase pymerase.sf.net

PostgreSQL postgresql.org