Genomes on Rails

Posted on 24 May 2015

DESCRIPTION

Originally given at RailsConf, this talk outlines how the Wellcome Trust Sanger Institute is using Ruby and Rails as part of its new sequencing platform.

TRANSCRIPT

Genomes on Rails

has_many :sequences

Hello

➊ Previously

➋ Production

➌ Process

➊ Previously

The human genome

15 years to decode

3 billion letters

$3 billion ++

Race for the prize

Open data

Open source

Lots of Perl: ~4,500 modules

Onwards!

40 species

Map evolutionary space

Compare genomes, compare species, compare individuals

More Perl: ~1,500 modules

Quantum leap!

1000 personal genomes: beyond 23andMe

Hypertension

Diabetes

Coronary heart disease

Bipolar disorder

Malaria

➋ Production

Register projects

Register samples

Sample prep

Sequencing

Analysis

Change!

Flexible data capture

Virtual fields

Sample: Name, Organism, Concentration

class Sample < ActiveRecord::Base
  # Virtual fields: per-sample key/value pairs held in associated rows
  has_many :descriptors
  has_many :descriptor_values
end

Key value pairs

Faster than you’d think
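The slides show only the association declarations. A minimal plain-Ruby sketch of the key/value idea, assuming each descriptor is just a name/value pair; the `Descriptor` struct and the `[]`/`[]=` accessors are hypothetical and not the Sanger schema:

```ruby
# Hypothetical in-memory model of "virtual fields": instead of one
# column per attribute, each sample carries a list of key/value
# descriptors. Not the Sanger schema; names are illustrative.
Descriptor = Struct.new(:name, :value)

class Sample
  attr_reader :name, :descriptors

  def initialize(name)
    @name = name
    @descriptors = []
  end

  # Read a virtual field; nil when this sample has no such descriptor.
  def [](field)
    found = @descriptors.find { |d| d.name == field.to_s }
    found && found.value
  end

  # Write a virtual field, replacing any existing value for that key.
  def []=(field, value)
    existing = @descriptors.find { |d| d.name == field.to_s }
    if existing
      existing.value = value
    else
      @descriptors << Descriptor.new(field.to_s, value)
    end
  end

  # Names of the fields this particular sample carries.
  def fields
    @descriptors.map(&:name)
  end
end

sample = Sample.new("S1")
sample[:organism] = "Homo sapiens"
sample[:concentration] = "50 ng/ul"
sample[:origin] = "blood" # adding a field needs no schema change
```

In a database-backed version each descriptor would be a row, trading extra rows and joins for never having to migrate the table when a new field appears.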

Change!

Sample V1: Name, Organism, Concentration

Sample V2: Name, Organism, Concentration, Origin, Quality metric

Rationalize!

Mapping!

Sample V1: Name, Organism, Concentration

Sample V3: Name, Species, Concentration, Origin, Quality metric
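The mapping step can be illustrated with a small translation table. This is a sketch under assumptions: only the Organism to Species rename and the V3 field list come from the slide, while `FIELD_MAP_V1_TO_V3`, `V3_FIELDS` and `upgrade_v1_to_v3` are hypothetical names:

```ruby
# Hypothetical mapping from V1 to V3 sample fields. The Organism ->
# Species rename and the V3 field list follow the slide; the Hash
# format and upgrade function are illustrative assumptions.
FIELD_MAP_V1_TO_V3 = {
  "Name"          => "Name",
  "Organism"      => "Species",
  "Concentration" => "Concentration"
}.freeze

V3_FIELDS = ["Name", "Species", "Concentration", "Origin", "Quality metric"].freeze

# Translate a V1 record (field name => value) into V3 form. Fields that
# are new in V3 (Origin, Quality metric) are present but left nil.
def upgrade_v1_to_v3(record)
  upgraded = V3_FIELDS.each_with_object({}) { |field, h| h[field] = nil }
  record.each do |field, value|
    target = FIELD_MAP_V1_TO_V3.fetch(field) do
      raise ArgumentError, "no V3 mapping for V1 field #{field.inspect}"
    end
    upgraded[target] = value
  end
  upgraded
end

v1 = { "Name" => "S1", "Organism" => "Homo sapiens", "Concentration" => "50 ng/ul" }
v3 = upgrade_v1_to_v3(v1)
```

Raising on an unmapped field, rather than silently dropping it, is one way to make sure a rename never loses captured data.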

Pipeline management

Task 1 Task 2 Task 3

Workflow: Name, Operator

Instrument: Name, Serial number

Kit: Name

Passed
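A rough sketch of how a pipeline run might record these fields and step through its tasks. The field names follow the slide; the `Workflow` class, the task callables and the stop-at-first-failure rule are assumptions for illustration, not the Institute's pipeline code:

```ruby
# Hypothetical record of one pipeline run using the fields on the slide.
Instrument = Struct.new(:name, :serial_number)
Kit = Struct.new(:name)

class Workflow
  attr_reader :name, :operator, :instrument, :kit, :completed

  def initialize(name:, operator:, instrument:, kit:, tasks:)
    @name = name
    @operator = operator
    @instrument = instrument
    @kit = kit
    @tasks = tasks # ordered [task_name, callable] pairs
    @completed = []
    @passed = nil  # unknown until the workflow has run
  end

  # Run the tasks in order, stopping at the first one that returns false.
  def run
    @tasks.each do |task_name, step|
      unless step.call
        @passed = false
        return self
      end
      @completed << task_name
    end
    @passed = true
    self
  end

  def passed?
    @passed == true
  end
end

ok = -> { true }
wf = Workflow.new(
  name: "Library prep",
  operator: "operator1",
  instrument: Instrument.new("Sequencer A", "SN-0001"),
  kit: Kit.new("Prep kit"),
  tasks: [["Task 1", ok], ["Task 2", ok], ["Task 3", ok]]
).run
```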

Throughput!

320 TB, 450 CPUs, plus archive

75 TB: pilot study!

Multiple apps

Multiple instances

Loosely coupled

Loose coupling is hard

Deployment

Maintenance

Monitoring

Hard to maintain separation

Support novel science

Single code base

nginx reverse proxy

nginx with fair load balancing

Mongrel
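The slides pair an nginx reverse proxy with "fair" balancing across Mongrel backends. As a rough illustration of the idea only (not nginx's actual implementation), a least-busy backend picker in Ruby might look like this; the `FairBalancer` class and the backend addresses are hypothetical:

```ruby
# Illustrative least-busy backend selection, the idea behind "fair"
# balancing in front of Mongrel processes. Not nginx's code; the class
# name and backend addresses are hypothetical.
class FairBalancer
  def initialize(backends)
    # Number of in-flight requests per backend, all idle at start.
    @active = backends.each_with_object({}) { |b, h| h[b] = 0 }
  end

  # Pick the backend with the fewest in-flight requests and mark it busy.
  def acquire
    backend, _count = @active.min_by { |_b, count| count }
    @active[backend] += 1
    backend
  end

  # Call when a backend has finished handling a request.
  def release(backend)
    @active[backend] -= 1 if @active[backend] > 0
  end

  # Snapshot of current in-flight counts.
  def load
    @active.dup
  end
end

lb = FairBalancer.new(["127.0.0.1:8000", "127.0.0.1:8001"])
first = lb.acquire
second = lb.acquire
```

The usual motivation for preferring the least-busy backend over plain round-robin is that, with Rails of that era serving one request at a time per Mongrel, a slow request should not have further requests queued behind it.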

Fast deployment

Automate everything

Interoperability!

Play well with others!

Legacy databases

RESTful services

Generate API stubs
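The talk does not say how its API stubs were generated. One way to illustrate the idea in plain Ruby is to define client methods from a list of RESTful resource names with `define_method`. The `ApiStub` and `SequencingApi` classes, the method names and the paths are all hypothetical, and the stubs only return the verb and path instead of calling a server:

```ruby
# Hypothetical sketch of generating REST client stubs from resource
# names. Each stub returns the HTTP verb and path it would use; no
# network request is made. Names and paths are illustrative.
class ApiStub
  def self.resources(*names)
    names.each do |name|
      collection = "/#{name}"

      # GET /samples -> list_samples
      define_method("list_#{name}") { [:get, collection] }

      # GET /samples/:id -> get_samples(id)
      define_method("get_#{name}") { |id| [:get, "#{collection}/#{id}"] }

      # POST /samples -> create_samples(attrs)
      define_method("create_#{name}") { |attrs| [:post, collection, attrs] }
    end
  end
end

class SequencingApi < ApiStub
  resources :samples, :projects
end

api = SequencingApi.new
api.get_samples(42) # => [:get, "/samples/42"]
```

Generating the methods from one list keeps every client consistent with the resource naming, which matters when several applications need to talk to the same services.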

SCALE!

Trillionics

2X

150 TB per week

Over 6 months

More hardware

400 additional nodes

Additional 360 TB

Towards a Virtual Institute

Lots of data, lots of people, lots of compute, lots of uses, lots and lots and lots and lots...

➌ Process

Concept, Requirements, Development, Product

Takes too long, and the requirements change along the way

Concept

What we need, get ready

Development, plan

REVIEW

Focused

Project owner is key

Weekly releases

More flexible

Less time

Better transparency

Less software

Sequencing informatics

Thank you

GREENISGOOD.CO.UK
