Decoding our bacterial overlords - Melbourne Knowledge Week - Tue 28 Oct 2014
TRANSCRIPT
Decoding our bacterial overlords
Dr Torsten Seemann
A bacterium
Bacteria are diverse
5,000,000,000,000,000,000,000,000,000,000,000
000,000,000,000,000,000,000,000,000,000,000,
000,000,000,000,000,000,000,000.
Bacteria run the show
100,000,000,000,000
1,000,000
90% microbial
Essential for human life
Help digest our food
Immune system
Synthesize vitamins
E. coli: “Good” (colon), “Bad” (bladder)
Bacteria are not malicious
6,000,000,000 letters
The blueprint of life
Genome
A T G C
4,000,000 letters
Reading the genome
Extract the DNA
Chop it into small pieces
Read DNA of each piece
We had a bunch of nice long DNA (each 4 million letters long)
We got back millions of short DNA (each only 200 letters long)
We want our nice long DNA back! (please)
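A rough Python sketch of the reading step, assuming reads come from uniformly random positions (a simplification of real sequencing). The names shotgun_reads and coverage are made up for illustration, and the 2,000,000 read count is only an assumed stand-in for "millions".

```python
import random


def shotgun_reads(genome, read_len=200, n_reads=1000, seed=0):
    """Return n_reads random substrings of genome, each read_len letters long."""
    rng = random.Random(seed)
    last_start = len(genome) - read_len  # latest position a full read can start
    starts = (rng.randint(0, last_start) for _ in range(n_reads))
    return [genome[s:s + read_len] for s in starts]


def coverage(genome_len, read_len, n_reads):
    """Average number of reads covering each letter of the genome."""
    return read_len * n_reads / genome_len


if __name__ == "__main__":
    # Toy genome; a real bacterial genome is ~4,000,000 letters long.
    toy = "".join(random.Random(1).choice("ATGC") for _ in range(4000))
    reads = shotgun_reads(toy, read_len=200, n_reads=50)
    print(len(reads), "reads of", len(reads[0]), "letters")
    # Assumed read count: 2,000,000 reads of 200 letters over 4,000,000 letters.
    print(coverage(4_000_000, 200, 2_000_000))
```

Under that assumed read count every letter is read about 100 times on average, which is why duplicate pieces are expected and why the original order of the pieces is lost.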
Can’t always get what you want
Genome assembly
Reconstruct the DNA of the chromosome(s)
Like a jigsaw puzzle, but ...
● No box
● Millions of pieces
● Missing and duplicate pieces
● Broken pieces
● No corner or edge pieces
→ Usually end up with ~200 sequences
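The jigsaw idea can be sketched as a greedy overlap merge: repeatedly join the two pieces whose ends match best, and stop when nothing overlaps any more. This is a toy under strong assumptions (no read errors, one strand only, no repeats), not how production assemblers work. The names overlap, greedy_assemble and the min_len threshold are illustrative.

```python
def overlap(a, b, min_len=4):
    """Length of the longest suffix of a that is also a prefix of b (0 if < min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=4):
    """Merge reads by best overlap first; return the remaining sequences."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicate pieces
    # Drop pieces that sit entirely inside another piece.
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return reads  # nothing left to join
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)


if __name__ == "__main__":
    genome = "ATGGCGTACCTTAGAACT"  # toy genome, 18 letters
    pieces = [genome[s:s + 8] for s in (0, 3, 6, 9, 10)]
    print(greedy_assemble(pieces))
    # Remove one piece: the gap cannot be bridged, so two sequences remain.
    print(greedy_assemble(pieces[:2] + pieces[3:]))
```

The second call shows in miniature why an assembly usually ends up as a set of separate sequences rather than one complete chromosome: a missing piece leaves a gap that no overlap can close.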
Finding genes
Genome is ~4,000,000 letters long
Contains ~4,000 genes
Each gene is ~800 letters long
Genes start and end with special triplets
←ATGCATGATTAGCTTTTAGTCTTATAATGTCTTATATATCGCATTTAAGCCCTGATTCTATGAATG→
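The "special triplets" idea can be sketched as a scan for stretches that begin with a start triplet and run, three letters at a time, to a stop triplet. This sketch assumes the standard start ATG and stops TAA, TAG and TGA, and reads one strand only. Real gene finders also check the opposite strand and use further signals, so treat find_orfs and min_codons as illustrative names.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end, sequence) for each start-to-stop stretch on one strand."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):  # three ways to split the letters into triplets
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == START:
                start = pos  # first start triplet in this frame opens a candidate
            elif start is not None and codon in STOPS:
                end = pos + 3
                if (end - start) // 3 >= min_codons:
                    orfs.append((start, end, dna[start:end]))
                start = None  # look for the next candidate gene
    return sorted(orfs)


if __name__ == "__main__":
    # The example sequence from the slide, without the arrows.
    slide_seq = ("ATGCATGATTAGCTTTTAGTCTTATAATGTCTTATATATCGC"
                 "ATTTAAGCCCTGATTCTATGAATG")
    for start, end, seq in find_orfs(slide_seq):
        print(start, end, seq)
```

On a real genome min_codons would be set much higher, since genes are around 800 letters long and short start-to-stop stretches occur by chance.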
Applications
● Identify new species
● Find resistance genes
● Understand evolution
● Trace outbreak origin
2,000 finished genomes
10,000 assembled draft genomes
200,000 downloadable genomes
2,000,000 sitting on USB disks?
Computational challenge
Genome assembly is different
- RAM more useful than CPU