Big Data and Genomics


Upload: al-costa

Post on 15-Feb-2017

Category:

Data & Analytics

TRANSCRIPT

Big Data and Genomics

Al Costa Alkol Biotech

Low sequencing costs = lots of data

As the cost of sequencing a genome decreases, the DNA of more and more organisms becomes publicly available, which means more data.

The problem grows even further if we consider initiatives such as the 1000 Genomes Project, or if we were to sequence everyone in the US today (313 exabytes).
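A quick sanity check of that figure: dividing the slide's 313 exabytes by the US population gives the implied storage per person. The population value below (about 320 million, roughly the 2016 figure) is our assumption, not something stated on the slide, and we read "exabyte" as the decimal 10^18 bytes.

```python
# Back-of-the-envelope check of the slide's 313 EB figure.
TOTAL_BYTES = 313 * 10**18      # 313 exabytes (decimal SI prefix), as stated on the slide
US_POPULATION = 320_000_000     # assumption: approximate US population around 2016

def bytes_per_person(total_bytes: int, population: int) -> float:
    """Average storage implied per sequenced person, in bytes."""
    return total_bytes / population

per_person = bytes_per_person(TOTAL_BYTES, US_POPULATION)
print(f"{per_person / 10**12:.2f} TB per person")  # prints 0.98 TB per person
```

In other words, the slide's total implies roughly one terabyte of data per person under this assumption.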

genomics = lots of data

In fact, a single genome involves anywhere from millions to billions of bytes.
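To see where those sizes come from: each of the four nucleotides (A, C, G, T) fits in 2 bits, so four bases pack into one byte. The sketch below is a minimal illustration, assuming sequences contain only A/C/G/T (ambiguous bases such as N would need an extra encoding); the ~3.1 billion base figure is the approximate length of a haploid human genome, and the function names are our own.

```python
# 2-bit packing: 4 nucleotides per byte.
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}

def pack_2bit(seq: str) -> bytes:
    """Pack an A/C/G/T string into bytes, 4 bases per byte (last byte zero-padded)."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]   # raises KeyError for non-ACGT bases
        byte <<= 2 * (4 - len(chunk))         # left-align a short final chunk
        out.append(byte)
    return bytes(out)

def packed_size(num_bases: int) -> int:
    """Bytes needed to store num_bases at 2 bits per base."""
    return (num_bases + 3) // 4

HUMAN_GENOME_BASES = 3_100_000_000  # approximate haploid human genome length
print(packed_size(HUMAN_GENOME_BASES) / 10**6, "MB")  # prints 775.0 MB
```

Even at this compact encoding a single human genome needs hundreds of megabytes; raw sequencing reads, with quality scores and redundant coverage, take considerably more.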

And if you are still unconvinced, consider the MinION by UK company Oxford Nanopore Technologies, which sells for US$900, is the size of a USB stick, and can sequence a human genome in 8 hours.

Comparative genomics = promising

However, if we compare the genomes of different species, we'll realize they share a lot of common ground.
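One simple way to put a number on that "common ground" is to compare the sets of short substrings (k-mers) in two sequences using the Jaccard index. This is a minimal sketch of a standard technique, not necessarily what our pipeline does; the choice of k and the function names are illustrative assumptions.

```python
def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard(a: str, b: str, k: int = 4) -> float:
    """Shared k-mers divided by all distinct k-mers: 0 = nothing shared, 1 = identical k-mer sets."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0

print(jaccard("ACGTT", "ACGTA", k=3))  # prints 0.5
```

Here the two sequences share ACG and CGT out of four distinct 3-mers. On real genomes, exact k-mer sets become very large, which is why tools typically work with sampled sketches instead.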

Tools used = complicated

For genomic data we use ADAM, BLAST, and several comparison tools.

ADAM is an open-source, high-performance, distributed platform for genomic analysis. ADAM defines:
1. A data schema and layout on disk
2. A Scala API
3. A command-line interface

BLAST (Basic Local Alignment Search Tool) is an alignment tool that finds regions of local similarity between a query sequence, such as a shotgun fragment, and the sequences in a database.
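BLAST gets its speed from a seed-and-extend heuristic that approximates full local alignment. The sketch below swaps in the exact method it approximates, Smith-Waterman local alignment with a linear gap penalty, to show what a "local alignment score" measures. The scoring values (+2 match, -1 mismatch, -2 gap) are illustrative assumptions, not BLAST's defaults.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (Smith-Waterman, linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # H[i][j]: best score of an alignment ending at a[i-1], b[j-1]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets a new local alignment start anywhere.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best

print(smith_waterman("TTACGTTT", "GGACGTGG"))  # prints 8 (the shared "ACGT")
```

This exact version costs time proportional to the product of the two lengths, which is precisely what makes it impractical against large databases and why BLAST uses heuristics.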

An example = our project

We are currently using Big Data and the tools described above to find promising strands among millions of DNA sequences, as I'll explain now.
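Searching millions of sequences usually follows a map-reduce pattern: each sequence (or partition of sequences) is processed independently, and the partial results are then combined. ADAM runs this kind of work on Apache Spark across a cluster; the sketch below only illustrates the pattern on one machine with plain Python `map` and `reduce`, using k-mer counting as an assumed example task. The function names are our own.

```python
from collections import Counter
from functools import reduce
from operator import add
from typing import Iterable

def count_kmers(seq: str, k: int) -> Counter:
    """Map step: count every k-mer in one sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def kmer_counts(sequences: Iterable[str], k: int) -> Counter:
    """Map each sequence to partial counts, then reduce by summing them."""
    partials = map(lambda s: count_kmers(s, k), sequences)  # map: independent per sequence
    return reduce(add, partials, Counter())                 # reduce: merge partial counts

reads = ["ACGTAC", "CGTACG", "TTACGT"]
counts = kmer_counts(reads, 3)
print(counts["ACG"], counts["TTA"])  # prints 3 1
```

Because each map step touches only one sequence, the same logic can be spread over many machines without changing the result.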

How we use it = to build new crops

Today, the biobased industry (biofuels, bioplastics, etc.) is trying to adapt itself to unsuitable feedstocks. That is exactly the opposite of what mankind did with food, where it adapted crops to its nutritional needs.

Sugarcane = much more than sugar!

Among the feedstocks currently used by the biobased industries, one stands out: sugarcane. However, it currently grows only in tropical regions, which is a pity, considering the number of products derived from it.

EUnergyCane = European sugarcane

Adapting sugarcane to grow in Europe would therefore allow many new products to be produced sustainably. We are halfway through that project with our EUnergyCane variety, the only genuinely European one.

A pine tree and an edelweiss?

Perhaps the only thing a pine tree and an edelweiss have in common is that both can withstand the cold.

Looking for a philosopher's stone

Comparing the DNA of the pine tree with that of the edelweiss should therefore reveal common regions, one of which might, for example, be responsible for giving a plant the ability to withstand the cold.
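A minimal sketch of how such common regions can be located: index the k-mers of one sequence, look up every k-mer of the other, and merge consecutive hits on the same diagonal into longer exact matches. The function names and the value of k are illustrative assumptions; real comparisons would rely on tools such as BLAST or whole-genome aligners, and a shared region is only a candidate, not proof that it confers cold tolerance.

```python
def shared_kmer_hits(a: str, b: str, k: int) -> list[tuple[int, int, str]]:
    """Return (pos_in_a, pos_in_b, kmer) for every k-mer occurring in both sequences."""
    index: dict[str, list[int]] = {}
    for j in range(len(b) - k + 1):
        index.setdefault(b[j:j + k], []).append(j)
    return [(i, j, a[i:i + k]) for i in range(len(a) - k + 1) for j in index.get(a[i:i + k], [])]

def merge_hits(hits: list[tuple[int, int, str]], k: int) -> list[tuple[int, int, int]]:
    """Merge hits on the same diagonal (i - j) at consecutive positions into (start_a, start_b, length)."""
    regions: list[tuple[int, int, int]] = []
    for i, j, _ in sorted(hits, key=lambda h: (h[0] - h[1], h[0])):
        if regions:
            si, sj, length = regions[-1]
            # Same diagonal and this k-mer starts right after the last one in the region.
            if si - sj == i - j and i == si + length - k + 1:
                regions[-1] = (si, sj, length + 1)
                continue
        regions.append((i, j, k))
    return regions

a, b = "TTACGTAGG", "CCACGTACC"
print(merge_hits(shared_kmer_hits(a, b, 3), 3))  # prints [(1, 5, 3), (2, 2, 5)]
```

The second region corresponds to "ACGTA", which appears at position 2 in both sequences.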

How we use it = to build new crops

This is how we carry out our work: by analyzing the DNA of crops that can resist the cold, in order to find the philosopher's stone that, once inserted into sugarcane, would allow it to grow in Europe. For that, new techniques such as CRISPR/Cas9 avoid the need for plasmids and GMOs.
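Before CRISPR/Cas9 can edit a sequence, a target site has to be chosen. For the commonly used SpCas9 (from Streptococcus pyogenes), that site is a roughly 20-nucleotide protospacer immediately followed by an NGG PAM. The sketch below scans only the forward strand for such sites; `find_cas9_targets` is a hypothetical helper, not part of any guide-design tool, and it does no reverse-strand search, off-target checking, or efficiency scoring.

```python
import re

def find_cas9_targets(seq: str, protospacer_len: int = 20) -> list[tuple[int, str, str]]:
    """Return (start, protospacer, PAM) for each forward-strand site followed by an NGG PAM."""
    seq = seq.upper()
    hits = []
    # Zero-width lookahead so overlapping PAMs (e.g. in "AGGG") are all reported.
    for m in re.finditer(r"(?=([ACGT]GG))", seq):
        pam_start = m.start()
        if pam_start >= protospacer_len:  # need a full-length protospacer upstream
            start = pam_start - protospacer_len
            hits.append((start, seq[start:pam_start], m.group(1)))
    return hits

for start, protospacer, pam in find_cas9_targets("C" * 21 + "AGGG"):
    print(start, protospacer, pam)
```

In practice each candidate would then be checked for similar sites elsewhere in the genome, which is another large-scale sequence comparison problem.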

Conclusion = big data is much more

Big Data is not only about gathering customer data at banks and telcos; it is a valuable tool for finding new and unsuspected information in any area of human knowledge.

Its use in genomics may help find cures for otherwise incurable diseases, develop new crops with enhanced capabilities, and much more.

Thank you

[email protected]

11/04/16