biopython, doctest and makefiles
Posted on 16-Apr-2017
Barcelona Python Developers Seminars
This is me
Giovanni
PhD student in a Population Genetics lab
Not a biopython dev
(that may not be my real photo)
Intro
BioPython -> a collection of Python modules for standard bioinformatics tasks
Advantages of using open source libraries in science:
more reproducibility
easier to compare results
fewer errors
less time spent
BioPython: some use cases
The human genome sequencing project (2001):
TCCATGGCCTCCCGGCAAGCCTAAGCTAGCGCAATTGTCAGACGCACAGGACCGGTCTGGGGAGACCAATGTGTTCAGACAACGATTCCCAGCTAGTACCACTGTTTGACTCGGAAGATGTGTACAACTATTGTAGCGACTGTGTCCCATCATTGCATTCAAACCCAAGTAATTGATGGATCAACAAAGGATACACTCCAAAAGTCGCACAGAGATTGGTCATCTTAACGCGAGATTAAACATGCGTCTATACGCCCGTGTTAAGTTCGGCCGCCATCGTACAAATAAGCGAGNNNNTATCAATCTAATCTTAAACCGGCTCTTGAGAAGGGCTAGCGGCGTTAGGACCCGCTGCCGGCCGTGAGCGTGCGTTCACTCTGAACAGCGCCATCGATGGGTCGCTTGTGTAGCTATTTTAAGGACGCGACATAGGCCCTGGGGCAGTTACTGGGGCATGCCCACTATATCCGCGGGCAAGTTGGTATTCAGCTATGTTTATCTCTCGCCCAATGCGTGAAAGCGCCAAACGTGGGTAGAGGACTTAGCAATTTGGGGCATGCCCTGCTCTTTTAGATCTGTTAAGCAATCCGCGCGTAGGGCTCGCTGCGTCGTAAATGTGAGCGCAAGTCACCGACGCAGTGGTAATATACGTGTAACTGATCATCNNNNNNTCCCGAACCATGCCTTCTAACAGGAGATGCCCAAGGTCGAGGGTCACCGCCAACGACCGGCTGATCCCTGTTGGTGAGGATTTATGGAGGTGGACTGTCAGGTAGGCAAGAACTCTGGGTGAATTTGCGAGCGCTATCTCTAAGTTACACGCTTTACTGGGGCATGCCCGGGCCGTAGAAGTTACTGGGGCATGCCCCACGTAATAGGTTTTCATGAGGAGATGTTTGGTCTGATTCTCGAGATTGTGGCTAAGTATTGAGTCAGACTTACTGGGGCATTTACTGGGGCATGCCCGCCCTGCTCTTTTAGATCTGTTAAGCAATCCGCGCGTAGGGCTCGCTGCGTCGTAAATGTGAGCGCAAGTCACCGACGCAGTGGTAATATACGTGTAACTGATCATCTTCATGATTCCCGAACCATGCCTTCTAACAGGAGATGCCCAAGGTCGAGGGTCACCGCCAACGACCGGCTGATTTACTGGGGCATGCCCCCCNNNNNGAGGATTTNNNNTGGAGCCTATCTCACATTTTAAACTTCAATCATCATAACACGTGCGCACTTTTTCCGCGCTTGACGGCGAAGTGACTGGCCACTTCCTGCTCCCTGTTTTTCCCAATACCTGACAAGTGTGGCATCTGTCCCCCTGAAGAGGACTAGAGTATCATTACGGGGGGCTTGACACTTACCTTCATAGG.............
Up to ~3 × 10^9 characters
Lots of regexes (Perl fans like that)
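Regular expressions are the natural tool for scanning text like this. The sketch below is a minimal illustration using Python's standard re module, not a Biopython API. find_motif and unknown_runs are hypothetical helper names. The fragment is a short piece copied from the sequence above, and 'N' marks an unknown base.

```python
import re

def find_motif(seq, motif):
    """Return the 0-based start of every match of motif, overlaps included."""
    # A zero-width lookahead consumes no characters, so overlapping
    # occurrences (e.g. 'AA' twice inside 'AAA') are all reported.
    pattern = re.compile("(?=" + re.escape(motif) + ")")
    return [m.start() for m in pattern.finditer(seq)]

def unknown_runs(seq):
    """Return (start, end) spans of consecutive 'N' (unknown) bases."""
    return [m.span() for m in re.finditer(r"N+", seq)]

fragment = "GCGAGNNNNTATCAATCTAATC"
print(find_motif(fragment, "ATC"))  # [10, 14, 19]
print(unknown_runs(fragment))       # [(5, 9)]
```

On a whole chromosome, the same calls work unchanged on a much longer string.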
The documentation of say_hello can be displayed with >>> help(say_hello):
Help on function say_hello in module __main__:

say_hello(name)
    print hello to the screen
    example:
    >>> say_hello('Albert Einstein')
    hello Albert Einstein!!!
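help() does not keep a separate copy of the documentation. It formats the function's docstring, cleaning up the indentation much as inspect.getdoc() does. The small sketch below (written for Python 3) shows that the text a reader sees and the examples doctest runs are one and the same string.

```python
import inspect

def say_hello(name):
    '''
    print hello (name) to the screen
    example:
    >>> say_hello('Albert Einstein')
    hello Albert Einstein!!!
    '''
    print('hello ' + name + '!!!')

# inspect.getdoc() strips the docstring's common indentation and the
# surrounding blank lines, which is roughly what help() displays.
doc = inspect.getdoc(say_hello)
print(doc)
```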
doctest: how does it work?
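#!/usr/bin/env python
# note: defining sum() here shadows Python's built-in sum() in this module
def sum(x, y):
    '''
    sums two numbers
    example:
    >>> print(sum(1, 2))
    3
    '''
    return x + y

if __name__ == '__main__':
    import doctest
    doctest.testmod()  # run every '>>>' example found in this module's docstrings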
doctest.testmod() searches the module's docstrings for lines beginning with '>>>' and executes each one as a Python statement
The output is compared with the lines that follow (the expected output). If they differ, doctest reports a failure.
If sum(1, 2) does not print 3, the test fails (run the script with -v to see every example as it is checked)
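doctest.testmod() prints a report for each failing example. The sketch below shows what such a failure looks like and returns the counts as numbers. It runs the examples of a single function using doctest's lower-level DocTestFinder and DocTestRunner classes. buggy_sum and run_examples are hypothetical names, and the bug in buggy_sum is deliberate.

```python
import doctest

def buggy_sum(x, y):
    '''
    Hypothetical function with a deliberate bug.

    >>> print(buggy_sum(1, 2))
    3
    '''
    return x - y  # bug: subtracts instead of adding

def run_examples(func):
    """Run the docstring examples of one function; return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    failed = attempted = 0
    # globs is the namespace the examples run in; it must contain func itself.
    for test in finder.find(func, globs={func.__name__: func}):
        result = runner.run(test)  # prints an Expected/Got report per failure
        failed += result.failed
        attempted += result.attempted
    return failed, attempted

if __name__ == '__main__':
    print(run_examples(buggy_sum))
```

The printed report repeats the failing example, then shows "Expected:" followed by 3 and "Got:" followed by -1.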
doctest - examples
BioPython - SeqIO.parse
doctest file parsing example
In bioinformatics there are many formats with similar-sounding names
ped, tped, bed, tmap, pdb, fasta...
It is useful to include an example of the input file in every parser function
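Here is a minimal sketch of that advice. It uses a hypothetical read_fasta() helper; Biopython users would normally call SeqIO.parse instead. The docstring carries a tiny FASTA file, and io.StringIO lets the doctest feed that file to the parser without touching the disk.

```python
import doctest
import io

def read_fasta(handle):
    r"""Yield (identifier, sequence) pairs from an open FASTA file.

    Example input file, embedded so readers can see the expected format:

    >>> example = io.StringIO(">seq1 first test sequence\n"
    ...                       "ACGT\n"
    ...                       "ACGT\n"
    ...                       ">seq2\n"
    ...                       "NNNN\n")
    >>> for name, seq in read_fasta(example):
    ...     print(name, seq)
    seq1 ACGTACGT
    seq2 NNNN
    """
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)  # emit the previous record
            header = line[1:].split()
            # the identifier is the first word after '>'
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)  # a sequence may span several lines
    if name is not None:
        yield name, "".join(chunks)  # emit the last record

if __name__ == "__main__":
    doctest.testmod()
```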
Choose good examples
Write the doctest together with whoever will use the script (e.g. a fellow scientist)
Ask them: 'how is this function supposed to behave in this example?'
Simplify: round all numbers to multiples of 100, add comments
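For instance, here is a doctest for a hypothetical interval_length() helper that reads one BED line. BED coordinates are 0-based and half-open, so the length is simply end minus start. Multiples of 100 and a short comment let a non-programmer check the expected value at a glance.

```python
import doctest

def interval_length(bed_line):
    r"""Return the length of the interval described by one BED line.

    >>> interval_length("chr1\t100\t400")   # 400 - 100
    300
    >>> interval_length("chr2\t0\t1000")    # 0 is the first base
    1000
    """
    fields = bed_line.rstrip("\n").split("\t")  # BED is tab-separated
    return int(fields[2]) - int(fields[1])

if __name__ == "__main__":
    doctest.testmod()
```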
Doctest Pros and Cons
Pros:
docs always up to date (they are tested)
Usage examples
Quick tests while you are coding
Cons:
Functions that read files (StringIO? NamedTemporaryFile?)
You still need to write unit tests
Lines can't be longer than 79 characters (PEP 8)
Random generators / statistics / rounding are hard to test
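Some of these cons can be softened with standard doctest features, as the sketch below shows:

* Rounding makes float output stable.
* The ELLIPSIS option flag lets '...' stand for uninteresting digits.
* A private, seeded random.Random makes sampling reproducible.
* The NORMALIZE_WHITESPACE flag allows long expected output to be wrapped within PEP 8's limit.

The function names are hypothetical.

```python
import doctest
import random

def mean(values):
    """Arithmetic mean of a list of numbers.

    Round floats before comparing, so the expected text is stable:

    >>> round(mean([0.1, 0.2, 0.4]), 2)
    0.23

    Or let ELLIPSIS stand for the uninteresting trailing digits:

    >>> mean([1, 2, 2])  # doctest: +ELLIPSIS
    1.66...
    """
    return sum(values) / len(values)

def sample_individuals(names, k, seed):
    """Pick k distinct names at random, reproducibly.

    A private generator with a fixed seed gives the same answer every run:

    >>> names = ['ind%d' % i for i in range(100)]
    >>> first = sample_individuals(names, 3, seed=1)
    >>> first == sample_individuals(names, 3, seed=1)
    True
    """
    return random.Random(seed).sample(names, k)

def autosomes():
    """Names of the human autosomes.

    NORMALIZE_WHITESPACE lets long output wrap within 79 columns:

    >>> autosomes()  # doctest: +NORMALIZE_WHITESPACE
    ['chr1', 'chr2', 'chr3', 'chr4', 'chr5', 'chr6', 'chr7', 'chr8',
     'chr9', 'chr10', 'chr11', 'chr12', 'chr13', 'chr14', 'chr15',
     'chr16', 'chr17', 'chr18', 'chr19', 'chr20', 'chr21', 'chr22']
    """
    return ['chr%d' % i for i in range(1, 23)]

if __name__ == "__main__":
    doctest.testmod()
```

Another option is doctest.DocTestSuite, which wraps a module's doctests as unittest test cases. This partly answers the "still need to write a unittest" point.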
Bioinformatics: a different approach
Programming experiments is different from programming software:
Testing has other dimensions (biological meaning, reproducibility)
Usually you write many scripts, each carrying out a small task, and glue them together with a pipeline / wrapper script / makefile / automated build tool / XML-described workflow / insert others here
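The simplest kind of glue is a wrapper script. Below is a minimal sketch in Python that uses only the standard subprocess module. run_pipeline is a hypothetical helper, not part of any library.

```python
import subprocess
import sys

def run_pipeline(steps):
    """Run each command of a pipeline in order; return how many steps ran.

    steps: a list of argument lists, one per script invocation.
    check=True makes subprocess.run raise CalledProcessError as soon as a
    step exits with a non-zero status, so later steps never run on missing
    or half-written input files.
    """
    for number, step in enumerate(steps, start=1):
        print('[step %d] %s' % (number, ' '.join(step)), file=sys.stderr)
        subprocess.run(step, check=True)
    return len(steps)
```

A call would look like run_pipeline([['python', 'retrieve_data.py', ...], ['perl', 'calculate_results.pl', ...]]). Unlike make, this wrapper re-runs every step each time it is called. Skipping steps whose output is already up to date is exactly what make adds on top.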
I am a makefile guy
What is a makefile?
GNU make is a utility originally designed for building C/C++ programs.
It can also be used to save shell commands (...) with their options and re-execute them at will.
Example:
$ make all
python retrieve_data.py --option1 --option2
perl convert_format.pl --input inputfile --option3
perl convert_format.pl --inputfile inputfile2
Simplest Makefile example
$ cat Makefile
help:
	echo 'execute make all to carry out the whole analysis'
get_data:
	python retrieve_data.py --database ensembl --specie Human --output sequences.fasta
calculate_results:
	perl calculate_results.pl --option1 --option2 --input sequences.fasta --output results.txt
all: get_data calculate_results
Makefiles Pros
Conditional execution
If a command does not need to be executed, it is skipped (make checks whether the output file already exists and is newer than its inputs)
Chaining commands
You can define the order in which commands must be executed (download sequences first, then read them)
Support for clusters
Syntax is ugly, but standard
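The conditional-execution rule is simple enough to sketch in a few lines of Python. needs_update is a hypothetical, simplified stand-in for make's logic, not a replacement for make. The demo uses temporary files with explicit modification times.

```python
import os
import tempfile

def needs_update(target, prerequisites):
    """Simplified version of make's test for whether a recipe must run.

    The target is rebuilt if it does not exist, or if any prerequisite was
    modified more recently than it. (Real make also builds prerequisites
    first and always runs .PHONY targets.)
    """
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_time for p in prerequisites)

def demo():
    """Apply the rule to temporary files with explicit modification times."""
    with tempfile.TemporaryDirectory() as tmp:
        fasta = os.path.join(tmp, 'sequences.fasta')
        results = os.path.join(tmp, 'results.txt')
        with open(fasta, 'w'):
            pass
        missing = needs_update(results, [fasta])  # no results.txt yet
        with open(results, 'w'):
            pass
        os.utime(fasta, (1000, 1000))             # (atime, mtime) in seconds
        os.utime(results, (2000, 2000))
        fresh = needs_update(results, [fasta])    # output newer than input
        os.utime(fasta, (3000, 3000))
        stale = needs_update(results, [fasta])    # input edited afterwards
    return missing, fresh, stale

print(demo())  # (True, False, True)
```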
Make - Cons
GNU Make has a very ugly syntax
Really, I hate its syntax
I am looking for substitutes in Python:
SCons
Paver
waf (Google Summer of Code project)
I still haven't started using them
Implement something in biopython?
A more complicated Makefile
Pattern rules (%) and automatic variables ($@, $<)
Modifiers like - and @
addprefix, addsuffix ??
Triple parentheses ??
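Here is a reminder of what those symbols mean. Take a pattern rule such as results/%.txt: data/%.fasta. In it, % matches a non-empty "stem", and the automatic variables expand as follows:

* $@ expands to the target file name.
* $< expands to the first prerequisite.
* $* expands to the stem.

The two recipe-line modifiers work like this:

* A leading - tells make to ignore that command's errors.
* A leading @ stops make from echoing the command.

The hypothetical Python function below mimics only the variable expansion, under the simplifying assumptions stated in its docstring.

```python
def expand_pattern_rule(target_pattern, prereq_pattern, target):
    """Return make's automatic variables for target, or None if no match.

    Simplifying assumptions: exactly one '%' in each pattern, and none of
    make's special handling of directories in patterns without a '/'.
    """
    prefix, suffix = target_pattern.split('%')
    matches = (target.startswith(prefix) and target.endswith(suffix)
               and len(target) > len(prefix) + len(suffix))  # stem non-empty
    if not matches:
        return None
    stem = target[len(prefix):len(target) - len(suffix)]
    return {
        '$@': target,                             # the target being built
        '$<': prereq_pattern.replace('%', stem),  # the first prerequisite
        '$*': stem,                               # the part matched by %
    }

print(expand_pattern_rule('results/%.txt', 'data/%.fasta', 'results/chr1.txt'))
```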
Thanks for your attention!
Did you like the talk?
BioPython use cases
Single Nucleotide Polymorphisms (SNPs) are positions in the genome that tend to vary most between individuals
We are working with data on 650,000 SNPs in about 1000 individuals
We need to organize the data into objects (SNPs, genotypes, individuals, populations), store them in a database, and calculate statistics on them
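Below is a minimal sketch of such objects using Python dataclasses. The class and field names are hypothetical and are not Biopython classes. Genotypes are assumed to be stored as two-letter strings such as 'AG'. allele_frequency() shows the kind of statistic meant, and it carries its own doctest.

```python
import doctest
from dataclasses import dataclass, field

@dataclass
class SNP:
    rs_id: str          # e.g. 'rs0001' (hypothetical identifier)
    chromosome: str
    position: int
    alleles: tuple      # the two observed alleles, e.g. ('A', 'G')

@dataclass
class Individual:
    name: str
    genotypes: dict = field(default_factory=dict)  # rs_id -> genotype, e.g. 'AG'

@dataclass
class Population:
    name: str
    individuals: list = field(default_factory=list)

    def allele_frequency(self, snp, allele):
        """Frequency of allele at snp, skipping individuals with missing data.

        >>> snp = SNP('rs0001', '1', 100, ('A', 'G'))
        >>> pop = Population('pop1', [Individual('ind1', {'rs0001': 'AA'}),
        ...                           Individual('ind2', {'rs0001': 'AG'}),
        ...                           Individual('ind3')])
        >>> pop.allele_frequency(snp, 'A')   # 3 'A' out of 4 alleles
        0.75
        """
        count = total = 0
        for individual in self.individuals:
            genotype = individual.genotypes.get(snp.rs_id)
            if genotype is None:
                continue  # missing genotype for this individual
            count += genotype.count(allele)
            total += len(genotype)
        return count / total if total else None

if __name__ == "__main__":
    doctest.testmod()
```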
Doctest: a closer look
#!/usr/bin/env python
def say_hello(name):
    '''
    print hello (name) to the screen
    example:
    >>> say_hello('Albert Einstein')
    hello Albert Einstein!!!
    '''
    print('hello ' + name + '!!!')

if __name__ == '__main__':
    import doctest
    doctest.testmod()
new function definition
normal documentation
example of function usage
expected output
body of the function
call to the doctest module