![Page 1: A data retrieval workflow using NCBI E-Utils + Python Part II: Jinja2 / Flask John Pinney Tech talk Tue 19 th Nov](https://reader035.vdocuments.site/reader035/viewer/2022072013/56649e4d5503460f94b42ae1/html5/thumbnails/1.jpg)
A data retrieval workflow using NCBI E-Utils + Python
Part II: Jinja2 / Flask
John Pinney
Tech talk, Tue 19th Nov
![Page 2: A data retrieval workflow using NCBI E-Utils + Python Part II: Jinja2 / Flask John Pinney Tech talk Tue 19 th Nov](https://reader035.vdocuments.site/reader035/viewer/2022072013/56649e4d5503460f94b42ae1/html5/thumbnails/2.jpg)
My tasks
1. Produce a list of human genes that are associated with at least one resolved structure in PDB AND at least one genetic disorder in OMIM
2. Make an online table to display them
✓
![Page 3: A data retrieval workflow using NCBI E-Utils + Python Part II: Jinja2 / Flask John Pinney Tech talk Tue 19 th Nov](https://reader035.vdocuments.site/reader035/viewer/2022072013/56649e4d5503460f94b42ae1/html5/thumbnails/3.jpg)
Workflow for gene list
![Page 4: A data retrieval workflow using NCBI E-Utils + Python Part II: Jinja2 / Flask John Pinney Tech talk Tue 19 th Nov](https://reader035.vdocuments.site/reader035/viewer/2022072013/56649e4d5503460f94b42ae1/html5/thumbnails/4.jpg)
Python modules used in part 1
PyCogent: Simple request handling for the main EUtils. pycogent.org
urllib2: General HTTP request handler. docs.python.org/2/library/urllib2.html
BeautifulSoup: Amazingly easy to use object model for XML/HTML. www.crummy.com/software/BeautifulSoup/bs4/doc/
![Page 5: A data retrieval workflow using NCBI E-Utils + Python Part II: Jinja2 / Flask John Pinney Tech talk Tue 19 th Nov](https://reader035.vdocuments.site/reader035/viewer/2022072013/56649e4d5503460f94b42ae1/html5/thumbnails/5.jpg)
Some REST services need API keys
The OMIM server requires a license agreement but is free for academic use. They provide a personal API key, which must be submitted with each HTTP request.
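As a rough sketch of how the key might be attached to each request without writing it into the script, the snippet below builds the request URL with `urlencode` and reads the key from an environment variable. It is written for Python 3 (`urllib.parse` rather than `urllib2`); the `OMIM_APIKEY` environment variable and the `omim_entry_url` helper are assumptions made for illustration, not part of the talk.

```python
import os
from urllib.parse import urlencode

OMIM_ENTRY_URL = "http://api.europe.omim.org/api/entry"


def omim_entry_url(mim_number, api_key=None):
    """Build the OMIM entry URL; the key comes from the environment if not given."""
    if api_key is None:
        # Hypothetical variable name; raises KeyError if it is not set.
        api_key = os.environ["OMIM_APIKEY"]
    # urlencode also escapes any characters that are unsafe in a query string.
    query = urlencode({"mimNumber": mim_number, "apiKey": api_key})
    return OMIM_ENTRY_URL + "?" + query
```

The returned string can be handed to `urlopen` in the same way as the concatenated URL used in the talk.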
    import urllib2

    OMIM_APIKEY = 'E835870B16FBAF479E826FA5168CB2615EDA0F11'
    # omimid is the MIM number of the entry, as a string
    result = urllib2.urlopen(
        "http://api.europe.omim.org/api/entry?mimNumber=" +
        omimid + "&apiKey=" + OMIM_APIKEY
    ).read()
![Page 6: A data retrieval workflow using NCBI E-Utils + Python Part II: Jinja2 / Flask John Pinney Tech talk Tue 19 th Nov](https://reader035.vdocuments.site/reader035/viewer/2022072013/56649e4d5503460f94b42ae1/html5/thumbnails/6.jpg)
Throttling queries
Most bioinformatics web servers limit the number of queries that can be sent from the same IP address (per day, per second, etc.).
They may ban you from accessing the site if you attempt too many requests.
This can have serious consequences (e.g. the whole institution being blocked from NCBI).
![Page 7: A data retrieval workflow using NCBI E-Utils + Python Part II: Jinja2 / Flask John Pinney Tech talk Tue 19 th Nov](https://reader035.vdocuments.site/reader035/viewer/2022072013/56649e4d5503460f94b42ae1/html5/thumbnails/7.jpg)
Throttling queries
To ensure compliance with usage limits, implement a simple throttle:
    def omim_info(omimid):
        checktime('api.europe.omim.org')
        result = urllib2.urlopen(...
![Page 8: A data retrieval workflow using NCBI E-Utils + Python Part II: Jinja2 / Flask John Pinney Tech talk Tue 19 th Nov](https://reader035.vdocuments.site/reader035/viewer/2022072013/56649e4d5503460f94b42ae1/html5/thumbnails/8.jpg)
Throttling queries
    import time

    lastRequestTime = {}
    # minimum delay in seconds between requests to each host
    throttleDelay = {'eutils.ncbi.nlm.nih.gov': 0.25,
                     'api.europe.omim.org': 0.5}

    def checktime(host):
        # sleep only for the part of the delay that has not yet elapsed
        if host in lastRequestTime:
            wait = throttleDelay[host] - (time.time() - lastRequestTime[host])
            if wait > 0:
                time.sleep(wait)
        lastRequestTime[host] = time.time()
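The same idea can also be packaged as a decorator, so that each request function declares its host once instead of calling `checktime` by hand. This is only a sketch in Python 3: the `throttled` decorator, the injectable `clock` and `sleep` parameters, and the placeholder body of `omim_info` are illustrative assumptions, and `time.monotonic` is used here instead of `time.time` so that system clock adjustments do not affect the delay.

```python
import time
from functools import wraps

# Minimum seconds between requests, per host (values from the slide above).
THROTTLE_DELAY = {
    "eutils.ncbi.nlm.nih.gov": 0.25,
    "api.europe.omim.org": 0.5,
}
_last_request = {}


def throttled(host, clock=time.monotonic, sleep=time.sleep):
    """Decorator: space out calls that hit `host` by at least its delay."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            last = _last_request.get(host)
            if last is not None:
                # Wait only for the part of the delay that has not elapsed.
                wait = THROTTLE_DELAY[host] - (clock() - last)
                if wait > 0:
                    sleep(wait)
            _last_request[host] = clock()
            return func(*args, **kwargs)
        return wrapper
    return decorator


@throttled("api.europe.omim.org")
def omim_info(omimid):
    # Placeholder body: the real function would send the HTTP request here.
    return "entry %s" % omimid
```

Passing `clock` and `sleep` as parameters is a design choice that makes the throttle testable without real waiting; in normal use the defaults are enough.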
![Page 9: A data retrieval workflow using NCBI E-Utils + Python Part II: Jinja2 / Flask John Pinney Tech talk Tue 19 th Nov](https://reader035.vdocuments.site/reader035/viewer/2022072013/56649e4d5503460f94b42ae1/html5/thumbnails/9.jpg)
HTML templating
I need to produce an HTML table containing basic information about the genes I have collected.
The Jinja2 templating engine is an easy way to generate these kinds of documents.
I will use web services at NCBI and OMIM to assemble the information I need.
![Page 10: A data retrieval workflow using NCBI E-Utils + Python Part II: Jinja2 / Flask John Pinney Tech talk Tue 19 th Nov](https://reader035.vdocuments.site/reader035/viewer/2022072013/56649e4d5503460f94b42ae1/html5/thumbnails/10.jpg)
Jinja2
Using Jinja2 as an HTML templating engine, we need to split the work between 2 files:
- a normal Python script (in which I call the web services).
- an HTML template with embedded Python-like commands.
Not all Python functions are available within the template, so it makes sense to do as much work as possible within the script before passing the data over.
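A minimal sketch of that division of labour, assuming the script has already reduced each gene to a plain dictionary: the template then only needs simple attribute lookups. The dictionary keys (`symbol`, `description`) and the inline template string are illustrative choices, not the ones used in the talk.

```python
from jinja2 import Template

# The template only prints values; all data preparation happens in the script.
row_template = Template(
    "<tr><td>{{ gene.symbol }}</td><td>{{ gene.description }}</td></tr>"
)

# In the real workflow these values would come from the web service calls.
genes = [
    {"symbol": "ACVRL1", "description": "activin A receptor type II-like 1"},
]

rows = [row_template.render(gene=g) for g in genes]
```

Jinja2 resolves `gene.symbol` by trying attribute access first and then falling back to item lookup, which is why a plain dictionary works here.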
![Page 11: A data retrieval workflow using NCBI E-Utils + Python Part II: Jinja2 / Flask John Pinney Tech talk Tue 19 th Nov](https://reader035.vdocuments.site/reader035/viewer/2022072013/56649e4d5503460f94b42ae1/html5/thumbnails/11.jpg)
Jinja2 (script)
    from jinja2 import Template

    template = Template(open("gene_row_template.html").read())
    fout = open("gene_list.html", 'w')
    ...
    for g in sorted_genes:
        # one table row is rendered and written per gene
        fout.write(template.render(
            g=g,
            gene=gene_info(g),
            omim=[omim_info(x) for x in omim_links(g)],
            struc=[struc_info(x) for x in struc_links(g)]))

(variables passed to template as kwargs)
![Page 12: A data retrieval workflow using NCBI E-Utils + Python Part II: Jinja2 / Flask John Pinney Tech talk Tue 19 th Nov](https://reader035.vdocuments.site/reader035/viewer/2022072013/56649e4d5503460f94b42ae1/html5/thumbnails/12.jpg)
Jinja2 (template)
    <tr>
      <td><a href='http://www.ncbi.nlm.nih.gov/gene/?term={{g}}[uid]'>{{gene.find('Gene-ref_locus').text}}</a></td>
      <td>{{gene.find('Gene-ref_desc').text}}</td>
      <td>{% for m in omim %}
        <a href='http://omim.org/entry/{{m.mimNumber.text}}'>{{m.preferredTitle.text}}</a><br>
      {% endfor %}</td>
      <td>{% for s in struc -%}
        <a href='http://www.rcsb.org/pdb/explore/explore.do?structureId={{s.find('Item',attrs={'Name':'PdbAcc'}).text}}'>{{s.find('Item',attrs={'Name':'PdbAcc'}).text}}</a><br>
      {%- endfor %}</td>
    </tr>
![Page 13: A data retrieval workflow using NCBI E-Utils + Python Part II: Jinja2 / Flask John Pinney Tech talk Tue 19 th Nov](https://reader035.vdocuments.site/reader035/viewer/2022072013/56649e4d5503460f94b42ae1/html5/thumbnails/13.jpg)
Jinja2 (template)
{{ ... }} = print the value of an expression
{% ... %} = other commands (control statements such as for and if)
I can access the methods of an object from within the template, so I can make use of all the nice BeautifulSoup shortcuts.
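The two delimiter types, and the `-%}` / `{%-` whitespace control used in the structure column of the template, can be tried out in isolation. The snippet below is a small sketch with an inline template string and a plain list of accession strings standing in for the BeautifulSoup objects used in the talk; the variable name `accessions` is an assumption.

```python
from jinja2 import Template

# {% ... %} drives the loop, {{ ... }} prints each value.
# The minus signs strip the whitespace next to the block tags,
# so the links are emitted without line breaks between them.
template = Template(
    "<td>{% for acc in accessions -%}\n"
    "  <a href='http://www.rcsb.org/pdb/explore/explore.do?structureId={{ acc }}'>"
    "{{ acc }}</a><br>\n"
    "{%- endfor %}</td>"
)

html = template.render(accessions=["4FAO", "3MY0"])
```

Without the minus signs, the newline and indentation inside the loop body would be copied into the output once per iteration.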
![Page 14: A data retrieval workflow using NCBI E-Utils + Python Part II: Jinja2 / Flask John Pinney Tech talk Tue 19 th Nov](https://reader035.vdocuments.site/reader035/viewer/2022072013/56649e4d5503460f94b42ae1/html5/thumbnails/14.jpg)
Jinja2 (output)
    <tr>
      <td><a href='http://www.ncbi.nlm.nih.gov/gene/?term=94[uid]'>ACVRL1</a></td>
      <td>activin A receptor type II-like 1</td>
      <td>
        <a href='http://omim.org/entry/600376'>TELANGIECTASIA, HEREDITARY HEMORRHAGIC, TYPE 2; HHT2</a><br>
      </td>
      <td><a href='http://www.rcsb.org/pdb/explore/explore.do?structureId=4FAO'>4FAO</a><br><a href='http://www.rcsb.org/pdb/explore/explore.do?structureId=3MY0'>3MY0</a><br></td>
    </tr>
![Page 15: A data retrieval workflow using NCBI E-Utils + Python Part II: Jinja2 / Flask John Pinney Tech talk Tue 19 th Nov](https://reader035.vdocuments.site/reader035/viewer/2022072013/56649e4d5503460f94b42ae1/html5/thumbnails/15.jpg)
Something more interactive
What if I need to produce a report on-the-fly?
Flask is a ‘micro’ web development framework for Python, which is useful for putting together a simple webserver.
For anything more substantial (e.g. if database queries are needed), consider using Django.
Flask uses Jinja2 as its template engine.
![Page 16: A data retrieval workflow using NCBI E-Utils + Python Part II: Jinja2 / Flask John Pinney Tech talk Tue 19 th Nov](https://reader035.vdocuments.site/reader035/viewer/2022072013/56649e4d5503460f94b42ae1/html5/thumbnails/16.jpg)
A simple webapp in Flask
    from flask import Flask, request, render_template, Response

    app = Flask(__name__)

    @app.route('/report/')
    def report_handler():
        # show the search form until a gene name is supplied in the query string
        gene = request.args.get('gene')
        if gene is None:
            return render_template('report_form.html', unfound=None)
        else:
            return report_for_gene_name(gene)

    if __name__ == '__main__':
        app.run(debug=True)
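To try the routing logic without the template files or the report function, which are not shown in the talk, a self-contained variant can use inline templates. In this sketch the `FORM` and `REPORT` strings are stand-ins for `report_form.html` and `report_for_gene_name`, and are assumptions made for illustration only.

```python
from flask import Flask, request, render_template_string

app = Flask(__name__)

# Inline stand-ins for the template file and report function used in the talk.
FORM = "<form action='/report/'><input name='gene'><input type='submit'></form>"
REPORT = "<h1>Report for {{ gene }}</h1>"


@app.route("/report/")
def report_handler():
    # /report/ shows the form; /report/?gene=NAME shows the report.
    gene = request.args.get("gene")
    if gene is None:
        return render_template_string(FORM)
    return render_template_string(REPORT, gene=gene)
```

Flask's built-in test client (`app.test_client()`) can exercise both branches without starting a server; calling `app.run(debug=True)` as in the slide serves the same routes locally.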
![Page 17: A data retrieval workflow using NCBI E-Utils + Python Part II: Jinja2 / Flask John Pinney Tech talk Tue 19 th Nov](https://reader035.vdocuments.site/reader035/viewer/2022072013/56649e4d5503460f94b42ae1/html5/thumbnails/17.jpg)
Summary
Some web services may be more fiddly than others to set up, especially if they involve
- API keys
- Request limits (requires throttling)
Combining web services with an HTML template (either offline or on-the-fly via a webserver) is an easy way to generate user-friendly reports.
![Page 18: A data retrieval workflow using NCBI E-Utils + Python Part II: Jinja2 / Flask John Pinney Tech talk Tue 19 th Nov](https://reader035.vdocuments.site/reader035/viewer/2022072013/56649e4d5503460f94b42ae1/html5/thumbnails/18.jpg)
Python modules used in part 2
Jinja2: An elegant and highly versatile templating engine. http://jinja.pocoo.org/
Flask: Python ‘micro’ web development framework. http://flask.pocoo.org