How to Create a High-Speed Template Engine in Python
makoto kuwata
http://www.kuwata-lab.com/
PyconMini JP 2011
Speeding Up Template Engines in Python, and Stories of Failure
Profile: Who are you?
@makotokuwata
http://www.kuwata-lab.com/
Ruby/PHP/Python programmer
Creator of Erubis (*)
Python4PHPer
Just someone keen on making the pie bigger
(*) default template engine on Rails 3
Python Products
Tenjin : very fast template engine
Kook : task utility like Ant/Rake
Benchmarker : a good friend for performance
Oktest : new-style testing library
Tenjin
Very fast
One file, 2000 lines
Full-featured
Python 3 support
Google App Engine
Release 1.0 coming soon!
http://www.kuwata-lab.com/tenjin/
<table>
  <?py for x in xs: ?>
  <tr>
    <td>${x}</td>
  </tr>
  <?py #endfor ?>
</table>
Benchmark (pages/sec)
  Tenjin       2660.1
  Mako         1426.4
  Jinja2       1257.6
  Templetor     903.0
  Cheetah       562.3
  Django        114.2
  Genshi         55.7
  Kid            34.6
Python 2.5.5, MacOS X 10.6 (x86_64), 2GB
Tenjin 1.0.0, Mako 0.2.5, Jinja2 2.2.1, Templetor 0.32, Cheetah 2.2.2, Django 1.1.0, Genshi 0.5.1, Kid 0.9.6
Benchmarks for String Concatenation
append()
_buf = []
_buf.append(s)
_buf.append(s)
_buf.append(s)
output = "".join(_buf)
Benchmark (pages/sec): append()
extend()
_buf = []
_buf.extend((s, s, s, ))
output = "".join(_buf)
Benchmark (pages/sec): append(); extend()
StringIO
from cStringIO import StringIO
_buf = StringIO()
_buf.write(s)
_buf.write(s)
_buf.write(s)
output = _buf.getvalue()
Benchmark (pages/sec): append(); extend(); StringIO
mmap
import mmap
_buf = mmap.mmap(-1, 2*1024*1024)
_buf.write(s)
_buf.write(s)
_buf.write(s)
length = _buf.tell()
_buf.seek(0)
output = _buf.read(length)
Benchmark (pages/sec): append(); extend(); StringIO; mmap
Generator
def _gen(s):
    yield s
    yield s
    yield s

output = "".join(_gen(s))
Benchmark (pages/sec): append(); extend(); StringIO; mmap; generator
Slice
_buf = [""]
_buf[-1:] = (s, s, s, "")
output = "".join(_buf)
# or
_buf = []
_buf[999999:] = (s, s, s, )
output = "".join(_buf)
Benchmark (pages/sec): append(); extend(); StringIO; mmap; generator; slice[-1:]; slice[99999:]
Bound method
_buf = []
_extend = _buf.extend
_extend((s, s, s, ))
output = "".join(_buf)
Benchmark (pages/sec): append(); extend(); StringIO; mmap; generator; slice[-1:]; slice[99999:]; extend() (bound)
Summary
Fast
bound method >= slice[] > extend()
Slow
Generator > append() > mmap > StringIO
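The comparison above can be approximated with the standard timeit module. The following is only a rough sketch, not the Benchmarker script from the talk: it targets Python 3 (so io.StringIO stands in for cStringIO), mmap is left out, and the chunk string S, the chunk count N, and all function names are assumptions made for illustration. Rankings may differ from the slides depending on the interpreter version.

```python
import timeit
from io import StringIO

S = "<td>item</td>\n"   # hypothetical chunk of template output
N = 1000                # chunks per rendered "page" (divisible by 4)

def use_append():
    buf = []
    for _ in range(N):
        buf.append(S)
    return "".join(buf)

def use_extend():
    buf = []
    for _ in range(N // 4):
        buf.extend((S, S, S, S))
    return "".join(buf)

def use_bound_extend():
    buf = []
    extend = buf.extend          # look the method up once
    for _ in range(N // 4):
        extend((S, S, S, S))
    return "".join(buf)

def use_slice():
    buf = [""]
    for _ in range(N // 4):
        buf[-1:] = (S, S, S, S, "")   # replace the trailing "" with new chunks
    return "".join(buf)

def use_generator():
    def gen():
        for _ in range(N):
            yield S
    return "".join(gen())

def use_stringio():
    buf = StringIO()
    for _ in range(N):
        buf.write(S)
    return buf.getvalue()

CANDIDATES = [use_append, use_extend, use_bound_extend,
              use_slice, use_generator, use_stringio]

def run(repeat=200):
    """Return {function name: total seconds for `repeat` calls}."""
    return {f.__name__: timeit.timeit(f, number=repeat) for f in CANDIDATES}

if __name__ == "__main__":
    for name, secs in sorted(run().items(), key=lambda kv: kv[1]):
        print("%-18s %.4f sec" % (name, secs))
```

Every candidate builds the same string, so only the time spent building it differs.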
Try Benchmark Script
$ wget http://pypi.python.org/packages/source/B/Benchmarker/Benchmarker-3.0.0.tar.gz
$ tar xzf Benchmarker-3.0.0.tar.gz
$ cd Benchmarker-3.0.0/
$ sudo python setup.py install
$ cd examples/
$ python bench_strconcat.py
Step by Step to Tune-up Template Code
HTML Template
<html>
  <head>
    :
  <table>
    <?py for item in items: ?>
    <tr>
      <td>#{item}</td>
        :
    </tr>
    <?py #endfor ?>
  </table>
    :
Python Code
_buf = []; _buf.append("<html>\n")
  :
for item in items:
    _buf.append(" <tr>\n")
    _buf.append(" <td>"); _buf.append(item); _buf.append("</td>\n");
      :
#endfor
  :
return "".join(_buf)
Benchmark (pages/sec): append (singleline)
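To make the template-to-code step concrete, here is a minimal sketch of a converter that turns the `<?py ... ?>` statements and `#{...}` expressions shown above into append() calls. It is not Tenjin's actual converter: the names convert and render, the regular expression, and the rule that a statement ending in ":" opens a block and "#end..." closes it are all assumptions for illustration.

```python
import re

# Matches either a statement "<?py ... ?>" (plus one trailing newline)
# or an expression "#{...}".
TOKEN = re.compile(r"<\?py\s*(.*?)\s*\?>\n?|#\{(.*?)\}", re.S)

def convert(template):
    """Translate template text into Python source that fills _buf."""
    lines = ["_buf = []"]
    indent = 0

    def emit(code):
        lines.append("    " * indent + code)

    pos = 0
    for m in TOKEN.finditer(template):
        text = template[pos:m.start()]
        if text:
            emit("_buf.append(%r)" % text)       # literal text before the token
        pos = m.end()
        stmt, expr = m.group(1), m.group(2)
        if expr is not None:
            emit("_buf.append(str(%s))" % expr)  # embedded expression
        elif stmt.startswith("#end"):
            indent -= 1                          # "#endfor" etc. closes a block
        else:
            emit(stmt)
            if stmt.endswith(":"):
                indent += 1                      # "for ...:" opens a block
    rest = template[pos:]
    if rest:
        emit("_buf.append(%r)" % rest)
    return "\n".join(lines)

def render(template, context):
    """Run the generated code (templates are treated as trusted source)."""
    namespace = dict(context)
    exec(convert(template), namespace)
    return "".join(namespace["_buf"])

TEMPLATE = (
    "<table>\n"
    "<?py for item in items: ?>\n"
    "  <tr><td>#{item}</td></tr>\n"
    "<?py #endfor ?>\n"
    "</table>\n"
)

if __name__ == "__main__":
    print(convert(TEMPLATE))
    print(render(TEMPLATE, {"items": ["a", "b"]}))
```

The generated source is the unoptimized "append (singleline)" form, which is the starting point for the tuning steps that follow.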
Multiple Line String
## before
_buf.append("<!DOCTYPE HTML>\n")
_buf.append("<html>\n")
_buf.append(" <head>\n")
## after
_buf.append("""<!DOCTYPE HTML>
<html>
 <head>
""")
Eliminates method calls
Benchmark (pages/sec): append (singleline); append (multiline)
From append() to extend()
## before
_buf.append(""" <td>""")
_buf.append(item)
_buf.append("""</td>\n""")
## after
_buf.extend((""" <td>""", item, """</td>\n""", ))
Eliminates method calls
Benchmark (pages/sec): append (singleline); append (multiline); extend (unbound)
Bound Method
## before
_buf.extend(("...", "...", "...", ))
_buf.extend(("...", "...", "...", ))
_buf.extend(("...", "...", "...", ))
## after
_extend = _buf.extend
_extend(("...", "...", "...", ))
_extend(("...", "...", "...", ))
_extend(("...", "...", "...", ))
Eliminates the repeated method lookup
Benchmark (pages/sec): append (singleline); append (multiline); extend (unbound); extend (bound)
str() function
## before
_extend((" <td>", item1,
         """</td> <td>""", item2,
         """</td> <td>""", item3,
         """</td>""", ))
## after
_extend((" <td>", str(item1),
         """</td> <td>""", str(item2),
         """</td> <td>""", str(item3),
         """</td>""", ))
Necessary in Python!
Benchmark (pages/sec): append (singleline); append (multiline); extend (unbound); extend (bound); extend + str
Local Variable
## before
_extend((str(item1), str(item2), str(item3), ))
## after
_str = str
_extend((_str(item1), _str(item2), _str(item3), ))
A local variable is faster than a global/built-in variable
Benchmark (pages/sec): append (singleline); append (multiline); extend (unbound); extend (bound); extend + str; extend + _str=str
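The local-variable trick can be checked in isolation. The sketch below assumes CPython, where a name such as str that is not local is resolved through the global and then the built-in namespace on every use, whereas a local name is a direct slot access. The function names and the ITEMS data are made up for the example, and the size of the gain varies by interpreter version.

```python
import timeit

ITEMS = list(range(1000))

def with_builtin(items):
    out = []
    for x in items:
        out.append(str(x))    # 'str' is looked up in globals, then builtins
    return out

def with_local(items):
    _str = str                # bind the built-in to a local name once
    out = []
    for x in items:
        out.append(_str(x))   # local variable access
    return out

if __name__ == "__main__":
    print("builtin:", timeit.timeit(lambda: with_builtin(ITEMS), number=1000))
    print("local:  ", timeit.timeit(lambda: with_local(ITEMS), number=1000))
```

Both functions return the same list; only the way the name is resolved differs.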
Format ('%' operator)
## before
_extend(("<td>", _str(item1),
         """</td><td>""", _str(item2),
         """</td><td>""", _str(item3),
         "</td>\n", ))
## after
_append("""<td>%s</td><td>%s</td><td>%s</td>\n""" % \
        (item1, item2, item3, ))
Removes all str() calls by using the '%' operator
Benchmark (pages/sec): append (singleline); append (multiline); extend (unbound); extend (bound); extend + str; extend + _str=str; append + format
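The '%' operator converts each %s argument with str() internally, so the two forms produce the same text. A small sketch (the function names are hypothetical) makes the equivalence explicit, and also shows that None comes out as the string "None" in both forms:

```python
def row_with_str(item1, item2, item3):
    # explicit str() on every value, then join
    return "".join(("<td>", str(item1),
                    "</td><td>", str(item2),
                    "</td><td>", str(item3),
                    "</td>\n"))

def row_with_format(item1, item2, item3):
    # %s applies str() to each argument internally
    return "<td>%s</td><td>%s</td><td>%s</td>\n" % (item1, item2, item3)

if __name__ == "__main__":
    print(row_with_str(1, None, "abc"), end="")
    print(row_with_format(1, None, "abc"), end="")
```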
None => Empty String
## after
def to_str(v):
    if v is None:
        return ""
    else:
        return str(v)

_to_str = to_str
_extend((_to_str(item1), _to_str(item2), _to_str(item3), ))
Converts None to empty string
Benchmark (pages/sec): append (singleline); append (multiline); extend (unbound); extend (bound); extend + str; extend + _str=str; append + format; extend + to_str; extend + _to_str=to_str
Escape HTML
## after
def escape_html(s):
    return s.replace('&', '&amp;') \
            .replace('<', '&lt;') \
            .replace('>', '&gt;') \
            .replace('"', '&quot;')

_extend((""" <td>""", escape_html(to_str(item)), """</td>\n""", ))
Benchmark (pages/sec): append (singleline); append (multiline); extend (unbound); extend (bound); extend + str; extend + _str=str; append + format; extend + to_str; extend + _to_str=to_str; escape_html + str; escape_html + to_str
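Putting the steps so far together, a rendering function might look like the sketch below. It assumes the helpers replace the four characters with the standard HTML entities (&amp;amp;, &amp;lt;, &amp;gt;, &amp;quot;). render_rows and its local aliases are illustrative names, not Tenjin's generated code.

```python
def to_str(value):
    """Convert None to an empty string, anything else via str()."""
    return "" if value is None else str(value)

def escape_html(s):
    """Escape the four characters that are significant in HTML text/attributes."""
    return (s.replace("&", "&amp;")      # must come first
             .replace("<", "&lt;")
             .replace(">", "&gt;")
             .replace('"', "&quot;"))

def render_rows(items):
    _buf = []
    _extend = _buf.extend                      # bound method, looked up once
    _escape, _to_str = escape_html, to_str     # local aliases for the helpers
    for item in items:
        _extend(("  <td>", _escape(_to_str(item)), "</td>\n"))
    return "".join(_buf)

if __name__ == "__main__":
    print(render_rows(['<b>"x" & y</b>', None, 3]), end="")
```

The ampersand is replaced first so that the entities produced by the later replacements are not escaped a second time.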
C Extension
## after
from webext import to_str, escape_html

_extend((""" <td>""", escape_html(to_str(item)), """</td>\n""", ))
## or
_extend((""" <td>""", escape_html(item), """</td>\n""", ))
Implemented in C
webext: http://pypi.python.org/pypi/Webext/
Benchmark (pages/sec): append (singleline); append (multiline); extend (unbound); extend (bound); extend + str; extend + _str=str; append + format; extend + to_str; extend + _to_str=to_str; escape_html + str; escape_html + to_str; webext.escape_html, to_str; webext.escape_html
Extreme join()
## after
_buf = [
    "<td>", item1,
    "</td>\n<td>", item2,
    "</td>\n<td>", "",
    item3, "",
    "</td>\n",
]
output = webext.join(_buf, escape=webext.escape_html)
Not escaped if index % 2 == 0
Escaped if index % 2 == 1
(no need to call escape_html()!)
Benchmark
Not implemented yet...
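Since the C version was not implemented at the time, the idea can only be illustrated in pure Python. The sketch below is a hypothetical stand-in for the proposed join(): join_escaping_odd is an invented name, html.escape from the standard library is used as the escape function, and applying str() to every element is an assumption based on the remark that join() should call str() internally. A pure-Python version like this will not show the speed benefit; it only demonstrates the indexing rule.

```python
import html

def join_escaping_odd(buf, escape=html.escape, to_str=str):
    """Join buf, escaping only the elements at odd indexes."""
    parts = []
    for index, value in enumerate(buf):
        text = to_str(value)
        if index % 2 == 1:
            text = escape(text)   # odd index: value to be escaped
        parts.append(text)        # even index: emitted as-is
    return "".join(parts)

if __name__ == "__main__":
    item1, item3 = "<a>", "<b>"
    buf = [
        "<td>", item1,            # index 1: escaped
        "</td>\n<td>", "",        # empty filler shifts item3 to an even index
        item3, "",                # index 4: not escaped
        "</td>\n",
    ]
    print(join_escaping_odd(buf), end="")
```

Padding with an empty string is what lets a raw (unescaped) value sit at an even index, as item3 does in the slide's example.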
Summary
String concatenation is not a bottleneck
extend() & join() are fast enough
Bottleneck is str() and escape_html()
join() should call str() internally
C Extension (webext) is great
Other Topics
Google says...
... The major web applications we have surveyed have indicated that they bottleneck primarily on template systems, ...
http://code.google.com/p/unladen-swallow/wiki/ProjectPlan
Django?
Case Study #1
http://www.myweightracker.com/
Switch from Django template to Tenjin
Before: Django | M, C, Network, etc...
After: M, C, Network, etc...
App speed: 30% up!
https://groups.google.com/group/kuwata-lab-products/browse_thread/thread/b50877a9c56d64c9/60f77b5c9b9f5238
Case Study #2
Ruby on Rails 1.2
Remove helper methods by preprocessing
Before: Helper Methods | M, C, Network, etc...
After: M, C, Network, etc...
App speed: 100% up!
http://jp.rubyist.net/magazine/?0021-Erubis
template engine
Components of View Layer
Cache Mechanism
Template Engine
Helper Functions
Important for performance!
More important for performance!
Just one of them
Preprocessing in Tenjin
<p>${_('Hello')}</p>
  | Convert
_extend(("<p>", _('Hello'), "</p>", ))
  | Execute (function is called every time)
<p>こんにちは</p>
Preprocessing in Tenjin
<p>${{_('Hello')}}</p>
  | Convert (function is called at this stage)
_extend(("<p>こんにちは</p>", ))
  | Execute (function call removed)
<p>こんにちは</p>
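The effect of preprocessing can be sketched as a pass that evaluates every `${{...}}` expression once, when the template is converted, and leaves only the resulting text. This is not Tenjin's preprocessor: the regular expression, the preprocess function, the TRANSLATIONS table, and the CALLS counter are assumptions for illustration, and HTML escaping of the result is omitted to keep the sketch short.

```python
import re

# "${{ expr }}" marks an expression to evaluate at convert time.
PREPROCESS_EXPR = re.compile(r"\$\{\{(.*?)\}\}")

def preprocess(template, context):
    """Replace each ${{expr}} with the string value of expr."""
    def evaluate(match):
        # eval() is tolerable only because templates are trusted source code.
        return str(eval(match.group(1), {}, dict(context)))
    return PREPROCESS_EXPR.sub(evaluate, template)

CALLS = []                                   # records each call to _()
TRANSLATIONS = {"Hello": "こんにちは"}         # hypothetical message catalog

def _(message):
    CALLS.append(message)
    return TRANSLATIONS.get(message, message)

# Convert stage: _() runs here, once.
converted = preprocess("<p>${{_('Hello')}}</p>", {"_": _})

if __name__ == "__main__":
    print(converted)
    print("calls to _():", len(CALLS))
```

After conversion the template contains plain text, so rendering it any number of times never calls _() again.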
Python v.s. Others
Benchmark (pages/sec)
  plTenjin (Perl)      12108.0
  pyTenjin+Webext       4179.7
  phpTenjin (PHP)       2788.0
  pyTenjin (Python)     2682.9
  rbTenjin (Ruby)       2634.8
Perl is the Champion!
Why is Perl so Fast?
No need to call str(val) nor val.toString()
Bytecode op for string concatenation
## slow
my $x = join("", ($s1, $s2, $s3, ... ));
## extremely fast!
my $x = $s1 . $s2 . $s3 . ...;
C Ext v.s. Pure Script
Benchmark (pages/sec), engines compared: plTenjin; MobaSiF; Template::Toolkit; pyTenjin+Webext; pyTenjin; Cheetah; rbTenjin; eruby
Bars are labeled by implementation: C Ext; Pure Perl; Pure Python; Pure Ruby; Python + C Ext
No need to implement the engine in C (except helpers)
Summary
View layer components
Template engine, Helper functions, and Cache mechanism
No need to implement engine in C (except helper functions)
Perl is great
Django template engine sucks
Appendix
Tenjin: fast & full-featured template engine
http://www.kuwata-lab.com/tenjin/
Webext: C extension for escape_html()
http://pypi.python.org/pypi/Webext/
Benchmarker: a utility for benchmarking
http://pypi.python.org/pypi/Benchmarker/
Appendix
Ruby Programs Faster than C
http://www.kuwata-lab.com/presen/rubykaigi2007.pdf
http://jp.rubyist.net/magazine/?0022-FasterThanC
A Template Engine for Lightweight Languages, Faster than Java
http://www.kuwata-lab.com/presen/LL2007LT.pdf
Introduction to Template Systems
http://jp.rubyist.net/magazine/?0024-TemplateSystem
http://jp.rubyist.net/magazine/?0024-TemplateSystem2
thank you