iTunes U Aggregator: A Rapid-Fire Walkthrough
iTunes® and its Logo are registered trademarks of Apple Inc., registered in the U.S. and other countries. tele-TASK™ is a trademark of the Hasso-Plattner-Institut für Systemtechnik GmbH. All trademarks are property of their respective owners.
motivation
New Email: iTunes U Weekly Report for Hasso-Plattner-Institut für Systemtechnik (HPI)
http://commons.wikimedia.org/wiki/File:Internet-mail.svg
hpi-de-public-dz-2011-10-16.xls
week 1 (calendar week #42)
week 2 (calendar week #43)
week 3 (calendar week #44)
week 4 (calendar week #45)
4 weeks in a report but one email each week
Django and the Django Logo are registered trademarks of Django Software Foundation.
BSD licensed
MVC driven
SERVER side
DOCUMENTED extremely well
RICH ecosystem
AWE some
import xlrd
book = xlrd.open_workbook("hpi-de-public-dz-2011-10-16.xls")
summary = book.sheet_by_name("Summary")
summary = book.sheet_by_name("Summary")
summary.cell_value(rowx=6, colx=2)  # C7 (xlrd indices are zero-based)
# => "2011-10-16"
summary.row_values(rowx=4, start_colx=2, end_colx=6)  # C5:F5
# => ["2011-10-16", "2011-10-09", ...]
UserActions.objects.create(date="2011-10-16", action="Browse", value=1991)
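xlrd takes zero-based row and column indices, and `end_colx` is exclusive, which is easy to misread next to spreadsheet-style cell names. A minimal sketch of the mapping, assuming a hypothetical helper `cell_name` that does not appear on the slides:

```python
def cell_name(rowx, colx):
    """Convert zero-based (row, column) indices to an A1-style cell name."""
    letters = ""
    n = colx + 1  # switch to 1-based for the base-26 conversion
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return f"{letters}{rowx + 1}"


# row_values(rowx=4, start_colx=2, end_colx=6) covers columns 2..5 of row 4.
first = cell_name(4, 2)
last = cell_name(4, 6 - 1)  # end_colx is exclusive
print(f"{first}:{last}")
```

This prints `C5:F5`, matching the range noted for the `row_values` call.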
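Tracks
[Diagram: the Tracks sheet (provider, collection, track) mapped onto the labels Series, Podcast and Sample. Entries shown: Internet Security; Anti-Virus Software; Attack Signatures; Social Hacking; ... Samples shown: 2011-10-16 with 60 tracks; 2011-10-09 with 36 tracks; ...]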
Tracks
podcast, _ = Podcast.objects.get_or_create(
    name="Internet Security", handle=6979209575)
sample, _ = Sample.objects.get_or_create(
    date="2011-10-16", podcast=podcast)
sample.tracks = 60
sample.save()
VIRT  TIME+    Command                     Task
153M  7:52.73  python manage.py runserver  import
144M  0:19.26  python manage.py runserver  aggregate sum
for a 2MB database with 20k records
Model.objects.create(…)  # transaction #1
# o = Model(…)
# o.save()
Model.objects.create(…)  # transaction #2
…
from django.db import transaction
with transaction.commit_on_success():  # everything inside runs as transaction #1
    Model.objects.create(…)
    # o = Model(…)
    # o.save()
    Model.objects.create(…)
    …
90% sys time savings
no difference for user time
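The same effect can be reproduced without Django. The sketch below uses the standard-library `sqlite3` module and a hypothetical `user_actions` table (both are assumptions, not the project's schema) to contrast one commit per row with a single commit for the whole batch:

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical user_actions table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_actions (date TEXT, action TEXT, value INTEGER)"
    )
    return conn


def insert_rows(conn, rows, single_transaction):
    """Insert rows, committing once at the end or after every row."""
    commits = 0
    for date, action, value in rows:
        conn.execute(
            "INSERT INTO user_actions (date, action, value) VALUES (?, ?, ?)",
            (date, action, value),
        )
        if not single_transaction:
            conn.commit()  # one transaction per row
            commits += 1
    if single_transaction:
        conn.commit()  # one transaction for the whole batch
        commits += 1
    return commits


rows = [("2011-10-16", "Browse", n) for n in range(1000)]
per_row = insert_rows(make_db(), rows, single_transaction=False)
batched = insert_rows(make_db(), rows, single_transaction=True)
print(per_row, batched)
```

An in-memory database will not show the sys time difference, because nothing is flushed to disk. Pointing `sqlite3.connect` at a file should make the cost of the extra commits visible.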
Y U NO FAST?
  0 [tasks:INFO] Processing ../hpi-de-public-dz-2011-10-16.xls..
  2 [tasks:INFO] Processing week 2011-10-16..
  2 [tasks:DEBUG] Inserting actions..
  2 [tasks:DEBUG] Inserting clients..
  2 [tasks:DEBUG] Opening browse sheet..
  2 [tasks:DEBUG] Inserting 71 browse actions..
  3 [tasks:DEBUG] Opening tracks sheet..
  3 [tasks:DEBUG] Inserting 3779 tracks actions..
 22 [tasks:DEBUG] Finished 500 rows.
 37 [tasks:DEBUG] Finished 1000 rows.
 51 [tasks:DEBUG] Finished 1500 rows.
 66 [tasks:DEBUG] Finished 2000 rows.
 71 [tasks:DEBUG] Finished 2500 rows.
 86 [tasks:DEBUG] Finished 3000 rows.
100 [tasks:DEBUG] Finished 3500 rows.
108 [tasks:DEBUG] Opening Previews sheet.
1:48
where exactly is the time spent?
Y U NO FAST?
python -m profile …
python -m cProfile …
python -m cProfile -s cumulative manage.py loadreport hpi-de-public-dz-2011-10-16.xls
60% time spent in django.db.models.Manager.get_or_create
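The same kind of measurement can be taken from inside Python. A minimal sketch using the standard-library `cProfile` and `pstats` modules; `slow_lookup` and `load_report` are hypothetical stand-ins for the real import code, which the slides do not show:

```python
import cProfile
import io
import pstats


def slow_lookup(n):
    """Stand-in for an expensive per-row call (hypothetical)."""
    return sum(i * i for i in range(n))


def load_report(rows):
    """Stand-in for the import loop (hypothetical)."""
    return [slow_lookup(2000) for _ in range(rows)]


profiler = cProfile.Profile()
profiler.enable()
load_report(200)
profiler.disable()

# Sort by cumulative time, the same ordering as "-s cumulative".
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

Cumulative time includes time spent in callees, so a function that delegates most of its work still shows up near the top of the listing.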
[Diagram: labels Series, Podcast and Sample. Entries shown: Internet Security; Anti-Virus Software; Social Hacking; Internet Memes; ... Sample rows shown: 2011-10-16 with 60 tracks and 21 previews; 2011-10-09 with 12 previews; 2011-10-09 with 33 tracks and 12 previews; ... Separate Previews and Tracks tables, each with a 2011-10-09 row (values 36 and 12).]
50% time savings through denormalization
90% time savings through full denormalization
5% larger database through denormalization
320% larger database through full denormalization
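One way to see where the savings come from is to count statements. The sketch below is a simplified illustration with the standard-library `sqlite3` module, not the project's schema: the table names `podcast`, `sample` and `flat_sample` are assumptions, and the normalized loader only get-or-creates the podcast, whereas the original code also does so for the sample.

```python
import sqlite3

ROWS = [  # (podcast, date, tracks); values as shown on the slides
    ("Internet Security", "2011-10-16", 60),
    ("Internet Security", "2011-10-09", 36),
]


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE podcast (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE sample (podcast_id INTEGER, date TEXT, tracks INTEGER)")
    conn.execute("CREATE TABLE flat_sample (podcast TEXT, date TEXT, tracks INTEGER)")
    return conn


def load_normalized(conn, rows):
    """Look up (or create) the podcast for every row, then insert the sample."""
    statements = 0
    for name, date, tracks in rows:
        found = conn.execute(
            "SELECT id FROM podcast WHERE name = ?", (name,)
        ).fetchone()
        statements += 1
        if found is None:
            cursor = conn.execute("INSERT INTO podcast (name) VALUES (?)", (name,))
            podcast_id = cursor.lastrowid
            statements += 1
        else:
            podcast_id = found[0]
        conn.execute(
            "INSERT INTO sample (podcast_id, date, tracks) VALUES (?, ?, ?)",
            (podcast_id, date, tracks),
        )
        statements += 1
    return statements


def load_denormalized(conn, rows):
    """Store the podcast name on every sample row: one INSERT per row."""
    conn.executemany(
        "INSERT INTO flat_sample (podcast, date, tracks) VALUES (?, ?, ?)", rows
    )
    return len(rows)


conn = make_db()
normalized_count = load_normalized(conn, ROWS)
denormalized_count = load_denormalized(conn, ROWS)
print(normalized_count, denormalized_count)
```

The flat table repeats the podcast name on every row, which is the storage cost that the database size figures above put a number on.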
Pokémon are copyrighted by Nintendo Co., Ltd. http://www.flickr.com/photos/darktabris/5654794283 © 2011 Sergio Cuellar
yo app’s so fat it consumes my whole memory
the dreaded MEMORY LEAK
view
@django.views.decorators.cache.cache_page
def teletask_series(request, id):
    ...

template
{% load cache %}
{% cache 604800 views %}
  {{ views }}
{% endcache %}

low-level
from django.core.cache import cache
cache.set('views', 60)
cache.get('views')
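The low-level calls are typically wrapped in a get-or-compute pattern. A minimal sketch of that pattern, assuming a hypothetical in-memory `TinyCache` class in place of Django's cache backend so the example runs on its own. `total_views` and its placeholder value are illustrative too:

```python
import time


class TinyCache:
    """In-memory stand-in for a key-value cache with per-key expiry."""

    def __init__(self):
        self._store = {}

    def set(self, key, value, timeout=300):
        # Remember the value together with its expiry time.
        self._store[key] = (value, time.monotonic() + timeout)

    def get(self, key, default=None):
        if key not in self._store:
            return default
        value, expires = self._store[key]
        if time.monotonic() >= expires:
            del self._store[key]  # expired entries count as missing
            return default
        return value


cache = TinyCache()
computed = []  # records every cache miss


def total_views(series_id):
    """Return the cached total, computing it only on a cache miss."""
    key = f"views:{series_id}"
    views = cache.get(key)
    if views is None:
        computed.append(series_id)
        views = 60  # placeholder for an expensive aggregate query
        cache.set(key, views, timeout=604800)  # 604800 s = one week
    return views


first = total_views(1)
second = total_views(1)
print(first, second, len(computed))
```

The second call is served from the cache, so the expensive branch runs only once per key and timeout period.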
<h1>{{ name }}</h1>
<p>{{ description }}</p>
<a href="itunesu/{{ id }}"> iTunes U stats </a>
{% for track in tracks %}
  <h2>{{ track.name }}</h2>
{% endfor %}

<h1>{{ name }}</h1>
<p> Total Views: {{ views }} </p>

two different pages
<h1>Internet Security</h1>
<p>{{ description }}</p>
<span id="totalviews"> {{ views }} </span>
{% for track in tracks %}
  <h2>{{ track.name }}</h2>
{% endfor %}
the controller must be adapted to pass the views variable
{% load itunesuagg %}
<h1>Internet Security</h1>
<p>{{ description }}</p>
<span id="totalviews"> {% viewcount for name %} </span>
{% for track in tracks %}
  <h2>{{ track.name }}</h2>
{% endfor %}
or: REST API
Third-party: Google Chart API
Client (interaction): YUI, Google Chart Tools, Flot, Highcharts
Server (performance): Matplotlib, Cairo
Google Chart API
from pygooglechart import PieChart3D
chart = PieChart3D(250, 100)
chart.add_data([20, 10])
chart.set_pie_labels(['Hello', 'World'])
print(chart.get_url())
http://chart.apis.google.com/chart?cht=p3&chs=250x100&chd=s:pU&chl=Hello|World
so what about large datasets?
Google Chart API
from pygooglechart import SimpleLineChart
chart = SimpleLineChart(200, 125)
data = [32, 34, 34, 32, 34, 34, 32, 32, 32, 34, 34, 32, 29, 29, 34, 34, 34,
        37, 37, 39, 42, 47, 50, 54, 57, 60, 60, 60, 60, 60, 60, 60, 62, 62,
        60, 55, 55, 52, 47, 44, 44, 40, 40, 37, 34, 34, 32, 32, 32, 31, 32]
chart.add_data(data)
...
print(chart.get_url())
http://chart.apis.google.com/chart?cht=lc&chs=200x125&chd=e:UeVwVwUeVwVwUeUeUeVwVwUeSkSkVwVwVwXrXrY9a4eFgAijkemZmZmZmZmZmZmZnrnrmZjMjMhReFcKcKZmZmXrwVwUeUeUeT1Ue&chco=0000FF&chf=c,ls,0,CCCCCC,0.2,FFFFFF,0.2&chxt=y,x&chxl=0:||25|50|75|100|1:|Jan|Feb|Mar|Apr|May|Jun&chg=0,25,5,5
2KB URI length limitation
solution: POST it
16KB limit with POST, but POST does not work from an <img> tag
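The length problem is easier to see with the encoding spelled out. The sketch below is an illustration using only the standard library, not pygooglechart's implementation: it applies the Chart API's simple encoding (one character per value, 62 levels) and switches from GET to POST once the URL passes an assumed 2048-character limit. `simple_encode`, `chart_request` and `GET_LIMIT` are hypothetical names, and nothing is sent over the network.

```python
from urllib.parse import urlencode

SIMPLE = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
BASE_URL = "http://chart.apis.google.com/chart"
GET_LIMIT = 2048  # assumed reading of the "2KB" limit


def simple_encode(values, maximum):
    """Scale each value into 0..61 and map it to one character."""
    top = len(SIMPLE) - 1
    return "".join(SIMPLE[round(top * v / maximum)] for v in values)


def chart_request(chart_type, width, height, values, maximum):
    """Return (method, url, body), using POST when the GET URL is too long."""
    params = {
        "cht": chart_type,
        "chs": f"{width}x{height}",
        "chd": "s:" + simple_encode(values, maximum),
    }
    url = BASE_URL + "?" + urlencode(params)
    if len(url) <= GET_LIMIT:
        return "GET", url, None
    return "POST", BASE_URL, urlencode(params)


pie = simple_encode([20, 10], maximum=30)  # scale by the sum of the slices
small = chart_request("lc", 200, 125, [32, 34, 60, 31], maximum=100)
large = chart_request("lc", 200, 125, [50] * 5000, maximum=100)
print(pie, small[0], large[0])
```

Scaling the pie data by its sum happens to reproduce the `pU` in the pie chart URL above. Since an `<img>` tag can only issue GET requests, the POST variant has to be fetched server-side or through a form.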
http://www.highcharts.com/demo/line-basic
Highcharts (SVG)
var chart;
$(document).ready(function() {
  chart = new Highcharts.Chart({
    chart: { renderTo: 'container' },
    series: [{
      name: 'Tokyo',
      data: [7.0, 6.9, 9.5, 14.5, 18.2, 21.5, 25.2, 26.5, 23.3, 18.3, 13.9, 9.6]
    }]
  });
});
What about license? Creative Commons Attribution-NonCommercial 3.0 License
Flot (Canvas)
1000 points is not a problem, but as soon as you start having more points than the pixel width, you should probably start thinking about downsampling/aggregation.
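A simple form of downsampling is bucket averaging. A minimal sketch, assuming a hypothetical helper `downsample` that reduces a series to at most as many values as the chart has pixels; it is not part of Flot:

```python
def downsample(points, max_points):
    """Average consecutive buckets so at most max_points values remain."""
    if max_points <= 0:
        raise ValueError("max_points must be positive")
    if len(points) <= max_points:
        return list(points)
    bucket = -(-len(points) // max_points)  # ceiling division
    result = []
    for start in range(0, len(points), bucket):
        chunk = points[start:start + bucket]
        result.append(sum(chunk) / len(chunk))
    return result


series = list(range(1000))
reduced = downsample(series, 300)  # e.g. a chart 300 pixels wide
print(len(series), len(reduced))
```

Averaging smooths out short spikes. If peaks matter, keeping the minimum and maximum of each bucket would be an alternative.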
insights
get_or_create is dangerous
denormalization is key
profiling can be fun
licensing is hard
always drink your milk.