iTunes U Aggregator - hpi.de/.../seminare/webprog_web20_1112/itunesu_aggregator.pdf

iTunes U Aggregator: A Rapid-Fire Walkthrough

iTunes® and its Logo are registered trademarks of Apple Inc., registered in the U.S. and other countries. tele-TASK™ is a trademark of the Hasso-Plattner-Institut für Systemtechnik GmbH. All trademarks are property of their respective owners.

Upload: duongkiet

Post on 11-Nov-2018


iTunes U, Django, Excel, Profiling, components, REST, templatetags, schema, visualization, trollfaces

motivation

Image credit: http://commons.wikimedia.org/wiki/File:Internet-mail.svg


New Email: iTunes U Weekly Report for Hasso-Plattner-Institut für Systemtechnik (HPI)


hpi-de-public-dz-2011-10-16.xls


week 1 (calendar week #42)

week 2 (calendar week #43)

week 3 (calendar week #44)

week 4 (calendar week #45)


each report covers 4 weeks, but a new email arrives every week

Browse

unique collection names?

Tracks

provider, collection, track

GUIDs: globally unique, in your face

Previews


yay: a match! (with Browse)

Users

meh.

Edits

double meh.

Django to the rescue

Django and the Django Logo are registered trademarks of Django Software Foundation.

BSD licensed

MVC driven

SERVER side

DOCUMENTED extremely well

RICH ecosystem

AWE some


Excel parsing schmarsing

import xlrd
book = xlrd.open_workbook("hpi-de-public-dz-2011-10-16.xls")
summary = book.sheet_by_name("Summary")


summary = book.sheet_by_name("Summary")
summary.cell_value(rowx=6, colx=2)  # C7 (xlrd row and column indices are zero-based)
# => "2011-10-16"
summary.row_values(rowx=4, start_colx=2, end_colx=6)  # C5:F5
# => ["2011-10-16", "2011-10-09", ...]

UserActions.objects.create(date="2011-10-16", action="Browse", value=1991)
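The single-cell lookups above extend naturally to a loop over the whole Summary sheet. The following is a minimal sketch, not the talk's importer. It assumes that the week dates sit in C5:F5 (as above), that action names sit in column B, and that data rows follow directly below the header. `iter_summary` and `FakeSheet` are hypothetical names; `FakeSheet` only mimics the three xlrd sheet members used here (`nrows`, `cell_value`, `row_values`) so the sketch runs without a workbook. All values except the 1991 Browse count are made up for illustration.

```python
class FakeSheet:
    """Stand-in exposing the subset of the xlrd sheet API used below."""

    def __init__(self, rows):
        self._rows = rows
        self.nrows = len(rows)

    def cell_value(self, rowx, colx):
        return self._rows[rowx][colx]

    def row_values(self, rowx, start_colx=0, end_colx=None):
        return self._rows[rowx][start_colx:end_colx]


def iter_summary(sheet, header_rowx=4, label_colx=1, start_colx=2, end_colx=6):
    """Yield (date, action, value) for every data cell below the header row."""
    dates = sheet.row_values(header_rowx, start_colx, end_colx)
    for rowx in range(header_rowx + 1, sheet.nrows):
        action = sheet.cell_value(rowx, label_colx)
        if not action:  # skip blank spacer rows
            continue
        values = sheet.row_values(rowx, start_colx, end_colx)
        for date, value in zip(dates, values):
            yield date, action, value


# Assumed layout: four empty rows, a header row, then one row per action.
blank = [""] * 6
rows = [blank] * 4 + [
    ["", "", "2011-10-16", "2011-10-09", "2011-10-02", "2011-09-25"],
    ["", "Browse", 1991, 1800, 1750, 1600],  # only 1991 comes from the slides
]
records = list(iter_summary(FakeSheet(rows)))
```

Each tuple could then be passed to a call like `UserActions.objects.create(date=..., action=..., value=...)`.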

pyxlreader?

pyExcelerator?

unmaintained & undocumented

Resolver One: Excel meets IronPython

not suitable for Web applications

Tracks

provider collection track


[Diagram. Labels: Series, Podcast, Sample. Entries: Internet Security; Anti-Virus Software, Attack Signatures, Social Hacking, ...; 2011-10-16: 60 tracks; 2011-10-09: 36 tracks; ...]

Tracks

podcast, _ = Podcast.objects.get_or_create(
    name="Internet Security", handle=6979209575)
sample, _ = Sample.objects.get_or_create(
    date="2011-10-16", podcast=podcast)
sample.tracks = 60
sample.save()

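To make the cost of this pattern visible, here is a rough sketch of what a get-or-create step amounts to at the database level: a lookup first, and an insert only on a miss. It uses the standard-library `sqlite3` module rather than Django, and it is not Django's actual implementation. The `podcast` table and the helper name are assumptions for illustration.

```python
import sqlite3


def get_or_create(conn, name, handle):
    """Return (row_id, created): look the podcast up first, insert only on a miss."""
    row = conn.execute(
        "SELECT id FROM podcast WHERE name = ? AND handle = ?", (name, handle)
    ).fetchone()
    if row is not None:
        return row[0], False
    cur = conn.execute(
        "INSERT INTO podcast (name, handle) VALUES (?, ?)", (name, handle)
    )
    return cur.lastrowid, True


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE podcast (id INTEGER PRIMARY KEY, name TEXT, handle INTEGER)"
)
first_id, first_created = get_or_create(conn, "Internet Security", 6979209575)
second_id, second_created = get_or_create(conn, "Internet Security", 6979209575)
```

Every spreadsheet row therefore costs at least one SELECT, plus an INSERT or UPDATE, which adds up over thousands of rows.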

                VIRT   TIME+     Command
import          153M   7:52.73   python manage.py runserver
aggregate sum   144M   0:19.26   python manage.py runserver

for a 2MB database (20k records)


caching ka-ching!

Model.objects.create(…)
# o = Model(…)
# o.save()
Model.objects.create(…)
…

transaction #1

transaction #2 ...


from django.db import transaction

# commit_on_success() was removed in Django 1.8; transaction.atomic() replaces it
with transaction.commit_on_success():
    Model.objects.create(…)
    # o = Model(…)
    # o.save()
    Model.objects.create(…)
    …

transaction #1


90% sys time savings

no difference for user time
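The difference between one transaction per row and one transaction per import can be sketched without Django. The example below uses the standard-library `sqlite3` module; the `useractions` table and both helper names are assumptions for illustration. With an on-disk database each commit typically forces a sync to disk, which is presumably where the sys time went. An in-memory database, as used here, will not show the same gap.

```python
import sqlite3


def insert_one_transaction_per_row(conn, rows):
    """Commit after every INSERT: one transaction per row."""
    for row in rows:
        conn.execute("INSERT INTO useractions VALUES (?, ?, ?)", row)
        conn.commit()


def insert_in_one_transaction(conn, rows):
    """Wrap all INSERTs in a single transaction; commit once at the end."""
    with conn:  # commits on success, rolls back if an exception escapes
        for row in rows:
            conn.execute("INSERT INTO useractions VALUES (?, ?, ?)", row)


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE useractions (date TEXT, action TEXT, value INTEGER)")
    return conn


rows = [("2011-10-16", "Browse", n) for n in range(1000)]
slow, fast = make_db(), make_db()
insert_one_transaction_per_row(slow, rows)
insert_in_one_transaction(fast, rows)
count_slow = slow.execute("SELECT COUNT(*) FROM useractions").fetchone()[0]
count_fast = fast.execute("SELECT COUNT(*) FROM useractions").fetchone()[0]
```

Both variants store the same rows; only the number of commits differs.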

Y U NO FAST?

import logging
logging.info(...)

0 [tasks:INFO] Processing ../hpi-de-public-dz-2011-10-16.xls..
2 [tasks:INFO] Processing week 2011-10-16..
2 [tasks:DEBUG] Inserting actions..
2 [tasks:DEBUG] Inserting clients..
2 [tasks:DEBUG] Opening browse sheet..
2 [tasks:DEBUG] Inserting 71 browse actions..
3 [tasks:DEBUG] Opening tracks sheet..
3 [tasks:DEBUG] Inserting 3779 tracks actions..
22 [tasks:DEBUG] Finished 500 rows.
37 [tasks:DEBUG] Finished 1000 rows.
51 [tasks:DEBUG] Finished 1500 rows.
66 [tasks:DEBUG] Finished 2000 rows.
71 [tasks:DEBUG] Finished 2500 rows.
86 [tasks:DEBUG] Finished 3000 rows.
100 [tasks:DEBUG] Finished 3500 rows.
108 [tasks:DEBUG] Opening Previews sheet.


1:48
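A log with an elapsed-seconds prefix like the one above can be produced with the standard `logging` module alone. This is a sketch of one way to do it, not necessarily how the talk's log was configured. The `ElapsedFormatter` class and the `tasks` logger name are assumptions, and the output goes to an in-memory stream instead of a file.

```python
import io
import logging


class ElapsedFormatter(logging.Formatter):
    """Prefix each message with whole seconds since the logging module was loaded."""

    def format(self, record):
        # relativeCreated is in milliseconds; expose it as whole seconds.
        record.elapsed = int(record.relativeCreated // 1000)
        return super().format(record)


stream = io.StringIO()  # stand-in for a log file or the console
handler = logging.StreamHandler(stream)
handler.setFormatter(
    ElapsedFormatter("%(elapsed)d [%(name)s:%(levelname)s] %(message)s")
)

log = logging.getLogger("tasks")
log.setLevel(logging.DEBUG)
log.addHandler(handler)
log.propagate = False  # keep the demo output out of the root logger

for done in range(500, 1501, 500):
    log.debug("Finished %d rows.", done)

lines = stream.getvalue().splitlines()
```

Coarse timestamps like these show which phase is slow, but not which function, which is what the profiler is for.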


where exactly is the time spent?

python -m profile …
python -m cProfile …

python -m cProfile -s cumulative manage.py loadreport hpi-de-public-dz-2011-10-16.xls


60% time spent in django.db.models.Manager.get_or_create
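The same cumulative-time report can also be produced from inside Python with `cProfile` and `pstats`, which helps when only part of a run should be measured. A minimal sketch follows; `lookup_heavy_import` is a toy workload standing in for the real import and is not from the talk.

```python
import cProfile
import io
import pstats


def lookup_heavy_import(n):
    """Toy workload standing in for the report import."""
    seen = {}
    for i in range(n):
        seen.setdefault(i % 100, []).append(i)
    return len(seen)


profiler = cProfile.Profile()
profiler.enable()
result = lookup_heavy_import(50_000)
profiler.disable()

out = io.StringIO()
stats = pstats.Stats(profiler, stream=out)
stats.sort_stats("cumulative").print_stats(5)  # top 5 entries by cumulative time
report = out.getvalue()
```

Sorting by cumulative time surfaces the call that dominates together with everything beneath it, which is how a hot spot such as `get_or_create` stands out.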

[Diagram. Labels: Series, Podcast, Sample. Entries: Internet Security; Anti-Virus Software, Social Hacking, Internet Memes, ...; 2011-10-16: 60 tracks, 21 previews; 2011-10-09: 12 previews; ...]

[Diagram, continued. Sample 2011-10-09 becomes: 33 tracks, 12 previews. Separate tables: Tracks (2011-10-09: 36) and Previews (2011-10-09: 12).]

50% time savings through denormalization


90% time savings through full denormalization


5% larger database through denormalization


320% larger database through full denormalization
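One plausible reading of these slides is a trade of space for import speed. Instead of looking up and updating a shared Sample row for every spreadsheet row, each sheet gets its own insert-only table, and the per-week picture is reassembled at read time. The sketch below uses `sqlite3` with an assumed schema; table and column names are hypothetical and not the talk's actual models. Storing the podcast name in every row is likewise only a guess at what "full denormalization" meant.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One insert-only table per report sheet; no shared row to look up and update.
conn.execute("CREATE TABLE tracks (podcast TEXT, date TEXT, n INTEGER)")
conn.execute("CREATE TABLE previews (podcast TEXT, date TEXT, n INTEGER)")

track_rows = [("Internet Security", "2011-10-16", 60),
              ("Internet Security", "2011-10-09", 36)]
preview_rows = [("Internet Security", "2011-10-16", 21),
                ("Internet Security", "2011-10-09", 12)]

with conn:  # single transaction for the whole import
    conn.executemany("INSERT INTO tracks VALUES (?, ?, ?)", track_rows)
    conn.executemany("INSERT INTO previews VALUES (?, ?, ?)", preview_rows)

# The per-week view is reassembled at read time instead of at import time.
combined = conn.execute(
    """
    SELECT t.date, t.n, p.n
    FROM tracks AS t
    JOIN previews AS p ON p.podcast = t.podcast AND p.date = t.date
    WHERE t.podcast = ?
    ORDER BY t.date DESC
    """,
    ("Internet Security",),
).fetchall()
```

The repeated podcast name in every row is the kind of redundancy that makes the database larger while removing lookups from the import path.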

Pokémon are copyrighted by Nintendo Co., Ltd. http://www.flickr.com/photos/darktabris/5654794283 © 2011 Sergio Cuellar

yo app’s so fat it consumes my whole memory

the dreaded MEMORY LEAK


Series: Internet Security

Django object cache

purge via Model.objects.update()

check with objgraph
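For a quick check without installing anything, the standard `gc` module can give a rough approximation of the per-type object counts that objgraph reports. This is a stdlib sketch, not objgraph's API. `Series` here is a plain stand-in class, and the list plays the role of a cache that holds on to objects.

```python
import gc
from collections import Counter


def most_common_types(limit=10):
    """Count live, gc-tracked objects by type name, most frequent first."""
    counts = Counter(type(obj).__name__ for obj in gc.get_objects())
    return counts.most_common(limit)


def count_instances(cls):
    """Number of live, gc-tracked instances of exactly cls."""
    return sum(1 for obj in gc.get_objects() if type(obj) is cls)


class Series:
    def __init__(self, name):
        self.name = name


before = count_instances(Series)
cache = [Series("Internet Security") for _ in range(1000)]  # simulated object cache
during = count_instances(Series)
cache.clear()  # purge the cache
gc.collect()
after = count_instances(Series)
top = most_common_types(3)
```

If the count keeps growing across requests and never returns to its baseline, something is still holding references.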

@django.views.decorators.cache.cache_page
def teletask_series(request, id):
    ...

view


{% load cache %}
{% cache 604800 views %}
{% endcache %}

template


from django.core.cache import cache
cache.set('views', 60)
cache.get('views')

low-level

{% load cache %}
{% cache 604800 views %}
{{ views }}
{% endcache %}

reuse

iTunes U stats


<h1>{{ name }}</h1>
<p>{{ description }}</p>
<a href="itunesu/{{ id }}">iTunes U stats</a>
{% for track in tracks %}
<h2>{{ track.name }}</h2>
{% endfor %}

<h1>{{ name }}</h1>
<p>Total Views: {{ views }}</p>

two different pages


<h1>Internet Security</h1>
<p>{{ description }}</p>
<span id="totalviews">{{ views }}</span>
{% for track in tracks %}
<h2>{{ track.name }}</h2>
{% endfor %}

controller must be adapted to pass views variable


{% load itunesuagg %}
<h1>Internet Security</h1>
<p>{{ description }}</p>
<span id="totalviews">{% viewcount for name %}</span>
{% for track in tracks %}
<h2>{{ track.name }}</h2>
{% endfor %}

or: REST API

visuals

Client (interaction): YUI, Google Chart Tools, Flot, Highcharts

Server (performance): Matplotlib, Cairo

Third-party: Google Chart API

from pygooglechart import PieChart3D

chart = PieChart3D(250, 100)
chart.add_data([20, 10])
chart.set_pie_labels(['Hello', 'World'])
print(chart.get_url())

Google Chart API


http://chart.apis.google.com/chart?cht=p3&chs=250x100&chd=s:pU&chl=Hello|World


so what about large datasets?

from pygooglechart import SimpleLineChart

chart = SimpleLineChart(200, 125)
data = [32, 34, 34, 32, 34, 34, 32, 32, 32, 34, 34, 32, 29, 29, 34, 34, 34,
        37, 37, 39, 42, 47, 50, 54, 57, 60, 60, 60, 60, 60, 60, 60, 62, 62,
        60, 55, 55, 52, 47, 44, 44, 40, 40, 37, 34, 34, 32, 32, 32, 31, 32]
chart.add_data(data)
...
print(chart.get_url())

Google Chart API


http://chart.apis.google.com/chart?cht=lc&chs=200x125&chd=e:UeVwVwUeVwVwUeUeUeVwVwUeSkSkVwVwVwXrXrY9a4eFgAijkemZmZmZmZmZmZmZnrnrmZjMjMhReFcKcKZmZmXrwVwUeUeUeT1Ue&chco=0000FF&chf=c,ls,0,CCCCCC,0.2,FFFFFF,0.2&chxt=y,x&chxl=0:||25|50|75|100|1:|Jan|Feb|Mar|Apr|May|Jun&chg=0,25,5,5


2KB URI length limitation


solution: POST it


16KB limitation, and POST does not work in <img>
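The GET-or-POST decision can be sketched as a small helper that builds the chart parameters, measures the resulting URL, and falls back to a form-encoded POST body when the URL would exceed the 2KB limit. This is an illustration only: `build_chart_request` is a hypothetical name, it uses the Chart API's plain text data encoding (`chd=t:`) rather than what pygooglechart emits, and it sends nothing over the network.

```python
from urllib.parse import urlencode

CHART_ENDPOINT = "http://chart.apis.google.com/chart"
MAX_GET_URL_LENGTH = 2048  # the 2KB URI limit mentioned above


def build_chart_request(data, size="200x125", chart_type="lc"):
    """Return (method, url, body): GET while the URL fits, POST otherwise."""
    params = {
        "cht": chart_type,
        "chs": size,
        "chd": "t:" + ",".join(str(value) for value in data),  # text encoding
    }
    query = urlencode(params)
    url = CHART_ENDPOINT + "?" + query
    if len(url) <= MAX_GET_URL_LENGTH:
        return "GET", url, None
    # Too long for a URL: send the same parameters as a form-encoded POST body.
    return "POST", CHART_ENDPOINT, query.encode("ascii")


small = build_chart_request([32, 34, 34, 32, 29, 37, 60, 62, 55, 44, 32])
large = build_chart_request([n % 100 for n in range(2000)])
```

A POST response cannot be referenced directly from an `<img>` tag, so in that case the server would have to fetch the image itself and serve it.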

http://www.highcharts.com/demo/line-basic

Highcharts (SVG)

var chart;
$(document).ready(function() {
  chart = new Highcharts.Chart({
    chart: { renderTo: 'container' },
    series: [{
      name: 'Tokyo',
      data: [7.0, 6.9, 9.5, 14.5, 18.2, 21.5, 25.2, 26.5, 23.3, 18.3, 13.9, 9.6]
    }]
  });
});


What about the license? Creative Commons Attribution-NonCommercial 3.0 License

Flot (Canvas)


1000 points is not a problem, but as soon as you start having more points than the pixel width, you should probably start thinking about downsampling/aggregation.
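One simple form of downsampling is to average consecutive buckets so that the series never has more points than the plot has pixels. The helper below is a minimal sketch of that idea; `downsample` is a hypothetical name and not part of Flot, and other aggregations (max, min/max pairs) may suit spiky data better.

```python
def downsample(points, max_points):
    """Reduce points to at most max_points by averaging consecutive buckets."""
    if max_points <= 0:
        raise ValueError("max_points must be positive")
    if len(points) <= max_points:
        return list(points)
    bucket_size = -(-len(points) // max_points)  # ceiling division
    buckets = (
        points[start:start + bucket_size]
        for start in range(0, len(points), bucket_size)
    )
    return [sum(bucket) / len(bucket) for bucket in buckets]


series = list(range(1000))         # 1000 samples
reduced = downsample(series, 200)  # e.g. for a 200-pixel-wide plot
```

Doing this on the server also shrinks the payload sent to the browser.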


ready for primetime

No file chosen Choose File Upload!


14,248 data sets imported.

insights

get_or_create is dangerous

denormalization is key

profiling can be fun

licensing is hard


thanks for your attention.

always drink your milk.