learning python from data

Upload: mosky-liu

Post on 06-May-2015

DESCRIPTION

These are the slides for the COSCUP[1] 2013 Hands-on[2] session, "Learning Python from Data". They use examples to show the world of Python. I hope they help you learn Python. [1] COSCUP: http://coscup.org/ [2] COSCUP Hands-on: http://registrano.com/events/coscup-2013-hands-on-mosky

TRANSCRIPT

Page 1: Learning Python from Data

LEARNING PYTHON FROM DATA

Mosky

1

Page 2: Learning Python from Data

THIS SLIDE

• The online version is at https://speakerdeck.com/mosky/learning-python-from-data.

• The examples are at https://github.com/moskytw/learning-python-from-data-examples.

2

Page 8: Learning Python from Data

MOSKY

• I am working at Pinkoi.

• I've taught Python for 100+ hours.

• A speaker at COSCUP 2014, PyCon SG 2014, PyCon APAC 2014, OSDC 2014, PyCon APAC 2013, COSCUP 2014, ...

• The author of the Python packages: MoSQL, Clime, ZIPCodeTW, ...

• http://mosky.tw/

3

Page 15: Learning Python from Data

SCHEDULE

• Warm-up

• Packages - Install the packages we need.

• CSV - Download a CSV from the Internet and handle it.

• HTML - Parse HTML source code and write a web crawler.

• SQL - Save data into a SQLite database.

• The End

4

Page 16: Learning Python from Data

FIRST OF ALL,

5

Page 18: Learning Python from Data

PYTHON IS AWESOME!

6

Page 23: Learning Python from Data

2 OR 3?

• Use Python 3!

• But it actually depends on the libs you need.

• https://python3wos.appspot.com/

• We will go ahead with Python 2.7, but I will also introduce the changes in Python 3.

7
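
The Python 3 changes that touch the examples in this deck are few. Here is a minimal sketch, written for Python 3 (the slides themselves stay on Python 2.7 syntax), of the three you will meet most often: `print` is a function, `/` is true division, and exceptions are bound with `as`.

```python
# Python 3: print is a function, not a statement.
print('#' * 10)

# Python 3: / is true division; // is floor division
# (what / did for two ints in Python 2).
true_quotient = 2 / 3      # a float, roughly 0.667
floor_quotient = 2 // 3    # 0
negative_floor = -2 // 3   # -1: floor division rounds toward negative infinity

# Python 3: only the "except E as e" form is accepted ("except E, e" is gone).
try:
    int('not a number')
except ValueError as e:
    message = 'It must be a number: {}'.format(e)

print(true_quotient, floor_quotient, negative_floor)
print(message)
```

Also note that `raw_input` from the later slides is named `input` in Python 3.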

Page 26: Learning Python from Data

THE ONLINE RESOURCES

• The Python Official Doc

• http://docs.python.org

• The Python Tutorial

• The Python Standard Library

• My Past Slides

• Programming with Python - Basic

• Programming with Python - Adv.

8

Page 27: Learning Python from Data

THE BOOKS

9

Page 34: Learning Python from Data

PREPARATION

• Did you say "hello" to Python?

• If not, visit

• http://www.slideshare.net/moskytw/programming-with-python-basic.

• If yes, open your Python shell.

10

Page 35: Learning Python from Data

WARM-UP

The things you must know.

11

Page 36: Learning Python from Data

MATH & VARS

2 + 3
2 - 3
2 * 3
2 / 3, -2 / 3

(1+10)*10 / 2

2.0 / 3

2 % 3

2 ** 3

x = 2
y = 3
z = x + y

print z

'#' * 10

12

Page 37: Learning Python from Data

FOR

for i in [0, 1, 2, 3, 4]:
    print i

items = [0, 1, 2, 3, 4]
for i in items:
    print i

for i in range(5):
    print i

chars = 'SAHFI'
for i, c in enumerate(chars):
    print i, c

words = ('Samsung', 'Apple', 'HP', 'Foxconn', 'IBM')
for c, w in zip(chars, words):
    print c, w

13

Page 38: Learning Python from Data

IF

for i in range(1, 10):
    if i % 2 == 0:
        print '{} is divisible by 2'.format(i)
    elif i % 3 == 0:
        print '{} is divisible by 3'.format(i)
    else:
        print '{} is not divisible by 2 nor 3'.format(i)

14

Page 39: Learning Python from Data

WHILE

while 1:
    n = int(raw_input('How big pyramid do you want? '))
    if n <= 0:
        print 'It must be greater than 0: {}'.format(n)
        continue
    break

15

Page 40: Learning Python from Data

TRY

while 1:

    try:
        n = int(raw_input('How big pyramid do you want? '))
    except ValueError as e:
        print 'It must be a number: {}'.format(e)
        continue

    if n <= 0:
        print 'It must be greater than 0: {}'.format(n)
        continue

    break

16
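
The same validation loop can be wrapped in a function so it is easy to try without typing at a prompt. This is a Python 3 sketch, not part of the original examples; the name `ask_size` and its `read` and `write` parameters are made up for illustration.

```python
def ask_size(read=input, write=print):
    """Keep asking until the user enters an integer greater than 0."""
    while True:
        try:
            n = int(read('How big pyramid do you want? '))
        except ValueError as e:
            write('It must be a number: {}'.format(e))
            continue
        if n <= 0:
            write('It must be greater than 0: {}'.format(n))
            continue
        return n


# Simulate a user who types "abc", then "-1", then "3".
answers = iter(['abc', '-1', '3'])
messages = []
size = ask_size(read=lambda prompt: next(answers), write=messages.append)
```

Calling `ask_size()` with no arguments reads from the keyboard as the slide does.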

Page 41: Learning Python from Data

LOOP ... ELSE

for n in range(2, 100):
    for i in range(2, n):
        if n % i == 0:
            break
    else:
        print '{} is a prime!'.format(n)

17
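
The `else` branch of a loop runs only when the loop finishes without hitting `break`. A small Python 3 sketch of the same prime search, collected into a list so the result can be checked (the function name `primes_below` is hypothetical):

```python
def primes_below(limit):
    """Return the primes in range(2, limit) using for ... else."""
    found = []
    for n in range(2, limit):
        for i in range(2, n):
            if n % i == 0:
                break          # a divisor exists, so the else branch is skipped
        else:
            found.append(n)    # runs only when the inner loop never hit break
    return found


print(primes_below(20))
```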

Page 42: Learning Python from Data

A PYRAMID

                ****
            ************
        ********************
    ****************************
************************************

18
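
One possible way to draw such a pyramid, offered as a sketch rather than the reference solution to the exercise. It assumes row `i` holds `2 * i + 1` copies of a unit string, which matches the row widths 4, 12, 20, 28, 36 on the slide when the unit is `'****'`. The name `pyramid` and its parameters are made up here.

```python
def pyramid(height, unit='*'):
    """Return the rows of a centered pyramid; row i holds 2*i + 1 units."""
    width = len(unit) * (2 * height - 1)   # width of the bottom row
    rows = []
    for i in range(height):
        row = unit * (2 * i + 1)
        # center() pads both sides; drop the trailing spaces.
        rows.append(row.center(width).rstrip())
    return rows


print('\n'.join(pyramid(5, unit='****')))
```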

Page 43: Learning Python from Data

A FATTER PYRAMID

******

**********************

*******************

19

Page 44: Learning Python from Data

YOUR TURN!

20

Page 45: Learning Python from Data

LIST COMPREHENSION

[n for n in range(2, 100) if not any(n % i == 0 for i in range(2, n))]

21

Page 46: Learning Python from Data

PACKAGES

import is important.

22

Page 51: Learning Python from Data

GET PIP - UN*X

• Debian family

• # apt-get install python-pip

• Red Hat family

• # yum install python-pip

• Mac OS X

• # easy_install pip

24

Page 55: Learning Python from Data

GET PIP - WIN *

• Follow the steps in http://stackoverflow.com/questions/4750806/how-to-install-pip-on-windows.

• Or just use easy_install to install it. easy_install should be at C:\Python27\Scripts\.

• Or find the Windows installer on the Python Package Index.

25

Page 60: Learning Python from Data

3-RD PARTY PACKAGES

• requests - Python HTTP for Humans

• lxml - Pythonic XML processing library

• uniout - Print the object representation in readable chars.

• clime - Convert module into a CLI program w/o any config.

26

Page 61: Learning Python from Data

YOUR TURN!

27

Page 62: Learning Python from Data

CSV

Let's start by making an HTTP request!

28

Page 63: Learning Python from Data

HTTP GET

import requests

#url = 'http://stats.moe.gov.tw/files/school/101/u1_new.csv'
url = 'https://raw.github.com/moskytw/learning-python-from-data-examples/master/sql/schools.csv'

print requests.get(url).content

#print requests.get(url).text

29

Page 64: Learning Python from Data

FILE

save_path = 'school_list.csv'

with open(save_path, 'w') as f:
    f.write(requests.get(url).content)

with open(save_path) as f:
    print f.read()

with open(save_path) as f:
    for line in f:
        print line,

30

Page 65: Learning Python from Data

DEF

from os.path import basename

def save(url, path=None):

    if not path:
        path = basename(url)

    with open(path, 'w') as f:
        f.write(requests.get(url).content)

31
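
In Python 3 the downloaded payload is bytes, so the file must be opened in binary mode (`'wb'`). Below is a sketch of `save` under that assumption. The extra `fetch` parameter is not in the original; it is added here only so the function can be tried offline, and the default uses the standard library's `urlopen` as a stand-in for `requests.get(url).content`.

```python
import os
import tempfile
from os.path import basename
from urllib.request import urlopen


def download(url):
    """Standard-library stand-in for requests.get(url).content."""
    return urlopen(url).read()


def save(url, path=None, fetch=download):
    """Save the bytes behind `url` to `path` (default: the URL's last segment)."""
    if not path:
        path = basename(url)
    with open(path, 'wb') as f:   # 'wb': the payload is bytes in Python 3
        f.write(fetch(url))
    return path


# Offline demonstration with a fake fetcher and a temporary directory.
# The URL is a made-up placeholder and is never contacted.
target = os.path.join(tempfile.mkdtemp(), 'school_list.csv')
saved = save('http://example.invalid/schools.csv', target,
             fetch=lambda url: b'id,name\n1,The First\n')
with open(saved, 'rb') as f:
    content = f.read()
```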

Page 66: Learning Python from Data

CSV

import csv
from os.path import exists

if not exists(save_path):
    save(url, save_path)

with open(save_path) as f:
    for row in csv.reader(f):
        print row

32

Page 67: Learning Python from Data

+ UNIOUT

import csv
from os.path import exists
import uniout # You want this!

if not exists(save_path):
    save(url, save_path)

with open(save_path) as f:
    for row in csv.reader(f):
        print row

33

Page 68: Learning Python from Data

NEXT

with open(save_path) as f:
    next(f) # skip the unwanted lines
    next(f)
    for row in csv.reader(f):
        print row

34

Page 69: Learning Python from Data

DICT READER

with open(save_path) as f:
    next(f)
    next(f)
    for row in csv.DictReader(f):
        print row

# We now have a great output. :)

35
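
To see what `next(f)` and `csv.DictReader` do without downloading anything, here is a Python 3 sketch on an in-memory file. The sample data is invented and only imitates the shape of the real CSV: two unwanted lines, a header row, then records.

```python
import csv
import io

# Made-up sample: two junk lines, a header row, then data.
raw = (
    'Exported list\n'
    'Updated yearly\n'
    'School Name,County\n'
    'Alpha University,Taipei\n'
    'Beta College,Tainan\n'
)

f = io.StringIO(raw)
next(f)   # skip the first unwanted line
next(f)   # skip the second unwanted line
rows = list(csv.DictReader(f))   # the next line read becomes the header

for row in rows:
    print(row['School Name'], row['County'])
```

Each row maps a header name to that row's value, which is why the output is easier to read than the plain lists from `csv.reader`.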

Page 70: Learning Python from Data

DEF AGAIN

def parse_to_school_list(path):
    school_list = []
    with open(path) as f:
        next(f)
        next(f)
        for school in csv.DictReader(f):
            school_list.append(school)

    return school_list[:-2]

36

Page 71: Learning Python from Data

+ COMPREHENSION

def parse_to_school_list(path='schools.csv'):
    with open(path) as f:
        next(f)
        next(f)
        school_list = [school for school in csv.DictReader(f)][:-2]

    return school_list

37

Page 72: Learning Python from Data

+ PRETTY PRINT

from pprint import pprint

pprint(parse_to_school_list(save_path))

# AWESOME!

38

Page 73: Learning Python from Data

PYTHONIC

school_list = parse_to_school_list(save_path)

# hmmm ...

for school in school_list:
    print school['School Name']

# It is more Pythonic! :)

print [school['School Name'] for school in school_list]

39

Page 74: Learning Python from Data

GROUP BY

from itertools import groupby

# You MUST sort it.
keyfunc = lambda school: school['County']
school_list.sort(key=keyfunc)

for county, schools in groupby(school_list, keyfunc):
    for school in schools:
        print '%s %r' % (county, school)
    print '---'

40
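
Why the sort is mandatory: `groupby` only merges neighbouring items that share a key. A Python 3 sketch with invented rows shows the difference.

```python
from itertools import groupby

# Made-up rows; only the 'County' key matters here.
schools = [
    {'School Name': 'Alpha', 'County': 'Taipei'},
    {'School Name': 'Beta', 'County': 'Tainan'},
    {'School Name': 'Gamma', 'County': 'Taipei'},
]


def county_of(school):
    return school['County']


# Without sorting, 'Taipei' shows up as two separate groups.
unsorted_keys = [county for county, _ in groupby(schools, county_of)]

# After sorting by the same key, each county appears exactly once.
schools.sort(key=county_of)
grouped = {county: [s['School Name'] for s in group]
           for county, group in groupby(schools, county_of)}

print(unsorted_keys)
print(grouped)
```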

Page 75: Learning Python from Data

DOCSTRING

'''It contains some useful functions for parsing data from government.'''

def save(url, path=None):
    '''It saves data from `url` to `path`.'''
    ...

--- Shell ---

$ pydoc csv_docstring

41

Page 76: Learning Python from Data

CLIME

if __name__ == '__main__':
    import clime.now

--- shell ---

$ python csv_clime.py
usage: basename <p>
   or: parse-to-school-list <path>
   or: save [--path] <url>

It contains some useful functions for parsing data from government.

42

Page 77: Learning Python from Data

DOC TIPS

help(requests)

print dir(requests)

print '\n'.join(dir(requests))

43

Page 78: Learning Python from Data

YOUR TURN!

44

Page 79: Learning Python from Data

HTML

Have fun with the final crawler. ;)

45

Page 80: Learning Python from Data

LXML

import requests
from lxml import etree

content = requests.get('http://clbc.tw').content
root = etree.HTML(content)

print root

46

Page 81: Learning Python from Data

CACHE

from os.path import exists

cache_path = 'cache.html'

if exists(cache_path):
    with open(cache_path) as f:
        content = f.read()
else:
    content = requests.get('http://clbc.tw').content
    with open(cache_path, 'w') as f:
        f.write(content)

47

Page 82: Learning Python from Data

SEARCHING

head = root.find('head')
print head

head_children = head.getchildren()
print head_children

metas = head.findall('meta')
print metas

title_text = head.findtext('title')
print title_text

48

Page 83: Learning Python from Data

XPATH

titles = root.xpath('/html/head/title')
print titles[0].text

title_texts = root.xpath('/html/head/title/text()')
print title_texts[0]

as_ = root.xpath('//a')
print as_
print [a.get('href') for a in as_]

49
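
`lxml.etree` follows the ElementTree API, so `find`, `findall` and `findtext` can be tried with the standard library alone. The sketch below uses `xml.etree.ElementTree` on an invented, well-formed page. It is a stand-in, not a replacement: the standard-library parser rejects broken markup that `etree.HTML` accepts, and it supports only a subset of XPath through `findall` (there is no `xpath` method).

```python
import xml.etree.ElementTree as ET

# A made-up, well-formed page; the standard-library parser is strict,
# unlike lxml's etree.HTML, which tolerates broken real-world HTML.
page = (
    '<html><head><title>Example</title><meta charset="utf-8"/></head>'
    '<body><a href="/a">A</a><a>no link</a><a href="/b">B</a></body></html>'
)
root = ET.fromstring(page)

head = root.find('head')
title_text = head.findtext('title')
metas = head.findall('meta')

# './/a' matches <a> elements at any depth below the root.
hrefs = [a.get('href') for a in root.findall('.//a') if a.get('href')]

print(title_text, len(metas), hrefs)
```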

Page 84: Learning Python from Data

MD5

from hashlib import md5

message = 'There should be one-- and preferably only one --obvious way to do it.'

print md5(message).hexdigest()

# Actually, it has nothing to do with HTML.

50
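
In Python 3, hash functions accept bytes only, so a text string has to be encoded first. A short sketch (UTF-8 is chosen here as an assumption; any agreed encoding works as long as it is used consistently):

```python
from hashlib import md5

message = 'There should be one-- and preferably only one --obvious way to do it.'

# Python 3: hash functions take bytes, so encode the str first.
digest = md5(message.encode('utf-8')).hexdigest()

print(digest)   # 32 hexadecimal characters
```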

Page 85: Learning Python from Data

DEF GET

from os import makedirs
from os.path import exists, join

def get(url, cache_dir_path='cache/'):

    if not exists(cache_dir_path):
        makedirs(cache_dir_path)

    cache_path = join(cache_dir_path, md5(url).hexdigest())

    ...

51
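
The slide leaves the rest of `get` out. The following is not the author's implementation, only a Python 3 sketch of the idea it points at: use the MD5 of the URL as the cache file name, read the file if it exists, otherwise fetch and store. The `fetch` parameter is an addition for this sketch so it runs offline; in the workshop code that role is played by `requests.get(url).content`.

```python
import os
import tempfile
from hashlib import md5
from os.path import exists, join


def get(url, fetch, cache_dir_path='cache/'):
    """Return the content for `url`, reading from the cache when possible."""
    if not exists(cache_dir_path):
        os.makedirs(cache_dir_path)
    # The MD5 of the URL gives a fixed-length, filesystem-safe file name.
    cache_path = join(cache_dir_path, md5(url.encode('utf-8')).hexdigest())
    if exists(cache_path):
        with open(cache_path, 'rb') as f:
            return f.read()
    content = fetch(url)
    with open(cache_path, 'wb') as f:
        f.write(content)
    return content


# Offline demonstration: count how often the fake fetcher is called.
calls = []


def fake_fetch(url):
    calls.append(url)
    return b'<html></html>'


cache_dir = join(tempfile.mkdtemp(), 'cache')
first = get('http://example.invalid/', fake_fetch, cache_dir)
second = get('http://example.invalid/', fake_fetch, cache_dir)
```

The second call is served from the cache file, so the fetcher runs only once.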

Page 86: Learning Python from Data

DEF FIND_URLS

def find_urls(content):
    root = etree.HTML(content)
    return [
        a.attrib['href'] for a in root.xpath('//a')
        if 'href' in a.attrib
    ]

52

Page 87: Learning Python from Data

BFS 1/2

NEW = 0
QUEUED = 1
VISITED = 2

def search_urls(url):

    url_queue = [url]
    url_state_map = {url: QUEUED}

    while url_queue:

        url = url_queue.pop(0)
        print url

53

Page 88: Learning Python from Data

BFS 2/2

        # continue the previous page
        try:
            found_urls = find_urls(get(url))
        except Exception, e:
            url_state_map[url] = e
            print 'Exception: %s' % e
        except KeyboardInterrupt, e:
            return url_state_map
        else:
            for found_url in found_urls:
                if not url_state_map.get(found_url, NEW):
                    url_queue.append(found_url)
                    url_state_map[found_url] = QUEUED
            url_state_map[url] = VISITED

54

Page 89: Learning Python from Data

DEQUE

from collections import deque
...

def search_urls(url):
    url_queue = deque([url])
    ...
    while url_queue:

        url = url_queue.popleft()
        print url
        ...

55

Page 90: Learning Python from Data

YIELD

...

def search_urls(url):
    ...
    while url_queue:

        url = url_queue.pop(0)
        yield url
        ...
        except KeyboardInterrupt, e:
            print url_state_map
            return
        ...

56
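
The pieces from the last few slides (the state map, `deque`, and `yield`) fit together as below. This Python 3 sketch swaps the network for an invented in-memory link graph, so `LINKS` and the `find_links` parameter are assumptions standing in for `find_urls(get(url))`, and the exception handling from the slides is left out to keep it short.

```python
from collections import deque
from itertools import islice

NEW, QUEUED, VISITED = 0, 1, 2

# A made-up link graph standing in for find_urls(get(url)).
LINKS = {
    'a': ['b', 'c'],
    'b': ['a', 'd'],
    'c': ['d'],
    'd': [],
}


def search_urls(url, find_links=LINKS.get):
    """Yield URLs in breadth-first order, visiting each one once."""
    url_queue = deque([url])
    url_state_map = {url: QUEUED}
    while url_queue:
        url = url_queue.popleft()   # O(1), unlike list.pop(0)
        yield url
        for found_url in find_links(url) or []:
            if url_state_map.get(found_url, NEW) == NEW:
                url_queue.append(found_url)
                url_state_map[found_url] = QUEUED
        url_state_map[url] = VISITED


order = list(search_urls('a'))
first_two = list(islice(search_urls('a'), 2))   # the caller decides when to stop
```

Because the function is a generator, the crawl advances only as far as the caller asks, which is what makes `islice` enough to cut it short.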

Page 91: Learning Python from Data

YOUR TURN!

57

Page 92: Learning Python from Data

SQL

How about saving the CSV file into a db?

58

Page 93: Learning Python from Data

TABLE

CREATE TABLE schools (
    id TEXT PRIMARY KEY,
    name TEXT,
    county TEXT,
    address TEXT,
    phone TEXT,
    url TEXT,
    type TEXT
);

DROP TABLE schools;

59

Page 94: Learning Python from Data

CRUD

INSERT INTO schools (id, name) VALUES ('1', 'The First');
INSERT INTO schools VALUES (...);

SELECT * FROM schools WHERE id='1';
SELECT name FROM schools WHERE id='1';

UPDATE schools SET id='10' WHERE id='1';

DELETE FROM schools WHERE id='10';

60

Page 95: Learning Python from Data

COMMON PATTERN

import sqlite3

db_path = 'schools.db'
conn = sqlite3.connect(db_path)
cur = conn.cursor()

cur.execute('''CREATE TABLE schools (
    ...
)''')
conn.commit()

cur.close()
conn.close()

61
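
The common pattern and the CRUD statements can be tried together on an in-memory database, which leaves no file behind. A Python 3 sketch with a two-column table (the table is shortened here for brevity; the slides' version has seven columns):

```python
import sqlite3

conn = sqlite3.connect(':memory:')   # in-memory database, nothing is written to disk
cur = conn.cursor()

cur.execute('''CREATE TABLE schools (
    id TEXT PRIMARY KEY,
    name TEXT
)''')
conn.commit()

# Create, read, update, delete.
cur.execute("INSERT INTO schools (id, name) VALUES ('1', 'The First')")
cur.execute("SELECT name FROM schools WHERE id='1'")
selected = cur.fetchone()

cur.execute("UPDATE schools SET id='10' WHERE id='1'")
cur.execute('SELECT id FROM schools')
ids_after_update = cur.fetchall()

cur.execute("DELETE FROM schools WHERE id='10'")
cur.execute('SELECT count(*) FROM schools')
remaining = cur.fetchone()[0]
conn.commit()

cur.close()
conn.close()
```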

Page 96: Learning Python from Data

ROLLBACK

...

try:
    cur.execute('...')
except:
    conn.rollback()
    raise
else:
    conn.commit()

...

62
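
What the rollback buys you: if a later statement fails, the earlier uncommitted ones are undone too. In this Python 3 sketch the second INSERT reuses a primary key and fails. Unlike the slide, the sketch catches only `sqlite3.IntegrityError` and does not re-raise, purely so the demonstration can go on to count the rows.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE schools (id TEXT PRIMARY KEY, name TEXT)')
conn.commit()

failed = False
try:
    cur.execute("INSERT INTO schools VALUES ('1', 'The First')")
    # Same primary key again: SQLite raises sqlite3.IntegrityError.
    cur.execute("INSERT INTO schools VALUES ('1', 'A Duplicate')")
except sqlite3.IntegrityError:
    conn.rollback()   # undoes the first INSERT as well
    failed = True
else:
    conn.commit()

cur.execute('SELECT count(*) FROM schools')
row_count = cur.fetchone()[0]
conn.close()
```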

Page 97: Learning Python from Data

PARAMETERIZE QUERY

...

rows = ...

for row in rows:
    cur.execute('INSERT INTO schools VALUES (?, ?, ?, ?, ?, ?, ?)', row)

conn.commit()

...

63

Page 98: Learning Python from Data

EXECUTEMANY

...

rows = ...

cur.executemany('INSERT INTO schools VALUES (?, ?, ?, ?, ?, ?, ?)', rows)

conn.commit()

...

64
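
The `?` placeholders let the driver do the quoting, so a value containing a quote cannot break the statement. A Python 3 sketch with invented rows and a two-column table:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE schools (id TEXT PRIMARY KEY, name TEXT)')

# Made-up rows; the second name contains a quote that would break
# a query built with string formatting.
rows = [
    ('1', 'The First'),
    ('2', "St. Mary's"),
]
cur.executemany('INSERT INTO schools VALUES (?, ?)', rows)
conn.commit()

# Parameters are always passed as a sequence, even when there is only one.
cur.execute('SELECT name FROM schools WHERE id = ?', ('2',))
name = cur.fetchone()[0]
conn.close()
```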

Page 99: Learning Python from Data

FETCH

...
cur.execute('select * from schools')

print cur.fetchone()

# or
print cur.fetchall()

# or
for row in cur:
    print row
...

65

Page 100: Learning Python from Data

TEXT FACTORY

# SQLite only: Lets you pass 8-bit strings as parameters.

...

conn = sqlite3.connect(db_path)
conn.text_factory = str

...

66

Page 101: Learning Python from Data

ROW FACTORY

# SQLite only: Lets you convert a tuple into a dict. It is `DictCursor` in some other connectors.

def dict_factory(cursor, row):
    d = {}
    for idx, col in enumerate(cursor.description):
        d[col[0]] = row[idx]
    return d

...
conn.row_factory = dict_factory
...

67
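
A Python 3 sketch of the row factory in action, next to the built-in `sqlite3.Row`, which gives access by column name without a custom function. The table and its single row are invented for the demonstration.

```python
import sqlite3


def dict_factory(cursor, row):
    # cursor.description holds one 7-item tuple per column; item 0 is the name.
    return {col[0]: row[idx] for idx, col in enumerate(cursor.description)}


conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE schools (id TEXT PRIMARY KEY, name TEXT)')
conn.execute("INSERT INTO schools VALUES ('1', 'The First')")

# The factory applies to cursors created after it is set.
conn.row_factory = dict_factory
as_dict = conn.execute('SELECT * FROM schools').fetchone()

# sqlite3.Row is the built-in alternative: index by position or by column name.
conn.row_factory = sqlite3.Row
as_row = conn.execute('SELECT * FROM schools').fetchone()
row_name = as_row['name']
row_first = as_row[0]
conn.close()
```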

Page 107: Learning Python from Data

MORE

• Python DB API 2.0

• MySQLdb - MySQL connector for Python

• Psycopg2 - PostgreSQL adapter for Python

• SQLAlchemy - the Python SQL toolkit and ORM

• MoSQL - Build SQL from common Python data structure.

68

Page 115: Learning Python from Data

THE END

• You learned how to ...

• make an HTTP request

• load a CSV file

• parse an HTML file

• write a Web crawler

• use SQL with SQLite

• and a lot of other techniques today. ;)

69